ISBN: 0780382536 (print)
We review unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity and discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance, which is based on the relative entropy rate (also known as Kullback-Leibler divergence), and compression-based algorithms for its estimation. We conjecture that in bioinformatics and computational linguistics this alternative distance is more relevant and important than the ones based on Kolmogorov complexity.
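The idea of approximating a normalized information distance with a compressor can be illustrated with a minimal sketch. It assumes the commonly cited form max(K(x|y), K(y|x)) / max(K(x), K(y)), which the abstract does not spell out, and substitutes zlib output sizes for Kolmogorov complexity. The conditional term is approximated as C(yx) - C(y), a rough stand-in for compression with side information; zlib is far from an optimal compressor, and the helper names are illustrative only.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed size in bytes: a crude, computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def c_cond(x: bytes, y: bytes) -> int:
    """Rough stand-in for K(x | y): extra bytes needed for x once y is known."""
    return max(c(y + x) - c(y), 0)


def normalized_distance(x: bytes, y: bytes) -> float:
    """max(K(x|y), K(y|x)) / max(K(x), K(y)) with K replaced by zlib sizes."""
    return max(c_cond(x, y), c_cond(y, x)) / max(c(x), c(y))


# Toy data (assumption): two independent random DNA-like strings.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000)).encode()
seq_b = "".join(rng.choice("ACGT") for _ in range(2000)).encode()

d_same = normalized_distance(seq_a, seq_a)  # expected to be close to 0
d_diff = normalized_distance(seq_a, seq_b)  # expected to be close to 1
print(f"identical: {d_same:.3f}, unrelated: {d_diff:.3f}")
```

With a real compressor the distance of a string to itself is small but not exactly zero, because encoding the repeated copy still costs a few bytes.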
ISBN: 0780384393 (print)
Summary form only given. We are witnessing the emergence of the "data rich" era in biology. The myriad data in biology, ranging from sequence strings to complex phenotypic and disease-relevant data, pose a huge challenge to modern biology. The standard paradigm in biology, which moves from hypothesis to experimentation (low-throughput data) to models, is gradually being replaced by one that moves from data to hypothesis to models and experimentation, and on to more data and models. Unlike data in the physical sciences, data in the biological sciences are almost guaranteed to be highly heterogeneous and incomplete. In order to make significant advances in this data-rich era, it is essential that there be robust data repositories that allow interoperable navigation, query and analysis across diverse data, and a plug-and-play tools environment that will facilitate seamless interplay of tools and data. Further, the integrated data will enable the reconstruction and modeling of biological systems. This talk will address several of the challenges posed by the enormous need for scientific data integration and modeling in biology, with specific exemplars and possible strategies. The issues addressed will include: architecture of data and knowledge repositories; flat, relational and object-oriented databases; ontologies in biology; reduction and analysis of data; legacy knowledge integration with data; and systems-level modeling in biology.
ISBN: 076952138X (print)
Systems biology is a recent scientific discipline that arose from the need to combine biology with mathematics, physics, chemistry and computer science. Driven partly by the availability of a morass of data and partly by the availability of computational resources, the field of systems biology was reborn a few years ago. Recently, a number of computational methods have been developed to model cellular pathways and networks. One of the major issues in building mathematical models of cellular processes is the difficulty of estimating parameters. Because parameters operate within a wide numerical range, it takes a large number of iterations to find the biologically relevant values. In this paper I describe how grid technology can be used to address this problem and help in building robust in-silico models.
ISBN: 0769521908 (print)
The proceedings contain 31 papers from the Proceedings of the Third IEEE International Conference on Cognitive Informatics (ICCI 2004). The topics discussed include: language, logic, and the brain; on autonomous computing and cognitive processes; is entropy suitable to characterize data and signals for cognitive informatics?; on the cognitive informatics foundations of software engineering; a parallel language for cognitive informatics; concept formation and learning: a cognitive informatics perspective; specification of the RTPA grammar and its recognition; formal description of the cognitive process of problem solving; and cognitive bioinformatics: computational cognitive model for dynamic problem solving.
Author: D. Puiu, CSBC, Virginia Commonwealth University, USA
SUPERCONTIGS is a genome-finishing tool that orders, orients and groups contigs based on clone pair information and an alignment with a related genome. The program can be used in the genome finishing and gap closing process. SUPERCONTIGS is a Perl script that runs on Unix-like systems. It is available at http://***/csbc/bccl/*** as a copy-left public domain program.
Guanine-rich sequences, including those that form G-quadruplexes, are abundant in regions of biological significance. In order to map putative G-quadruplex elements within mammalian genes we have created a suite of computational tools. This suite contains algorithms to search genes for occurrences of the G-quadruplex motif and determine their distribution near RNA processing sites.
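A motif search of the kind described can be sketched as follows. The abstract does not define the motif, so this example assumes a widely used pattern: four runs of at least three guanines separated by loops of 1 to 7 bases. The function names and the site coordinate are hypothetical and are not taken from the authors' suite.

```python
import re

# Assumed motif (not specified in the abstract): G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_g4(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for each non-overlapping hit, 0-based, end exclusive."""
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


def offsets_to_site(hits: list[tuple[int, int, str]], site: int) -> list[int]:
    """Signed distance from a processing site to each motif start (negative = upstream)."""
    return [start - site for start, _end, _motif in hits]


# Toy sequence and a hypothetical RNA processing site at position 20.
hits = find_g4("TTGGGAGGGTGGGAGGGTT")
print(hits)
print(offsets_to_site(hits, site=20))
```

Because `finditer` reports non-overlapping matches only, overlapping candidate motifs would need a lookahead-based scan; only the forward strand is searched here.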
This paper describes a novel approach to computing pairwise sequence alignments that guarantees that certain specified blocks are aligned together. The approach can increase the accuracy of the resulting alignment by incorporating prior knowledge about the sequences. It brings into the alignment the method used by biologists, which consists of forcing the alignment of occurrences of certain biological structures or functions. First, the problem is outlined. Next, related work is introduced. Finally, a modified Needleman-Wunsch algorithm and experimental results are presented. The importance of this process lies in ensuring that blocks common to the sequences are aligned accurately.
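One simple way to force chosen blocks to align is sketched below. This is not the paper's modified algorithm, which the abstract does not detail. It assumes the blocks are supplied as hypothetical (start_a, start_b, length) coordinates, sorted and non-overlapping. Each block is aligned gap-free, and standard Needleman-Wunsch is run on the stretches between blocks; the scoring values are arbitrary.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Standard global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def align_with_blocks(a: str, b: str, blocks):
    """Align a and b so that every (start_a, start_b, length) block pairs up gap-free."""
    parts_a, parts_b = [], []
    pos_a = pos_b = 0
    for start_a, start_b, length in blocks:
        # Align the free region before the block, then copy the block itself.
        free_a, free_b = needleman_wunsch(a[pos_a:start_a], b[pos_b:start_b])
        parts_a += [free_a, a[start_a:start_a + length]]
        parts_b += [free_b, b[start_b:start_b + length]]
        pos_a, pos_b = start_a + length, start_b + length
    tail_a, tail_b = needleman_wunsch(a[pos_a:], b[pos_b:])
    return "".join(parts_a + [tail_a]), "".join(parts_b + [tail_b])


# Toy example: force the shared block GATTACA to align.
aligned_a, aligned_b = align_with_blocks("TTGATTACACC", "GATTACAG", [(2, 0, 7)])
print(aligned_a)
print(aligned_b)
```

Splitting at the blocks keeps each dynamic-programming table small, at the cost of requiring the block coordinates in advance.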
Summary form only given. The maturation of high-throughput technologies and the availability of whole genome sequences make it possible to apply holistic computational approaches to the study of biological systems. The use of high-throughput technologies requires the development of advanced computational methods and tools that would enable the elicitation of significant biological knowledge from the vast amounts of data generated by these methods. Our group has been developing a battery of such methodologies and incorporated some of them in several tools such as CLICK, PRIMA, SAMBA, EXPANDER, SHARP, Binding Site Evolution and MetaReg.
The trypanosomes are a class of eukaryotic parasites that diverged from Saccharomyces cerevisiae about 800 million years ago. Many possible gene structures are present within these genomes but most appear to be non-functional and do not code for proteins. Initial analyses of these genomes suggest that over 70% of the putative genes have no biological function.
ISBN: 0769521940 (print)
The Synechococcus WH8102 knowledge base (http://***/WH8J02) is a Web-based relational database developed to facilitate computational efforts to reconstruct regulatory pathways and to serve as a gateway for biologists to access the data. It is a repository that integrates a variety of knowledge derived from both the literature and computational prediction. The data are organized hierarchically. The basic building blocks are the functional annotation and structure prediction of individual molecules. These data are then organized into clusters based on computationally predicted operons, regulons and molecular complexes. Finally, all data are compiled into pathways derived from the combined efforts of literature mining and computational prediction. A number of tools have been developed to facilitate data retrieval, including an SQL query engine and several viewers for browsing the genome, molecular complexes and pathways.