ISBN: 1424409101 (print)
This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine (Cell/B.E.) recently developed at IBM. In particular, we focus on three highly popular bioinformatics applications: FASTA, ClustalW, and HMMER. The characteristics of these applications, such as a small amount of time-critical code, regular memory accesses, existing vectorized code, and embarrassingly parallel computation, make them uniquely suitable for the Cell/B.E. processing platform. The price and power advantages afforded by the Cell/B.E. processor also make it an attractive alternative to general-purpose processors. We report preliminary performance results for these applications and compare them with results on state-of-the-art hardware. (C) 2008 Elsevier B.V. All rights reserved.
ISBN: 9783540855569 (print)
We present here an instantiation of BioProvider, a tool that efficiently provides data for biological applications, tailored to the way BLAST demands data. We briefly discuss some of the factors that may influence data availability and performance.
Functional RNAs (fRNAs) play a key role in gene regulation, at both the transcriptional and translational levels. Identification of fRNA genes can be difficult, given that some classes of fRNAs (especially microRNAs) have short coding regions and do not use classical signals common to protein-coding genes. Following previous experiments to generate pattern recognition models for the identification of fRNA genes using evolved neural networks, here we apply these approaches to the classification of fRNA genes in M. musculus.
In comparative genomics, algorithms that sort permutations by reversals are often used to propose evolutionary scenarios of large-scale genomic mutations between species. One of the main problems of such methods is that they give one solution while the number of optimal solutions is huge, with no criteria to discriminate among them. Bergeron et al. started to give some structure to the set of optimal solutions, in order to be able to deliver more presentable results than only one solution or a complete list of all solutions. The structure is a way to group solutions into equivalence classes, and to identify in each class one particular representative. However, the design of an algorithm to compute this set of representatives without enumerating all solutions was stated to be an open problem. We propose, in this paper, an answer to this problem, that is, an algorithm which gives one representative for each class of solutions and counts the number of solutions in each class, with a better theoretical and practical complexity than the complete enumeration method. We give an example of how to reduce the number of equivalence classes obtained, using further constraints. Finally, we apply our algorithm to analyze the possible scenarios of rearrangements between mammalian sex chromosomes.
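To make the notion of "many optimal solutions" concrete, the sketch below implements the naive complete enumeration that the abstract uses as its baseline, not the authors' algorithm. It assumes signed permutations stored as tuples and a reversal that flips a segment and negates its signs; the names `reverse` and `optimal_scenarios` are hypothetical. It is only practical for very small permutations.

```python
from collections import deque


def reverse(perm, i, j):
    """Reverse the segment perm[i..j] of a signed permutation and negate its signs."""
    segment = tuple(-x for x in reversed(perm[i:j + 1]))
    return perm[:i] + segment + perm[j + 1:]


def optimal_scenarios(perm):
    """Enumerate every shortest sequence of reversals sorting perm to (1, ..., n).

    Brute force: breadth-first search for distances, then a depth-first
    walk along edges that increase the distance by exactly one.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    moves = [(i, j) for i in range(len(start)) for j in range(i, len(start))]

    # Breadth-first search from the start until the target has been reached.
    dist = {start: 0}
    queue = deque([start])
    while queue and target not in dist:
        cur = queue.popleft()
        for i, j in moves:
            nxt = reverse(cur, i, j)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)

    scenarios = []

    def extend(cur, path):
        if cur == target:
            scenarios.append(path)
            return
        for i, j in moves:
            nxt = reverse(cur, i, j)
            if dist.get(nxt) == dist[cur] + 1:
                extend(nxt, path + [(i, j)])

    extend(start, [])
    return scenarios


# Two optimal scenarios exist for (-1, -2): flip each element, in either order.
print(optimal_scenarios((-1, -2)))
```

The two scenarios found for `(-1, -2)` differ only in the order of two independent reversals, which is the kind of redundancy that grouping solutions into equivalence classes is meant to collapse.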
Protein mass spectrometry is an integration of mass spectrometry and biological chip techniques, and it shows great potential for the exploration of biomarkers and the diagnosis of diseases. However, the curse of dimensionality inherent in mass spectrometry data makes dimensionality reduction a necessary phase of proteomic pattern recognition before classification. This paper presents a simulated annealing algorithm to select discriminant feature subsets. Experiments indicate that this wrapper feature selection method performs well and outperforms the other reported methods.
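A minimal sketch of wrapper feature selection by simulated annealing follows. The abstract does not specify the classifier, the neighbourhood move, or the cooling schedule, so everything here is an assumption: a leave-one-out nearest-centroid classifier as the wrapped model, a single-feature flip as the move, and geometric cooling. The names `loo_accuracy` and `anneal_features` are hypothetical.

```python
import math
import random


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    if not features:
        return 0.0
    correct = 0
    for held in range(len(X)):
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == held:
                continue
            total = sums.setdefault(label, [0.0] * len(features))
            for pos, f in enumerate(features):
                total[pos] += row[f]
            counts[label] = counts.get(label, 0) + 1
        sample = [X[held][f] for f in features]

        def sq_dist(label):
            return sum((s - t / counts[label]) ** 2
                       for s, t in zip(sample, sums[label]))

        if min(sums, key=sq_dist) == y[held]:
            correct += 1
    return correct / len(X)


def anneal_features(X, y, n_iter=200, t0=1.0, cooling=0.98, seed=0):
    """Search feature subsets; return the best subset seen and its accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current = {f for f in range(n_features) if rng.random() < 0.5}
    score = loo_accuracy(X, y, sorted(current))
    best, best_score = set(current), score
    temp = t0
    for _ in range(n_iter):
        # Move: add or remove one randomly chosen feature.
        candidate = current ^ {rng.randrange(n_features)}
        cand_score = loo_accuracy(X, y, sorted(candidate))
        delta = cand_score - score
        # Always accept improvements; accept worse subsets with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = set(current), score
        temp *= cooling
    return sorted(best), best_score
```

Because the classifier is retrained for every candidate subset, the cost per iteration is dominated by the wrapped model, which is the usual trade-off of wrapper methods against filter methods.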
ISBN: 9780898716474 (print)
The post-genomic era has witnessed an explosion in the quality, quantity and variety of biological data: sequence, structure, and networks. However, when building computational models on these data, some abstractions recur often. In particular, graph-based computational models are a powerful, flexible and efficient way of modeling many biological systems. Graph models are used in systems biology, where the goal is to understand relationships among biological entities, and in structural bioinformatics, where a graph is used to represent the amino acid (or atom) interaction relationships in a protein or the secondary structure base-pairing relationships in RNA. For many of these problems, we can develop algorithms that exploit the fact that certain key parameters have complexity dependent on the treewidth of the system, which is typically very small for a variety of biological systems. When treewidth is large, we can still use spectral methods to find biologically sound solutions in an efficient manner.
Dynamic Bayesian networks (DBNs) are of particular interest for reverse engineering gene regulatory networks from temporal transcriptional data. However, the problem of learning the structure of these networks is quite challenging. This is mainly due to the high dimensionality of the search space, which makes exhaustive methods for structure learning impractical. Consequently, heuristic techniques such as Hill Climbing are used for DBN structure learning. However, Hill Climbing is not an efficient method for this purpose, as it is prone to becoming trapped in local optima and the learned network is not very accurate.
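The local-optimum problem can be illustrated with a toy sketch. It assumes steepest-ascent hill climbing over sets of edges, where each move toggles one edge, and it uses a hand-made scoring function rather than a real DBN score; `toy_score`, `EDGES`, and the edge names are hypothetical.

```python
EDGES = ["a->b", "b->c", "a->c"]  # hypothetical candidate edges


def toy_score(edges):
    """Hand-made score: 'a->c' alone looks attractive, but the best
    network is {'a->b', 'b->c'}. A real score would be computed from data."""
    if edges == {"a->c"}:
        return 2.0
    if "a->c" in edges:
        return 1.0
    return 1.5 * len(edges)


def hill_climb(score, candidate_edges, start=frozenset()):
    """Steepest-ascent hill climbing; each move adds or removes one edge."""
    current = frozenset(start)
    while True:
        neighbours = [current ^ {edge} for edge in candidate_edges]
        best = max(neighbours, key=score)
        if score(best) <= score(current):
            return current  # no neighbour improves the score
        current = best


# From the empty network the search stops at {'a->c'} (score 2.0),
# although {'a->b', 'b->c'} scores 3.0.
print(sorted(hill_climb(toy_score, EDGES)))
# A different starting point reaches the better network.
print(sorted(hill_climb(toy_score, EDGES, start={"a->b"})))
```

The outcome depends entirely on the starting point, which is why restarts or stochastic acceptance rules are common remedies.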
The accurate recognition of translation initiation sites (TISs) is an important stage in genome annotation. Due to the complicated nature of the genetic information and our incomplete understanding of it, TIS prediction remains a challenging undertaking. Many computational approaches have been proposed in the literature, some of which have yielded quite impressive performance. However, most of them either investigate the genomic sequences from a single perspective or apply some static central fusion mechanism to a fixed set of features. In this paper, we extend our previous work, which proposed a novel multi-agent architecture for TIS prediction, and explore the application of reinforcement learning to the negotiation process. Experimental results on three benchmark data sets show the effectiveness and robustness of incorporating reinforcement learning in the system.
Protein methylation is an important type of post-translational modification of proteins. Experimentally identifying methylation positions in protein sequences is time-consuming and costly. In order to provide insightful advice and reduce the cost of further experiments, we propose a novel granular decision fusion framework based on granular computing, computational intelligence, and statistical learning. Algorithms are designed under this framework to predict methylation sites. Because methylation sites are rare, the known data are imbalanced. Sampling and clustering are used to create different subsets and represent them with cluster centers. Support vector machine (SVM) classifiers are built for these sub-datasets. Finally, granular decisions are fused to determine possible methylation sites. Simulation results show that the new granular decision fusion system has high prediction accuracy.
Numerical representations of symbolic DNA sequences are an indispensable prerequisite for characterizing the information content of DNA molecules by computational methods. Current numerical representation methods for DNA sequences do not contain the genetic code context information, which may play an important role in defining protein coding regions. We propose a novel numerical representation of DNA sequences based on genetic code context within DNA sequences and explore the feasibility of applying this method to identify protein coding regions in genomes. Computational experiments indicate that incorporating genetic code information into numerical representations is a promising approach: DNA sequences are uniquely represented and more information is captured, so that digital processing tools can be applied effectively to periodicity analysis in DNA sequences.
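For context, the sketch below shows a conventional baseline rather than the representation proposed in the abstract, which is not described here: the standard binary indicator encoding (one 0/1 sequence per base) combined with the Fourier power at period 3, a signal commonly associated with protein coding regions. The function names and the normalization by sequence length are assumptions.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per base."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def period3_power(dna):
    """Sum over bases of the squared DFT magnitude at frequency N/3,
    divided by the sequence length N (an illustrative normalization)."""
    total = 0.0
    for seq in indicator_sequences(dna).values():
        # The DFT term at frequency N/3 reduces to exp(-2*pi*i*k/3).
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k / 3)
                    for k, x in enumerate(seq))
        total += abs(coeff) ** 2
    return total / len(dna)


# A perfectly codon-periodic toy sequence gives a strong period-3 signal,
# whereas a homopolymer of the same length gives essentially none.
print(period3_power("ATG" * 10), period3_power("A" * 30))
```

The indicator encoding treats each base independently of its codon context, which is the limitation the abstract identifies in existing representations.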