ISBN (print): 9783319244624; 9783319244617
Many computational and systems biology challenges, in particular those related to big data analysis, can be formulated as optimization problems and can therefore be addressed using heuristics. Besides the typical optimization problems, formulated with respect to a single target, the possibility of optimizing multiple objectives (MO) is rapidly becoming more appealing. In this context, MO Evolutionary Algorithms (MOEAs) are one of the most widely used classes of methods for solving MO optimization problems. However, these methods can be computationally demanding, so effective parallel implementations are needed. This fact, together with the wide diffusion of powerful and low-cost general-purpose Graphics Processing Units, has promoted the development of software tools that parallelize one or more of the computational phases that characterize MOEAs. In this paper we present a fine-grained parallelization of the Fast Non-dominated Sorting Genetic Algorithm (NSGA-II) for the CUDA architecture. In particular, we discuss how this solution can be exploited to solve multi-objective optimization tasks in the field of computational and systems biology.
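As an illustration of the sorting step that gives NSGA-II its name, the following is a minimal sequential sketch in Python of non-dominated sorting for minimization problems. It is not the CUDA implementation described in the abstract; the function names and the sample points are hypothetical. The pairwise dominance comparisons in the double loop are independent of one another, which is the kind of work that maps naturally onto GPU threads, although the abstract does not state which phases the authors parallelize.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Group point indices into successive Pareto fronts."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many points dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1

    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:  # j is dominated only by earlier fronts
                    following.append(j)
        current = following
    return fronts


if __name__ == "__main__":
    # Hypothetical two-objective sample, both objectives minimized.
    sample = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    print(non_dominated_sort(sample))
```

The first front holds the points that no other point dominates; each later front holds the points dominated only by members of earlier fronts.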
ISBN (print): 9781509021413
Rapid advances in genomic sequencing technology have resulted in a data deluge in biology and bioinformatics. This increase in data volumes has introduced computational challenges for frequently performed sequence analytics routines such as DNA and protein homology searches, which should preferably also run in real time. In this paper, we propose a scalable and similarity-aware distributed storage framework, Mendel, that enables retrieval of biologically significant DNA and protein alignments against a voluminous genomic sequence database. Mendel fragments the sequence data and generates an inverted index, which is then dispersed over a distributed collection of machines using a locality-aware distributed hash table. A novel distributed nearest neighbor search algorithm identifies sequence segments with high similarity and extends them to find an alignment. This paper includes an empirical evaluation of the performance, sensitivity, and scalability of the proposed system against the National Center for Biotechnology Information's non-redundant protein dataset. Mendel demonstrates higher sensitivity and faster query evaluations when compared to other modern frameworks.
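The fragment, index, and extend steps can be sketched on a single machine as follows. This is a deliberately simplified illustration in Python, not Mendel's algorithm: it uses exact k-mer lookup in an in-memory dictionary instead of a distributed nearest neighbor search over a hash table, and the extension is ungapped. All function names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_inverted_index(sequences: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every length-k fragment to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seeds(index, query: str, k: int) -> List[Tuple[str, int, int]]:
    """Return (sequence id, query offset, target offset) for each exact k-mer hit.
    Assumption: exact matching stands in for a similarity-aware search."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for seq_id, tpos in index.get(query[qpos:qpos + k], []):
            hits.append((seq_id, qpos, tpos))
    return hits


def extend_seed(query: str, target: str, qpos: int, tpos: int, k: int) -> Tuple[int, int, int]:
    """Extend an exact seed left and right while characters match (no gaps).
    Returns (query start, target start, aligned length)."""
    left = 0
    while (qpos - left - 1 >= 0 and tpos - left - 1 >= 0
           and query[qpos - left - 1] == target[tpos - left - 1]):
        left += 1
    right = k
    while (qpos + right < len(query) and tpos + right < len(target)
           and query[qpos + right] == target[tpos + right]):
        right += 1
    return qpos - left, tpos - left, left + right


if __name__ == "__main__":
    database = {"s1": "ACGTACGTGA", "s2": "TTGACGTT"}  # hypothetical sequences
    index = build_inverted_index(database, k=4)
    for seq_id, qpos, tpos in find_seeds(index, "TACGTG", k=4):
        print(seq_id, extend_seed("TACGTG", database[seq_id], qpos, tpos, k=4))
```

In a distributed setting the dictionary would be partitioned across machines; the sketch only shows why an inverted index lets a query touch a few candidate positions rather than scan every sequence.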
Recent advances in RNA research and the steady growth of available RNA structures call for bioinformatics methods for handling and analyzing RNA structural data. Recently, we introduced SETTER, a fast and accurate method for RNA pairwise structure alignment. In this paper, we describe MultiSETTER, an extension of SETTER for multiple RNA structure alignment. MultiSETTER combines SETTER's decomposition of RNA structures into non-overlapping structural subunits with the multiple sequence alignment algorithm ClustalW, adapted for structure alignment. The accuracy of MultiSETTER was assessed by automatically classifying RNA structures and comparing the result to SCOR annotations. In addition, the MultiSETTER classification was compared to multiple sequence alignment-based and secondary structure alignment-based classifications provided by the LocARNA and RNADistance tools, respectively. MultiSETTER precompiled Windows libraries, as well as the C++ source code, are freely available from http://***/multisetter.
The construction and understanding of Gene Regulatory Networks (GRNs) are among the hardest tasks faced by systems biology. Inferring gene regulatory networks from gene expression data has been an active research area. It aims to constitute an intermediate step from exploratory analysis to gene expression analysis. In recent years, many reverse engineering methods have been proposed. In practice, different modeling approaches generate different network structures. Therefore, it is very important for users to assess the performance of these algorithms. We present a comparative study of three different reverse engineering methods: the S-system Parameter Estimation Method (SPEM), the Graphical Gaussian Model (GGM), and TimeDelay-ARACNE. Our approach consists of analyzing real gene expression data with the different methods and assessing algorithmic performance by sensitivity, specificity, precision, and F-score.
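The four scores named above can be computed from an inferred edge set and a reference edge set, as in the following minimal Python sketch. It assumes directed edges without self-loops and a known reference network; the abstract does not say how the reference was obtained, and the gene names and edges below are hypothetical.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def edge_metrics(reference: Set[Edge], predicted: Set[Edge], candidates: Set[Edge]) -> Dict[str, float]:
    """Score a predicted network against a reference network.
    candidates is the set of all edges that could exist."""
    tp = len(reference & predicted)               # correctly predicted edges
    fp = len(predicted - reference)               # predicted but not in the reference
    fn = len(reference - predicted)               # missed reference edges
    tn = len(candidates - reference - predicted)  # correctly absent edges

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0          # avoid division by zero

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f_score = ratio(2 * tp, 2 * tp + fp + fn)     # harmonic mean of precision and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_score": f_score}


if __name__ == "__main__":
    genes = ["A", "B", "C"]                       # hypothetical genes
    candidates = set(permutations(genes, 2))      # all directed edges, no self-loops
    reference = {("A", "B"), ("B", "C")}
    predicted = {("A", "B"), ("C", "A")}
    print(edge_metrics(reference, predicted, candidates))
```

Because regulatory networks are sparse, true negatives usually dominate, so specificity tends to be high for every method; precision and F-score are often the more discriminating scores.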
ISBN (print): 9781479999118
Motif finding is a computationally expensive procedure subject to noise and false positives, but of major importance in understanding gene expression and cancer. Several authors have argued in favor of using higher-order background models to better discriminate motifs. This paper studies the effect of using higher-order Markov models in three commonly used algorithms to identify the ZFX transcription factor's binding sites in a mouse embryonic stem cell dataset. We conclude that there are particular Markov orders that yield improved outcomes for each algorithm.
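To make the notion of a higher-order background model concrete, here is a minimal Python sketch that estimates an order-k Markov model from background sequences and scores a candidate sequence under it. It is a generic illustration, not the procedure of any of the three algorithms studied; the add-one pseudocount, the function names, and the toy sequences are assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable


def train_markov(sequences: Iterable[str], order: int,
                 alphabet: str = "ACGT", pseudocount: float = 1.0) -> Callable[[str, str], float]:
    """Estimate P(symbol | previous `order` symbols) from background sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context: str, symbol: str) -> float:
        seen = counts.get(context, {})  # unseen contexts fall back to the pseudocounts
        total = sum(seen.values()) + pseudocount * len(alphabet)
        return (seen.get(symbol, 0.0) + pseudocount) / total

    return prob


def log_likelihood(prob: Callable[[str, str], float], seq: str, order: int) -> float:
    """Log-probability of seq under the background model.
    The first `order` symbols only serve as context and are not scored."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))


if __name__ == "__main__":
    background = ["ACACACAC"]           # toy background, hypothetical
    prob = train_markov(background, order=1)
    print(prob("A", "C"))               # probability of C following A
    print(log_likelihood(prob, "ACAC", order=1))
```

A sequence that the background model explains well is weaker evidence for a motif; raising the order lets the background capture longer-range composition, at the cost of needing more data to estimate the larger number of contexts.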
Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of DNA inside the nucleus. In particular, 3C-based techniques are revealing the genome folding for many different cell types and permit the creation of a more effective representation of the disposition of genes in three-dimensional space. This information can be used to re-interpret heterogeneous genomic data (multi-omic) relying on 3D maps of the chromosome. The storage and computational requirements needed to accomplish such operations on raw sequenced data have to be fulfilled using HPC solutions, and the Cloud paradigm is a valuable and convenient means of delivering HPC to bioinformatics. In this work we describe a data analysis workflow that allows the integration and interpretation of multi-omic data on a sort of "topographical" nuclear map, capable of representing the effective disposition of genes in a graph-based representation. We propose a cloud-based task farm pattern to orchestrate the services needed to accomplish genomic data analysis, where each service represents a special-purpose tool that plays a part in well-known data analysis pipelines.
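The task farm pattern mentioned above can be illustrated locally with a pool of workers that pull independent tasks and dispatch each one to a named service. The Python sketch below uses threads on one machine as a stand-in for cloud services; the service names and payloads are hypothetical and do not correspond to the tools used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Task = Tuple[str, Any]  # (service name, payload)


def run_task_farm(tasks: List[Task], services: Dict[str, Callable[[Any], Any]],
                  max_workers: int = 4) -> List[Tuple[str, Any]]:
    """Farm independent tasks out to a worker pool.
    Results come back in the same order as the submitted tasks."""
    def run(task: Task) -> Tuple[str, Any]:
        name, payload = task
        return name, services[name](payload)  # each service wraps one special-purpose tool

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, tasks))


if __name__ == "__main__":
    # Hypothetical stand-ins for real analysis tools.
    services = {
        "gc_content": lambda seq: (seq.count("G") + seq.count("C")) / len(seq),
        "length": len,
    }
    tasks = [("gc_content", "ACGT"), ("length", "ACGTAC")]
    print(run_task_farm(tasks, services))
```

The farmer only needs to know each service's name and payload, so tools can be added or replaced without changing the orchestration code.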
The latest issue of Transactions on Computational Biology and Bioinformatics (TCBB) contained extended versions of works that were presented at the Brazilian Symposium on Bioinformatics 2013 (BSB 2013), which was held in Recife, Brazil, from November 3 to 6, 2013. Eighteen papers were presented at BSB 2013, of which four were invited for submission to a special section of TCBB. These four manuscripts were carefully evaluated by 11 international experts. The four papers cover a wide arc of results, ranging from the most theoretical to the practical, and form an excellent showcase of the work presented at BSB 2013, including computational biology.
Discovering gene regulatory networks from data has been one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The networks are used to model each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.
Protein alignment is a basic step in much molecular biology research. The BLOSUM matrices, especially BLOSUM62, are the de facto standard matrices for protein alignments. However, after the matrices had been widely used for 15 years, programming errors were found in the initial version of the source code used to generate them. Surprisingly, after the bugs were corrected, the "intended" BLOSUM62 matrix performed consistently worse than the "miscalculated" one. In this paper, we find linear relationships among the eigenvalues of the matrices and propose an algorithm to find optimal unified eigenvectors. With them, we can parameterize the matrix BLOSUMx for any given variable x that can change continuously. We compare the effectiveness of our parameterized isentropic matrix with BLOSUM62. Furthermore, an iterative alignment and matrix selection process is proposed to adaptively find the best parameter and globally align two sequences. Experiments are conducted on aligning 13,667 families of the Pfam database and on clustering MHC II protein sequences; the improved accuracy demonstrates the effectiveness of our proposed method.
The use of mobile devices grows continuously in many aspects of our everyday lives. It is essential that bioinformatics and biomedicine adapt to this trend because these platforms can provide universal access to comput...