The incongruence between gene trees and species trees is one of the most pervasive challenges in molecular phylogenetics. In this work, a machine learning approach is proposed to overcome this problem. In the machine ...
ISBN: 9781479945351 (print)
Gene function annotations are key elements in biology and bioinformatics. A typical annotation is the association between a gene and a feature term that describes a functional feature of the gene by using a controlled vocabulary term (e.g. a Gene Ontology (GO) feature term). Unfortunately, available annotations contain errors, and biologically validated ones are incomplete by definition, since new knowledge is continuously discovered. Thus, computational algorithms that can provide ranked lists of predicted new gene annotations are a valuable contribution to bioinformatics research. Here, we propose two variants of the well-known Latent Dirichlet Allocation (LDA) algorithm applied to the prediction of gene annotations. LDA is a very efficient machine learning method built on a set of multinomial probability distributions over a set of topics, given a document (a gene, in our case), and on a set of multinomial probability distributions over a set of words (feature terms, in our case), given a topic. In topic modeling, a topic can be considered a latent meta-category of words, and a document a mixture of topics. Our two LDA variants use the collapsed Gibbs sampling method during the training phase, with two distinct initialization approaches to adapt the LDA mathematical model to the biomolecular annotation scenario. Using six outdated datasets of GO annotations of human and brown rat genes, we compared the annotations predicted by our methods to those given by the previously developed truncated Singular Value Decomposition (tSVD) method; then, we validated them by using the annotations available in an updated version of the same datasets. The results obtained show the efficiency of our newly proposed algorithms.
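The topic-model view described in this abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for plain LDA, where each gene is a "document" and each annotation term is a "word". This is not the authors' code: the uniform random initialization, the hyperparameters `alpha` and `beta`, the toy data, and the function names `lda_collapsed_gibbs` and `annotation_scores` are all assumptions made for illustration, and the two initialization variants proposed in the paper are not reproduced here.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, n_terms, alpha=0.1, beta=0.01,
                        n_iter=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative sketch).

    docs: list of genes, each a list of integer term ids it is annotated to.
    Returns (theta, phi): gene-topic and topic-term probability tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_term = [[0] * n_terms for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Assumption: uniform random initial topic for every (gene, term) pair.
    for d, doc in enumerate(docs):
        row = []
        for term in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_term[k][term] += 1
            topic_total[k] += 1
        assign.append(row)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_term[k][term] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_term[t][term] + beta)
                    / (topic_total[t] + n_terms * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_term[k][term] += 1
                topic_total[k] += 1
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_term[t][w] + beta) / (topic_total[t] + n_terms * beta)
         for w in range(n_terms)]
        for t in range(n_topics)
    ]
    return theta, phi


def annotation_scores(theta, phi, gene):
    """Score every term for one gene as sum over topics of theta * phi."""
    return [
        sum(theta[gene][t] * phi[t][w] for t in range(len(phi)))
        for w in range(len(phi[0]))
    ]


# Hypothetical toy data: 4 genes annotated to 5 terms (ids 0..4).
genes = [[0, 1, 2], [0, 1], [3, 4], [3, 4, 2]]
theta, phi = lda_collapsed_gibbs(genes, n_topics=2, n_terms=5)
scores = annotation_scores(theta, phi, gene=1)
```

Sorting `scores` in descending order and skipping the terms a gene already has would give a ranked list of candidate annotations, in the spirit of the ranked predictions the abstract mentions.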
ISBN: 9781538613993 (print)
Modern data acquisition has forced the field of large data on the scientific community. This paper gives a rapid technique for clustering data. The technique is based on an off-line process for packing points chosen from a data space. Once the off-line process has been run, the clustering may be re-run on different data sets of the same type in linear time. The clustering takes the form of a Voronoi tiling of the data space, with the tile centres being the elements of the point packing. The data items within each tile form the clusters. The evolutionary algorithm is an adaptation of one, based on the Conway crossover operator, that has been used to create error-correcting codes over the Levenshtein metric; the tile centres are a form of code, but over the Euclidean metric. The technique generalizes smoothly to other metric spaces and may be used on any type of data for which a distance metric can be devised. The data set used in this study captures information about codon usage bias in human genes. The clustering is validated by looking for GO term over-representation in the clusters, with significant results.
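The on-line step described here, assigning each data item to the Voronoi tile of its nearest centre, can be sketched as follows. The centres below are hypothetical stand-ins for the output of the off-line packing process (the evolutionary algorithm itself is not shown), and the function names are illustrative only.

```python
import math


def assign_to_tiles(points, centres):
    """Return, for each point, the index of its nearest centre (Euclidean)."""
    return [
        min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
        for p in points
    ]


def clusters_from_labels(points, labels, n_centres):
    """Group the points by tile index; each non-empty tile is one cluster."""
    clusters = [[] for _ in range(n_centres)]
    for p, label in zip(points, labels):
        clusters[label].append(p)
    return clusters


# Hypothetical centres standing in for a precomputed point packing.
centres = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3)]
labels = assign_to_tiles(points, centres)
clusters = clusters_from_labels(points, labels, len(centres))
```

For a fixed set of centres, the work grows linearly with the number of data items, which matches the linear-time re-clustering the abstract describes. Swapping `math.dist` for another distance function would adapt the sketch to other metric spaces.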
ISBN: 9789811003912 (digital)
ISBN: 9789811003905; 9789811003912 (print)
This book is a contribution of translational and allied research to the proceedings of the International Conference on Computational Intelligence and Soft Computing. It explains how various computational intelligence techniques can be applied to investigate various biological problems. It is a good read for research scholars, engineers, medical doctors and bioinformatics researchers.
This paper presents a new algorithm for local alignment search which has less computational complexity than the Smith-Waterman algorithm. Increasing the accuracy of sequence matching and reducing computational comple...
This Special Issue includes a selection of papers presented at the 11th International Symposium on Bioinformatics Research and Applications (ISBRA), which was held at Old Dominion University in Norfolk, VA, USA, on May 7-10, 2015. The ISBRA symposium provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications. In 2015, 98 papers were submitted in response to the call for papers, and the authors of 12 of them were invited to submit extended versions of their conference abstracts to this Special Issue. The selected papers illustrate the variety of applications that computational methods find in the field of nanobioscience, ranging from protein classification to de novo sequencing. Furthermore, the selected papers convincingly demonstrate the central role played by computational methods in contemporary nanobioscience research - a role that is bound only to increase in the future. The Guest Editors then briefly describe each of the 12 accepted papers.
ISBN: 9781728194684 (print)
Phylogenetic inference strategies aim to propose hypotheses that explain the evolutionary relationships among different organisms. The resulting evolutionary histories are often represented as phylogenetic trees. In computer science, phylogenetic inference has been treated as an optimisation problem. The literature has proposed different criteria for selecting the optimal tree among the possible topologies. In order to reduce the bias associated with the dependency on the selected criterion, different multi-objective optimisation strategies have been proposed during the last decade. These strategies search for solutions using operators and metrics based on the objective space. However, a recent work concluded that the topological features of the trees (decision space) and the objective space are not related in the multi-objective phylogenetic inference context, which makes phylogenetic inference a multimodal problem. This means that the current multi-objective strategies could discard solutions from different regions of the decision space, limiting the search process and the resulting topologies. In this work, we propose a new version of the memetic algorithm based on an NSGA-II scheme for phylogenetic inference, which includes a multimodal operator that considers the diversity of the tree topologies in the decision space to rank the solutions. The inclusion of this operator improved the diversity of solutions in both the decision and the objective space, increasing the hypervolume metric compared to the base version of this memetic algorithm.
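To make the hypervolume metric mentioned above concrete, here is a minimal sketch for two objectives that are both minimised. The objective values and the reference point are hypothetical, the function names are illustrative, and nothing here reproduces the memetic algorithm or its multimodal operator; it only shows how Pareto dominance and a 2-D hypervolume could be computed.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded by ref (minimisation)."""
    # Sorted by the first objective, a front is descending in the second.
    front = sorted(set(pareto_front(points)))
    area = 0.0
    prev_f2 = ref[1]
    for f1, f2 in front:
        if f1 >= ref[0] or f2 >= ref[1]:
            continue  # points outside the reference box add no area
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


# Hypothetical objective pairs for four candidate trees; (3, 3) is dominated.
solutions = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
hv = hypervolume_2d(solutions, ref=(4.0, 4.0))
```

A larger hypervolume indicates a front that is closer to the ideal point, more widely spread, or both, which is why it is commonly used to compare two versions of a multi-objective algorithm.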
ISBN: 9781467358750 (print)
In this study we propose an early lung cancer detection methodology using nucleus-based features. First, the sputum samples from patients are labeled with Tetrakis Carboxy Phenyl Porphine (TCPP) and fluorescent images of these samples are taken. TCPP is a porphyrin that is able to assist in labeling lung cancer cells by increasing numbers of low-density lipoproteins coating the surface of cancer cells. We study the performance of well-known machine learning techniques in the context of lung cancer detection on the Biomoda dataset. In our previous work, we obtained an accuracy of 81% using 71 features related to shape, intensity and color. By adding the nucleus-segmented features, we improved the accuracy to 87%. Nucleus segmentation is performed using the seeded region growing segmentation method. Our results demonstrate the potential of nucleus-segmented features for detecting lung cancer.
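The idea behind seeded region growing can be sketched on a tiny grayscale image. This is a simplified variant under stated assumptions: a single seed, 4-connectivity, and a fixed tolerance `tol` measured against the seed intensity, whereas the classical method compares candidate pixels against running region statistics. The image values, seed and tolerance are hypothetical and unrelated to the dataset used in the paper.

```python
from collections import deque


def seeded_region_grow(image, seed, tol):
    """Grow a region from seed over 4-connected pixels of similar intensity.

    image: list of rows of numeric intensities; seed: (row, col).
    A pixel joins the region if it is within tol of the seed intensity.
    """
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 4x4 image: a bright 2x2 "nucleus" on a dark background.
image = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],
    [10, 205, 198, 10],
    [10, 10, 10, 10],
]
nucleus = seeded_region_grow(image, seed=(1, 1), tol=20)
```

Features such as area (`len(nucleus)`) or mean intensity could then be derived from the segmented pixel set, which is the kind of nucleus-derived feature the abstract refers to.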
ISBN: 1424406234 (print)
We present a novel algorithm that combines a recurrent neural network (RNN) and two swarm intelligence (SI) methods to infer a gene regulatory network (GRN) from time-course gene expression data. The algorithm uses ant colony optimization (ACO) to identify the optimal architecture of an RNN, while the weights of the RNN are optimized using particle swarm optimization (PSO). Our goal is to construct an RNN whose response mimics gene expression data generated by time-course DNA microarray experiments. We observed promising results in applying the proposed hybrid SI-RNN algorithm to infer networks of interaction from simulated and real-world gene expression data.
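The weight-optimization half of this scheme can be illustrated with a minimal particle swarm optimization sketch. Everything specific here is an assumption for illustration: the one-gene recurrence `x[t+1] = tanh(w * x[t] + b)`, the synthetic target series, the PSO coefficients, and the function names. The ACO architecture search and the authors' actual RNN formulation are not shown.

```python
import math
import random


def simulate(weights, x0, steps):
    """Toy one-gene recurrence: x[t+1] = tanh(w * x[t] + b)."""
    w, b = weights
    xs = [x0]
    for _ in range(steps):
        xs.append(math.tanh(w * xs[-1] + b))
    return xs


# Hypothetical "expression" series generated from known weights.
TARGET = simulate([0.8, -0.2], x0=0.5, steps=10)


def fit_error(weights):
    """Sum of squared differences between model response and target."""
    pred = simulate(weights, x0=0.5, steps=10)
    return sum((p - t) ** 2 for p, t in zip(pred, TARGET))


def pso(objective, dim, n_particles=20, n_iter=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best position, best value, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = []
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)  # best error found so far
    return gbest, gbest_val, history


best_weights, best_error, history = pso(fit_error, dim=2)
```

The swarm's best error never increases from one iteration to the next, so `history` gives a simple convergence trace. In a full GRN setting the weight vector would cover every connection allowed by the chosen architecture instead of the two parameters used here.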
ISBN: 1424406234 (print)
Computational prediction of transcription factor binding sites and regulatory target genes has great value for biological studies of cellular processes. Existing practices either look into first-hand gene expression data, which can be costly for large-scale analysis, or apply statistical or heuristic learning methods to discover potential binding sites, which have limited accuracy due to the complexity of the data. Based on well-studied information retrieval theories, this paper proposes a novel systematic approach for transcription factor target gene prediction. The key to the approach is to model the prediction problem as a classification task by representing the features of the sequential data as vector data points in a higher-order domain. The proposed approach has produced satisfactory results in our controlled experiment on Auxin Response Factor (ARF) target gene prediction in Arabidopsis.
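The abstract does not say which features are used to turn sequences into vectors. One common choice borrowed from text retrieval is a k-mer count vector, sketched below purely as an assumption: the function name `kmer_vector`, the choice of `k`, and the example sequence are illustrative and are not taken from the paper.

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Count every k-mer over the alphabet, in a fixed lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing other symbols, e.g. N
            vec[j] += 1
    return vec


# Hypothetical promoter fragment mapped to a 16-dimensional vector.
features = kmer_vector("ACGT", k=2)
```

Vectors of this kind, one per candidate gene's upstream sequence, could then be passed to any standard classifier, which is the general shape of the classification formulation the abstract describes.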