ISBN: 9781479945351 (print)
Gene function annotations are key elements in biology and bioinformatics. A typical annotation is the association between a gene and a feature term that describes a functional feature of the gene by using a controlled vocabulary term (e.g. a Gene Ontology (GO) feature term). Unfortunately, available annotations contain errors, and biologically validated ones are incomplete by definition, since new knowledge is continuously discovered. Thus, computational algorithms that can provide ranked lists of predicted new gene annotations are a valuable contribution to bioinformatics research. Here, we propose two variants of the well-known Latent Dirichlet Allocation (LDA) algorithm applied to the prediction of gene annotations. LDA is a very efficient machine learning method built on a set of multinomial probability distributions over a set of topics, given a document (a gene, in our case), and on a set of multinomial probability distributions over a set of words (feature terms, in our case), given a topic. In topic modeling, a topic can be considered a latent meta-category of words, and a document a mixture of topics. Our two LDA variants use the collapsed Gibbs sampling method during the training phase, with two distinct initialization approaches to adapt the LDA mathematical model to the biomolecular annotation scenario. Using six outdated datasets of GO annotations of human and brown rat genes, we compared the annotations predicted by our methods to the ones given by the previously developed truncated Singular Value Decomposition (tSVD) method; then, we validated them by using the annotations available in an updated version of the same datasets. The results obtained show the efficiency of our newly proposed algorithms.
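The training scheme described in the abstract above can be illustrated with a minimal sketch of collapsed Gibbs sampling for plain LDA, where each gene is a "document" and each feature term a "word". This is not the authors' implementation: the abstract does not describe their two initialization approaches, so uniform random initialization is used as a stand-in, and the names `lda_gibbs`, `score_annotations`, `alpha`, `beta` and `n_iter` are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_terms, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: one list of term ids per gene (hypothetical input layout).
    Returns (theta, phi): per-gene topic distributions and
    per-topic term distributions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]            # gene-topic counts
    n_kw = [[0] * n_terms for _ in range(n_topics)]  # topic-term counts
    n_k = [0] * n_topics                             # tokens per topic
    z = []                                           # topic of every token

    # Assumption: uniform random initialization (stand-in for the
    # paper's two initialization approaches, which are not given here).
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + n_terms * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + n_terms * beta) for w in range(n_terms)]
        for k in topics
    ]
    return theta, phi

def score_annotations(theta, phi):
    """p(term | gene) = sum over topics of theta[gene][k] * phi[k][term]."""
    n_terms = len(phi[0])
    return [
        [sum(t[k] * phi[k][w] for k in range(len(phi))) for w in range(n_terms)]
        for t in theta
    ]
```

Sorting the scores of terms not yet annotated to a gene would give one possible ranked list of candidate annotations; how the paper builds its ranking is not stated in the abstract.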
ISBN: 9781538613993 (print)
Modern data acquisition has forced the field of large data on the scientific community. This paper presents a rapid technique for clustering data. The technique is based on an off-line process for packing points chosen from a data space. Once the off-line process has been run, the clustering may be re-run on different data sets of the same type in linear time. The clustering takes the form of a Voronoi tiling of the data space, with the tile centres being the elements of the point packing. The data items within each tile form the clusters. The evolutionary algorithm is an adaptation of one, based on the Conway crossover operator, that has been used to create error-correcting codes over the Levenshtein metric; the tile centres are a form of code, but over the Euclidean metric. The technique generalizes smoothly to other metric spaces and may be used on any type of data for which a distance metric can be devised. The data set used in this study captures information about codon usage bias in human genes. The clustering is validated by looking for GO term over-representation in the clusters, with significant results.
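The on-line half of this technique, assigning each data item to the Voronoi tile of its nearest centre, can be sketched as follows. The evolutionary point packing that produces the centres is not reproduced here; the centres are simply taken as given, and the function name `voronoi_clusters` and the `metric` parameter are hypothetical.

```python
import math

def voronoi_clusters(points, centres, metric=math.dist):
    """Group points by their nearest centre (one Voronoi tile per centre).

    For a fixed set of centres this is a single pass over the data,
    so the cost grows linearly with the number of points.
    `metric` may be any distance function; Euclidean is the default.
    """
    clusters = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda j: metric(p, centres[j]))
        clusters[nearest].append(p)
    return clusters
```

Because the metric is a parameter, the same loop applies to any data type for which a distance can be defined, which matches the generalization the abstract mentions.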
ISBN: 9781467358750 (print)
Omics refers to fields of study in biology such as genomics, proteomics, and metabolomics. Investigating fundamental biological problems based on omics data would increase our understanding of bio-systems as a whole. However, omics data is characterized by high dimensionality and an imbalance between the numbers of features and samples, which poses major challenges for classical statistical analysis and machine learning methods. This paper studies minimal-redundancy-maximal-relevance (MRMR) feature selection for omics data classification using three different relevance evaluation measures: mutual information (MI), correlation coefficient (CC), and maximal information coefficient (MIC). A linear forward search method is used to search for the optimal feature subset. The experimental results on five real-world omics datasets indicate that MRMR feature selection with CC more robustly yields better (or competitive) classification accuracy than the other two measures.
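A minimal sketch of MRMR forward selection with the correlation coefficient as the evaluation measure is given below. The abstract does not say how relevance and redundancy are combined, so the common difference form (relevance minus mean redundancy) is assumed; the names `pearson` and `mrmr_cc` and the column-wise input layout are illustrative choices, not the paper's.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_cc(features, labels, n_select):
    """Forward MRMR selection.

    features: list of feature columns, each a list of per-sample values.
    Relevance is |corr(feature, labels)|; redundancy is the mean
    |corr(feature, already selected feature)|.
    Assumption: score = relevance - redundancy (difference form).
    """
    n_features = len(features)
    relevance = [abs(pearson(f, labels)) for f in features]
    # Start from the single most relevant feature.
    selected = [max(range(n_features), key=relevance.__getitem__)]
    while len(selected) < min(n_select, n_features):
        best, best_score = None, -math.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = sum(
                abs(pearson(features[j], features[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Swapping `pearson` for a mutual information or MIC estimator would give the other two variants compared in the paper.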
ISBN: 9783319197753; 9783319197760 (digital)
ISBN: 9783319197753; 9783319197760 (print)
This volume presents recent practical applications of computational biology and bioinformatics. It contains the proceedings of the 9th International Conference on Practical Applications of Computational Biology & Bioinformatics, held at the University of Salamanca, Spain, on June 3rd-5th, 2015. The International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB) is an annual international meeting dedicated to emerging and challenging applied research in bioinformatics and computational biology. Biological and biomedical research are increasingly driven by experimental techniques that challenge our ability to analyse, process and extract meaningful knowledge from the underlying data. The impressive capabilities of next generation sequencing technologies, together with novel and ever-evolving distinct types of omics data technologies, have posed an increasingly complex set of challenges for the growing fields of bioinformatics and computational biology. The analysis of the datasets produced and their integration call for new algorithms and approaches from fields such as Databases, Statistics, Data Mining, Machine Learning, Optimization, Computer Science and Artificial Intelligence. Clearly, biology is more and more a science of information requiring tools from the computational sciences.
The incongruence between gene trees and species trees is one of the most pervasive challenges in molecular phylogenetics. In this work, a machine learning approach is proposed to overcome this problem. In the machine ...
This special section includes a selection of eight papers presented at the 10th International Conference on Intelligent Computing (ICIC) held in Taiyuan, China, on August 3–6, 2014.
ISBN: 9781509055104 (print)
Technological advances in DNA sequencing due to Next Generation Sequencing (NGS) technology have revolutionized research in many areas, including medicine. Bioinformatics as a science has developed to address computational challenges related to the analysis of the large amounts of data generated by NGS technology. Consequently, educators have faced challenges in developing effective methods to teach bioinformatics. This paper presents a pilot study in which an interdisciplinary collaborative learning approach combined two sister courses in bioinformatics, for computer science and biology students, run concurrently. Project-based research that included the analysis of NGS data was incorporated in these courses and resulted in a journal article. The present paper contributes to the ongoing dialog among educators about the most effective ways of teaching bioinformatics.
ISBN: 9781467358750 (print)
In this study we propose an early lung cancer detection methodology using nucleus-based features. First, sputum samples from patients are labeled with Tetrakis Carboxy Phenyl Porphine (TCPP), and fluorescent images of these samples are taken. TCPP is a porphyrin that is able to assist in labeling lung cancer cells by increasing the number of low-density lipoproteins coating the surface of cancer cells. We study the performance of well-known machine learning techniques in the context of lung cancer detection on the Biomoda dataset. In our previous work, we obtained an accuracy of 81% using 71 features related to shape, intensity and color. By adding the nucleus segmented features, we improved the accuracy to 87%. Nucleus segmentation is performed using the seeded region growing segmentation method. Our results demonstrate the potential of nucleus segmented features for detecting lung cancer.
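To make the segmentation step concrete, here is a simplified single-seed region growing sketch on a grayscale image stored as a list of rows. It is not the paper's pipeline: the classic seeded region growing method handles several seeds at once with a priority ordering, whereas this version grows one region breadth-first and accepts a 4-connected neighbour when its intensity lies within a tolerance of the current region mean. The name `region_grow` and the `tol` parameter are assumptions.

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed` over 4-connected pixels.

    image: list of rows of numeric intensities.
    seed:  (row, col) starting pixel.
    tol:   maximum allowed |intensity - current region mean|.
    Returns the set of (row, col) pixels in the region.
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = float(image[seed[0]][seed[1]])  # running sum of region intensities
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            mean = total / len(region)
            if abs(image[nr][nc] - mean) <= tol:
                region.add((nr, nc))
                total += image[nr][nc]
                queue.append((nr, nc))
    return region
```

Shape or intensity features of the kind the abstract mentions could then be computed over the returned pixel set, for example its size or mean intensity.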
ISBN: 1424406234 (print)
Computational prediction of transcription factor binding sites and regulatory target genes has great value for biological studies of cellular processes. Existing practices either examine first-hand gene expression data, which can be costly for large-scale analysis, or apply statistical or heuristic learning methods to discover potential binding sites, which have limited accuracy due to the complexity of the data. Based on well-studied information retrieval theories, this paper proposes a novel systematic approach for transcription factor target gene prediction. The key to the approach is to model the prediction problem as a classification task by representing the features of the sequential data as vector data points in a higher-order domain. The proposed approach has produced satisfactory results in our controlled experiment on Auxin Response Factor (ARF) target gene prediction in Arabidopsis.
The eight papers in this special section were presented at the 14th Asia Pacific Bioinformatics Conference (APBC2016), which was held in San Francisco, USA, 11-13 January 2016.