Essential genes constitute the minimal set of genes an organism needs for its survival. Identification of essential genes is of theoretical interest to genome biologists and has practical applications in medicine and biotechnology. This paper presents and evaluates machine learning approaches to the problem of predicting essential genes in microbial genomes using solely sequence-derived input features. We investigate three supervised classification methods for this binary classification task: Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree (DT). The classifiers are trained and evaluated using 37830 examples obtained from 14 experimentally validated, taxonomically diverse microbial genomes whose essential genes are known. A set of 52 relevant features derived from the genomic sequence is used as input for the classifiers. The models were evaluated using two novel blind-testing schemes, Leave-One-Genome-Out (LOGO) and Leave-One-Taxon-group-Out (LOTO), and a 10-fold stratified cross-validation (10-f-cv) strategy, on both the full multi-genome dataset and its class-imbalance-reduced variants. Experimental results (10 × 10-f-cv) indicate that SVM and ANN perform better than DT, with Area Under the Receiver Operating Characteristic curve (AU-ROC) scores of 0.80, 0.79 and 0.68 respectively. This study demonstrates that supervised machine learning methods can be used to predict essential genes in microbial genomes using only the gene sequence and features derived from it. The LOGO and LOTO blind-test results suggest that the trained classifiers generalize across genomes and taxonomic boundaries.
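The Leave-One-Genome-Out idea can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_group_out` and the toy genome labels are hypothetical, and the abstract does not describe the actual implementation. The sketch only shows the splitting logic, in which every gene from one genome is held out for testing while the classifier would be trained on genes from all remaining genomes.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) once per distinct group.

    `groups` holds one label per example, e.g. the genome each gene came from.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        # Training set: every example whose group differs from the held-out one.
        train_idx = sorted(
            i
            for label, idxs in by_group.items()
            if label != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy data (assumed): six genes drawn from three genomes.
genomes = ["g1", "g1", "g2", "g2", "g3", "g3"]
splits = list(leave_one_group_out(genomes))

for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Passing taxon-group labels instead of genome labels would give the analogous Leave-One-Taxon-group-Out split. Because no gene from the held-out genome is seen during training, the test score reflects generalization to an unseen genome, which a random 10-fold split does not guarantee.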
Atopobium parvulum (Weinberg et al. 1937) Collins and Wallbanks 1993 comb. nov. is the type strain of the species and belongs to the genomically as yet unstudied Atopobium/Olsenella branch of the family Coriobacteriaceae. The species A. parvulum is of interest because its members are frequently isolated from the human oral cavity and are associated with halitosis (oral malodor) but not with periodontitis. Here we describe the features of this organism, together with its complete genome sequence and annotation. This is the first complete genome sequence of the genus Atopobium. The 1,543,805 bp single-replicon genome, with its 1369 protein-coding and 49 RNA genes, is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). For a genome sequencing center to provide multiple-genome comparison capabilities, it must keep pace with the exponentially growing collection of sequence data, both from its own genomes and from public genomes. We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis. It enables, for example, all vs. all BLAST runs across 2 million protein sequences within a day using thousands of processors, whereas conventional comparison methods would take years to complete.
The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of microbial communities, needs to be conducted in the context of a comprehensive data management and analysis system. In this paper we present IMG/M, an experimental metagenome data management and analysis system based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes, individually or in a comparative context. IMG/M is available at http://***/m. CONTACT: ***
ISBN: 9781595931542 (print); ISBN-10: 1595931546
Biological data management includes the traditional areas of data generation, acquisition, modelling, integration, and analysis. Although numerous academic biological data management systems are currently available, employing them effectively remains a significant challenge. We discuss how this challenge was addressed in the course of developing the Integrated Microbial Genomes (IMG) system for comparative analysis of microbial genome data.