Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). For a genome sequencing center to provide multiple-genome comparison capabilities, it must keep pace with the exponentially growing collection of sequence data, both from its own genomes and from public genomes. We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis. For example, it enables all vs. all BLAST runs across 2 million protein sequences within a day using thousands of processors, whereas conventional comparison methods would take years to complete.
Recent research has demonstrated the utility of supervised classification systems for automatically identifying low-quality microarray data. However, this approach requires a qualified expert to annotate a large training set. In this paper we demonstrate the utility of an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and naive Bayes classification. On our test set, this system exhibits performance comparable to that of an analogous supervised learner constructed from the same training data.
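To make the combination of EM and naive Bayes concrete, the following is a minimal sketch rather than the paper's method. The abstract does not describe the features or the model details, so the two-class setup, the Gaussian per-feature likelihoods, the function name `em_naive_bayes`, and the toy quality metrics are all assumptions made for illustration.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def em_naive_bayes(rows, n_iter=50, min_var=1e-6):
    """Fit a two-class Gaussian naive Bayes model to unlabeled rows with EM.

    Assumption: features are conditionally independent given the class,
    and each feature is Gaussian within a class.
    """
    n, d = len(rows), len(rows[0])
    # Deterministic start: hard-assign rows by the mean of the first feature.
    pivot = sum(r[0] for r in rows) / n
    resp = [[1.0, 0.0] if r[0] <= pivot else [0.0, 1.0] for r in rows]

    for _ in range(n_iter):
        # M-step: re-estimate priors, means and variances from responsibilities.
        priors, means, variances = [], [], []
        for k in range(2):
            weight = max(sum(r[k] for r in resp), 1e-12)
            priors.append(weight / n)
            mu = [sum(resp[i][k] * rows[i][j] for i in range(n)) / weight
                  for j in range(d)]
            var = [max(sum(resp[i][k] * (rows[i][j] - mu[j]) ** 2
                           for i in range(n)) / weight, min_var)
                   for j in range(d)]
            means.append(mu)
            variances.append(var)

        # E-step: posterior class probabilities under the naive Bayes model.
        for i, row in enumerate(rows):
            log_joint = [
                math.log(max(priors[k], 1e-300))
                + sum(gaussian_logpdf(row[j], means[k][j], variances[k][j])
                      for j in range(d))
                for k in range(2)
            ]
            top = max(log_joint)
            unnorm = [math.exp(v - top) for v in log_joint]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    return priors, means, variances, resp

# Hypothetical per-array quality metrics (two features per array).
arrays = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
          [0.90, 0.80], [0.85, 0.95], [0.95, 0.90]]
priors, means, variances, resp = em_naive_bayes(arrays)
labels = [0 if r[0] >= r[1] else 1 for r in resp]
print(labels)
```

The clusters that EM finds carry no names, so deciding which one corresponds to "low quality" still requires inspecting the fitted class means or a handful of examples.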
The RCSB Protein Data Bank has developed a portal for structural genomics resources at http://***. Reports about the worldwide contributing centers are available, including summary reports for target lists, target status progress, targets in the PDB, and sequence redundancy analyses, along with links to each center's resources.
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
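As a rough illustration of how a graph can relate several coordinate systems, the sketch below stores sequence on nodes and represents each haplotype as a path through them. It is a toy model under stated assumptions, not any particular tool's data structure: the node contents, the path names `ref` and `alt`, and the helper functions are hypothetical, and it assumes forward-strand nodes that appear at most once per path.

```python
# Toy sequence graph: node id -> sequence carried by that node.
nodes = {1: "ACGT", 2: "G", 3: "AGG", 4: "TTCA"}

# Each haplotype is a walk through the graph (hypothetical names).
paths = {"ref": [1, 2, 4], "alt": [1, 3, 4]}

def path_sequence(nodes, paths, name):
    """Spell out the linear sequence of one haplotype."""
    return "".join(nodes[n] for n in paths[name])

def path_to_node(nodes, paths, name, pos):
    """Convert a 0-based position on a path to (node id, offset in node)."""
    offset = pos
    for node_id in paths[name]:
        if offset < len(nodes[node_id]):
            return node_id, offset
        offset -= len(nodes[node_id])
    raise IndexError("position beyond end of path")

def node_to_path(nodes, paths, name, node_id, offset):
    """Convert (node id, offset) to a 0-based path position, or None
    if the path does not traverse that node."""
    start = 0
    for current in paths[name]:
        if current == node_id:
            return start + offset
        start += len(nodes[current])
    return None

def translate(nodes, paths, src, dst, pos):
    """Map a position on one path to another path via the shared node."""
    node_id, offset = path_to_node(nodes, paths, src, pos)
    return node_to_path(nodes, paths, dst, node_id, offset)

print(path_sequence(nodes, paths, "ref"))        # ACGTGTTCA
print(path_sequence(nodes, paths, "alt"))        # ACGTAGGTTCA
print(translate(nodes, paths, "ref", "alt", 6))  # 8
print(translate(nodes, paths, "ref", "alt", 4))  # None
```

Positions on shared nodes translate directly between haplotypes, while positions on a node that only one haplotype traverses have no counterpart, which is why the last call returns `None`.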