咨询与建议

看过本文的还看了

相关文献

该作者的其他文献

文献详情 >Distributed memory building bl... 收藏
Distributed memory building blocks for massive biological se...

Distributed memory building blocks for massive biological sequence analysis

作     者:Pan, Tony C. 

作者单位:Georgia Institute of Technology 

学位级别:博士

导师姓名:Srinivas Aluru

授予年度:2018年

主      题:High performance computing Bioinformatics K-mer index K-mer counting De bruijn graph Next generation sequencing Parallel algorithms Distributed memory Distributed algorithms SIMD vectorization Cache friendly algorithms MPI 

摘      要:K-mer indices and de Bruijn graphs are important data structures in bioinformatics with multiple applications ranging from foundational tasks such as error correction, alignment, and genome assembly, to knowledge discovery tasks including repeat detection and SNP identification. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the data sets at the current generation rate of 1.8 terabases every 3 days. The volume and velocity with which sequencing data is generated necessitate efficient algorithms and implementation of k-mer indices and de Bruijn graphs, two central components in bioinformatic applications. Existing applications that utilize k-mer counting and de Bruijn graphs, however, tend to provide embedded, specialized implementations. The research presented here represents efforts toward the creation of the first reusable, flexible, and extensible distributed memory parallel libraries for k-mer indexing and de Bruijn graphs. These libraries are intended for simplifying the development of bioinformatics applications for distributed memory environments. For each library, our goals are to create a set of API that are simple to use, and provide optimized implementations based on efficient parallel algorithms. We designed algorithms that minimize communication volume and latency, and developed implementations with better cache utilization and SIMD vectorization. We developed Kmerind, a k-mer counting and indexing library based on distributed memory hash table and distributed sorted arrays, that provide efficient insert, find, count, and erase operations. For de Bruijn graphs, we developed Bruno by leveraging Kmerind functionalities to support parallel de Bruijn graph construction, chain compaction, error removal, and graph traversal and element query. Our performance evaluations showed that Kmerind is scalable and high performance. Kmerin

读者评论 与其他读者分享你的观点

用户名:未登录
我的评分