Efficient haplotype block recognition of very long and dense genetic sequences

Authors: Taliun, Daniel; Gamper, Johann; Pattaro, Cristian

Affiliations: European Acad Bolzano Bozen EURAC, Biomed Ctr, Bozen Bolzano, Italy; Free Univ Bozen Bolzano, Bozen Bolzano, Italy

Publication: BMC Bioinformatics

Year, volume, issue: 2014, Vol. 15, No. 1

Pages: 1-18

Subject classification: 0710 [Science - Biology]; 0836 [Engineering - Bioengineering]; 10 [Medicine]

Funding: NIH [R01 GM031575]; National Institutes of Health [NO1-AR-2-2263, RO1-AR-44422]; National Arthritis Foundation

Keywords: Linkage Disequilibrium; Haplotype Block; Strong Linkage Disequilibrium; Memory Complexity; Block Partition

Abstract: Background: New sequencing technologies make it possible to scan very long and dense genetic sequences, yielding datasets of genetic markers that are an order of magnitude larger than previously available. Such genetic sequences are characterized by common alleles interspersed with multiple rarer alleles. This situation has renewed interest in identifying haplotypes that carry the rare risk alleles. However, large-scale explorations of the linkage-disequilibrium (LD) pattern to identify haplotype blocks are not easy to perform, because traditional algorithms have at least Θ(n²) time and memory complexity. Results: We derived three incremental optimizations of the widely used haplotype block recognition algorithm proposed by Gabriel et al. in 2002. Our most efficient solution, called MIG++, has only Θ(n) memory complexity and, on a genome-wide scale, omits 80% of the calculations, which makes it an order of magnitude faster than the original algorithm. Unlike existing software, MIG++ analyzes the LD between SNPs at any distance, avoiding restrictions on the maximal block length. The haplotype block partition of the entire HapMap II CEPH dataset was obtained in 457 hours. Replacing the standard likelihood-based D′ variance estimator with an approximate estimator further improved the runtime. While producing a coarser partition, the approximate method made it possible to obtain the full-genome haplotype block partition of the entire 1000 Genomes Project CEPH dataset in 44 hours, with no restrictions on allele frequency or long-range correlations. These experiments showed that LD-based haplotype blocks can span more than one million base pairs in both the HapMap II and 1000 Genomes datasets. An application to the North American Rheumatoid Arthritis Consortium (NARAC) dataset shows how MIG++ can support genome-wide haplotype association studies.
Conclusions: The MIG++ enables to perform LD-based haplotype block recognit
