Author affiliations: Beijing Inst Technol, Beijing Engn Res Ctr High Volume language Informa, Sch Comp Sci & Technol, Beijing 100081, Peoples R China; Univ Missouri, Dept Comp Sci, Columbia, MO 65211, USA; City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligence Informat Proc, Beijing 100190, Peoples R China; Univ Wisconsin, Dept Stat, Madison, WI 53706, USA
Publication: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IEEE/ACM Trans. Comput. Biol. Bioinf.)
Year, volume, issue: 2015, Vol. 12, No. 3
Pages: 686-694
Core indexing:
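Subject classification: 0710 [Science - Biology]; 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0714 [Science - Statistics (degrees in Science or Economics may be conferred)]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (degrees in Engineering or Science may be conferred)]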
Funding: National Program on Key Basic Research Project (973 Program) [2013CB329605]; National Key Technology R&D Program of China [2012BAK11B01]; National Natural Science Foundation of China (NSFC) [61472040, 61003168, 31270834, 61272318]; Beijing Higher Education Young Elite Teacher Project [YETP1198]; GRF/ECS [9041901, CityU 118413]
Subjects: Protein alignment; substitution matrix; parameterized BLOSUM matrices
Abstract: Protein alignment is a basic step in much molecular biology research. The BLOSUM matrices, especially BLOSUM62, are the de facto standard matrices for protein alignment. However, after the matrices had been in wide use for 15 years, programming errors were found in the initial version of the source code used to generate them. Surprisingly, after the bugs were corrected, the intended BLOSUM62 matrix performs consistently worse than the miscalculated one. In this paper, we find linear relationships among the eigenvalues of the matrices and propose an algorithm to find optimal unified eigenvectors. With them, we can parameterize the matrix BLOSUMx for any given variable x, which can change continuously. We compare the effectiveness of our parameterized isentropic matrix with that of BLOSUM62. Furthermore, we propose an iterative alignment and matrix selection process that adaptively finds the best parameter and globally aligns two sequences. Experiments on aligning 13,667 families from the Pfam database and on clustering MHC II protein sequences show improved accuracy, demonstrating the effectiveness of the proposed method.
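Illustrative note: the abstract's idea of combining shared ("unified") eigenvectors with eigenvalues that follow a linear trend can be sketched as rebuilding a symmetric matrix M(x) = sum_i (a_i + b_i * x) * v_i v_i^T for a continuous parameter x. The sketch below is only a toy reading of that sentence. The function name, the 2x2 size, the eigenvectors, and the slope and intercept values are all hypothetical, and the assumption that each eigenvalue is linear in x is one possible interpretation. This record does not describe the paper's actual fitting procedure or give real BLOSUM values.

```python
import math


def parameterized_matrix(x, eigvecs, slopes, intercepts):
    """Rebuild M(x) = sum_i (intercepts[i] + slopes[i] * x) * v_i v_i^T.

    eigvecs: orthonormal eigenvectors assumed to be shared by the whole family.
    slopes, intercepts: hypothetical linear model for each eigenvalue.
    """
    n = len(eigvecs[0])
    m = [[0.0] * n for _ in range(n)]
    for v, b, a in zip(eigvecs, slopes, intercepts):
        lam = a + b * x  # eigenvalue for this value of x
        for r in range(n):
            for c in range(n):
                m[r][c] += lam * v[r] * v[c]  # rank-one term lam * v v^T
    return m


# Toy 2x2 inputs (NOT real BLOSUM data).
s = 1.0 / math.sqrt(2.0)
EIGVECS = [[s, s], [s, -s]]
SLOPES = [0.1, -0.05]
INTERCEPTS = [1.0, 3.0]

# x need not be an integer: the same formula gives a matrix for any x.
m62 = parameterized_matrix(62, EIGVECS, SLOPES, INTERCEPTS)
m62_5 = parameterized_matrix(62.5, EIGVECS, SLOPES, INTERCEPTS)
```

Because the eigenvectors are fixed, changing x only rescales the rank-one terms, so the reconstructed matrix stays symmetric and varies smoothly with x.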