Performance comparison of gene family clustering methods with expert curated gene family data set in Arabidopsis thaliana

Authors: Yang, Kuan; Zhang, Liqing

Affiliations: Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061, USA; Virginia Tech, Virginia Bioinformat Inst, Blacksburg, VA 24061, USA; Virginia Tech, Program Genet Bioinformat & Computat Biol, Blacksburg, VA 24061, USA

Publication: Planta (Botany)

Year, volume, issue: 2008, Vol. 228, No. 3

Pages: 439-447

Subject classification: 0710 [Science: Biology]; 071001 [Science: Botany]; 07 [Science]

Funding: National Science Foundation (NSF) (IIS-0710945)

Keywords: Arabidopsis; complete linkage; gene family; hierarchical clustering algorithm; K-means clustering; single linkage; TribeMCL

Abstract: With the exponential growth of genomics data, the demand for reliable clustering methods is increasing every day. Despite the wide usage of many clustering algorithms, the accuracy of these algorithms has been evaluated mostly on simulated data sets and seldom on real biological data for which a correct answer is available. To address this issue, we use the manually curated, high-quality Arabidopsis thaliana gene family database as a gold standard to conduct a comprehensive comparison of the accuracies of four widely used clustering methods: K-means, TribeMCL, single-linkage clustering, and complete-linkage clustering. We compare the results from running the different clustering methods on two matrices: the E-value matrix and the k-tuple distance matrix. The E-value matrix is computed from BLAST E-values. The k-tuple distance matrix is computed from the difference in tuple frequencies. TribeMCL with the E-value matrix performed best, with the Inflation parameter (1.15) tuned considerably lower than the previously suggested value (2). The single-linkage clustering method with the E-value matrix was second best. Single-linkage clustering, K-means clustering, complete-linkage clustering, and TribeMCL with a k-tuple distance matrix performed reasonably well. Complete-linkage clustering with the k-tuple distance matrix performed the worst.
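
As a rough illustration of two ideas named in the abstract, the sketch below builds a k-tuple distance matrix and then applies single-linkage clustering to it. It is not the authors' implementation: the abstract says only that the distance is based on the difference in tuple frequencies, so the sum of squared count differences used here is an assumption, as are the function names, the toy sequences, the choice of k = 2, and the threshold of 4.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k=2):
    """Count every overlapping k-tuple in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=2):
    """Sum of squared differences in k-tuple counts (assumed formula)."""
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    return sum((ca[t] - cb[t]) ** 2 for t in set(ca) | set(cb))


def single_linkage(names, dist, threshold):
    """Single-linkage clusters at a fixed cut-off.

    Two clusters merge as soon as any one cross-cluster pair is within
    the threshold, so the result is the set of connected components of
    the graph whose edges are pairs with distance <= threshold.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist[(a, b)] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


# Toy sequences, made up for illustration only.
seqs = {
    "geneA": "MKTAYIAKQR",
    "geneB": "MKTAYIAKQK",
    "geneC": "GGSGGSGGSG",
}
names = sorted(seqs)
dist = {(a, b): ktuple_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)}
families = single_linkage(names, dist, threshold=4)
print(families)
```

With these toy inputs, geneA and geneB differ in a single 2-tuple and fall into one cluster, while geneC stays on its own. Complete linkage would instead require every cross-cluster pair to be within the threshold, which tends to produce smaller, tighter clusters.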
