Search Results - Inner Mongolia University Library

Fast NJ-like algorithms to deal with incomplete distance matrices

BMC BIOINFORMATICS, 2008, Vol. 9, Issue 1, pp. 1-16

Authors: Criscuolo, Alexis; Gascuel, Olivier. Univ Montpellier 2, CNRS, LIRMM, Equipe Methodes & Algorithmes Bioinformat, F-34392 Montpellier 05, France; Univ Montpellier 2, CNRS, ISEM, Grp Phylog Mol, F-34095 Montpellier 05, France; Univ Strasbourg 1, LSIIT, Equipe Bioinformat Theorique, F-67412 Illkirch Graffenstaden, France

Background: Distance-based phylogeny inference methods first estimate evolutionary distances between every pair of taxa, then build a tree from the so-obtained distance matrix. These methods are fast and fairly accurate. However, they hardly deal with incomplete distance matrices. Such matrices are frequent with recent multi-gene studies, when two species do not share any gene in analyzed data. The few existing algorithms to infer trees with satisfying accuracy from incomplete distance matrices have time complexity in O(n^4) or more, where n is the number of taxa, which precludes large scale studies. Agglomerative distance algorithms (e.g. NJ [1,2]) are much faster, with time complexity in O(n^3), which allows huge datasets and heavy bootstrap analyses to be dealt with. These algorithms proceed in three steps: (a) search for the taxon pair to be agglomerated, (b) estimate the lengths of the two so-created branches, (c) reduce the distance matrix and return to (a) until the tree is fully resolved. But available agglomerative algorithms cannot deal with incomplete matrices. Results: We propose an adaptation to incomplete matrices of three agglomerative algorithms, namely NJ, BIONJ [3] and MVR [4]. Our adaptation generalizes to incomplete matrices the taxon pair selection criterion of NJ (also used by BIONJ and MVR), and combines this generalized criterion with that of ADDTREE [5]. Steps (b) and (c) are also modified, but O(n^3) time complexity is kept. The performance of these new algorithms is studied with large scale simulations, which mimic multi-gene phylogenomic datasets. Our new algorithms, named NJ*, BIONJ* and MVR*, infer phylogenetic trees that are at least as accurate as those inferred by other available methods, but with much faster running times. MVR* presents the best overall performance. This algorithm accounts for the variance of the pairwise evolutionary distance estimates, and is well suited for multi-gene studies where some distances are accurately e

Keywords: Distance Matrix; Distance Matrice; Deletion Rate; Matrix Reduction; agglomerative algorithm
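
The three agglomerative steps named in the abstract, (a) pair selection, (b) branch-length estimation and (c) matrix reduction, can be illustrated with a minimal sketch of the classic NJ algorithm on a complete distance matrix. This is not the NJ*, BIONJ* or MVR* adaptation from the paper, whose handling of missing entries is not described in this record. The function name `neighbor_joining`, the `node0`, `node1` labels for internal nodes and the edge-list output format are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Classic NJ on a COMPLETE symmetric distance matrix (illustrative sketch).

    labels: list of taxon names; matrix: list of lists of pairwise distances.
    Returns a list of edges (node_a, node_b, branch_length).
    """
    # Working copy of the distances as a nested dict keyed by node name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0  # counter for hypothetical internal node names

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # (a) select the pair minimising Q(i, j) = (n - 2) * d_ij - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )

        # (b) estimate the lengths of the two newly created branches.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        u = f"node{next_id}"
        next_id += 1
        edges.append((u, i, len_i))
        edges.append((u, j, len_j))

        # (c) reduce the matrix: replace i and j by the new node u.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Join the last two nodes with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

Each pass over the active nodes costs O(n^2) and there are about n passes, which gives the O(n^3) total mentioned in the abstract. On distances that fit a tree exactly, classic NJ recovers the branch lengths of that tree.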

Statistical Approach for Community Mining in Social Networks

2008 IEEE International Conference on Service Operations and Logistics, and Informatics (IEEE/SOLI'2008)

Authors: M.P.S. Bhatia; Pankaj Gaur. Department of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, India; Department of Computer Engineering, Netaji Subhas Institute of Technology, University of Delhi, India

The popularity of social networking on the web and the explosive combination with data mining techniques open up vast and so far unexplored opportunities for social intelligence on the web. A network community is a special subnetwork that contains a group of nodes sharing similar linked *** community mining algorithms have been developed in the *** this work, we have presented a new algorithm, BFC (breadth first clustering), which uses a statistical approach for community mining in social *** algorithm proceeds in a breadth-first way and incrementally extracts communities from the *** algorithm is simple, fast and can be scaled easily for large social *** effectiveness of this approach has been validated using network examples.

Keywords: community mining; statistical approach; agglomerative algorithm

Towards hierarchical clustering

2nd International Computer Science Symposium in Russia (CSR 2007)

Author: Levin, Mark Sh. Russian Acad Sci, Inst Informat Transmiss Problems, Moscow 127994, Russia

ISBN: (print) 9783540745099

In the paper, new modified agglomerative algorithms for hierarchical clustering are suggested. The clustering process is aimed at generating a cluster hierarchy which can contain the same items in different clusters. The algorithms are based on the following additional operations: (i) building an ordinal item pair proximity ('distance'), including the usage of multicriteria approaches; (ii) integration of several item pairs at each stage of the algorithms; and (iii) inclusion of the same items into different integrated item pairs/clusters. The suggested modifications are significant from the viewpoint of practice, e.g., design of systems architecture for engineering and computer systems.

Keywords: hierarchical clustering; agglomerative algorithm; hierarchy; system architecture; multicriteria analysis

AN ERROR VARIANCE APPROACH TO 2-MODE HIERARCHICAL-CLUSTERING

JOURNAL OF CLASSIFICATION, 1993, Vol. 10, Issue 1, pp. 51-74

Author: ECKES, T. UNIV SAARLAND, FACHRICHTUNG PSYCHOL, W-6600 SAARBRUCKEN, GERMANY

A new agglomerative method is proposed for the simultaneous hierarchical clustering of row and column elements of a two-mode data matrix. The procedure yields a nested sequence of partitions of the union of two sets of entities (modes). A two-mode cluster is defined as the union of subsets of the respective modes. At each step of the agglomerative process, the algorithm merges those clusters whose fusion results in the smallest possible increase in an internal heterogeneity measure. This measure takes into account both the variance within the respective cluster and its centroid effect defined as the squared deviation of its mean from the maximum entry in the input matrix. The procedure optionally yields an overlapping cluster solution by assigning further row and/or column elements to clusters existing at a preselected hierarchical level. Applications to real data sets drawn from consumer research concerning brand-switching behavior and from personality research concerning the interaction of behaviors and situations demonstrate the efficacy of the method at revealing the underlying two-mode similarity structure.

Keywords: CLUSTERING; 2-MODE DATA; ULTRAMETRIC REPRESENTATION; agglomerative algorithm; HETEROGENEITY INDEX; BRAND-SWITCHING; BEHAVIOR-SITUATION CONGRUENCE
