This paper focuses on the stability-based approach for estimating the number of clusters K in microarray data. The cluster stability approach amounts to clustering repeatedly over random subsets of the available data and evaluating an index that expresses the similarity of the successive partitions obtained. We present a method for automatically estimating K, starting from the distribution of the similarity index. We investigate how the choice of the hierarchical clustering (HC) method and of the similarity index influences the estimation accuracy. The paper introduces a new similarity index based on a partition distance. The performance of the new index and that of other well-known indices are experimentally evaluated by comparing the "true" data partition with the partition obtained at each level of an HC tree. A case study is conducted with a publicly available Leukemia dataset.
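The resampling loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the pair-counting Jaccard index stands in for the new partition-distance index, which the abstract does not define, and the clustering step is single linkage on one-dimensional data (implemented by cutting at the K-1 largest gaps), chosen only to keep the example short. All function names and the parameters `n_pairs`, `fraction`, and `seed` are hypothetical.

```python
import random
from itertools import combinations


def single_linkage_1d(points, k):
    """Cluster 1-D points (dict: object id -> value) into k groups.

    In one dimension, cutting the sorted values at the k-1 largest
    gaps gives the same partition as single-linkage HC cut at k clusters.
    """
    order = sorted(points, key=points.get)
    gap_positions = sorted(
        range(1, len(order)),
        key=lambda i: points[order[i]] - points[order[i - 1]],
        reverse=True,
    )
    cuts = set(gap_positions[: k - 1])
    labels, current = {}, 0
    for i, obj in enumerate(order):
        if i in cuts:
            current += 1
        labels[obj] = current
    return labels


def jaccard_index(a, b):
    """Pair-counting Jaccard similarity of two partitions (dicts: id -> label),
    computed on the objects present in both."""
    common = sorted(set(a) & set(b))
    both = either = 0
    for i, j in combinations(common, 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def stability_scores(points, k, n_pairs=20, fraction=0.8, seed=0):
    """Similarity index values over n_pairs pairs of random subsets."""
    rng = random.Random(seed)
    ids = list(points)
    size = int(fraction * len(ids))
    scores = []
    for _ in range(n_pairs):
        sub_a = {i: points[i] for i in rng.sample(ids, size)}
        sub_b = {i: points[i] for i in rng.sample(ids, size)}
        scores.append(
            jaccard_index(single_linkage_1d(sub_a, k), single_linkage_1d(sub_b, k))
        )
    return scores
```

Calling `stability_scores` for each candidate K yields one distribution of index values per K. How K is then read off that distribution is the subject of the paper and is not reproduced here; a simple assumption would be to prefer the K whose scores concentrate nearest 1.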
The proposed divisive clustering method simultaneously builds a hierarchy on a set of objects and a monothetic characterization of each cluster of the hierarchy. Each division is chosen to minimize the within-cluster inertia criterion among the bipartitions induced by a set of binary questions. To improve the clustering, the algorithm revises at each step the division that induced the cluster chosen for division. (C) 1998 Elsevier Science B.V. All rights reserved.
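A single division step of this kind can be sketched as follows. The sketch assumes numeric variables and binary questions of the form "is variable j at most c?", with c taken at midpoints between consecutive observed values. These choices, and the names `inertia` and `best_binary_question`, are illustrative assumptions. The revision step and the construction of the full hierarchy are not shown.

```python
def inertia(rows):
    """Sum of squared Euclidean distances of the rows to their centroid."""
    if not rows:
        return 0.0
    dim = len(rows[0])
    centroid = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return sum((r[j] - centroid[j]) ** 2 for r in rows for j in range(dim))


def best_binary_question(rows):
    """Return (variable index, cut value, within-cluster inertia) for the
    bipartition with the smallest total within-cluster inertia.

    Each candidate bipartition is induced by one question on one variable,
    so the resulting split is monothetic.
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for low, high in zip(values, values[1:]):
            cut = (low + high) / 2
            left = [r for r in rows if r[j] <= cut]
            right = [r for r in rows if r[j] > cut]
            within = inertia(left) + inertia(right)
            if best is None or within < best[2]:
                best = (j, cut, within)
    return best
```

Because every split is defined by one variable and one threshold, each cluster in the resulting tree can be described by the conjunction of answers along its path, which is what makes the characterization monothetic.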
There is mounting evidence to suggest that the complete linkage method does the best clustering job among all hierarchical agglomerative techniques, particularly with respect to misclassification in samples from known multivariate normal distributions. However, clustering methods are also notorious for discovering clusters in random data sets. We compare six agglomerative hierarchical methods on univariate random data from uniform and standard normal distributions and find that the complete linkage method is generally best at not discovering false clusters. The criterion is the ratio of the number of within-cluster distances to the number of all distances at most equal to the maximum within-cluster distance.
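The criterion in the last sentence can be made concrete with a short sketch. It follows one reading of that sentence for univariate data: the numerator counts pairs of points in the same cluster, and the denominator counts all pairs whose distance does not exceed the largest within-cluster distance. The function name `within_distance_ratio` and the handling of the all-singletons case are assumptions.

```python
from itertools import combinations


def within_distance_ratio(values, labels):
    """Ratio of the number of within-cluster distances to the number of all
    pairwise distances that are <= the maximum within-cluster distance.

    values: univariate observations; labels: cluster label per observation.
    """
    pairs = list(combinations(range(len(values)), 2))
    within = [
        abs(values[i] - values[j]) for i, j in pairs if labels[i] == labels[j]
    ]
    if not within:
        # Assumption: the ratio is undefined when every cluster is a singleton.
        raise ValueError("no within-cluster pairs")
    d_max = max(within)
    n_close = sum(1 for i, j in pairs if abs(values[i] - values[j]) <= d_max)
    return len(within) / n_close
```

Under this reading, the ratio equals 1 exactly when no between-cluster pair is as close as the widest within-cluster pair, and it falls toward 0 as more between-cluster pairs lie within that distance.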