ISBN (print): 9780769545745
Several external indices, which use information not present in the dataset, have been shown to be useful for evaluating representative-based clustering algorithms. However, such supervised measures are not directly useful for constructing better clustering algorithms when class labels are not available. We propose a method for identifying internal cluster evaluation measures that use only information present in the dataset and are related to given external indices. We then use these internal measures to construct representative-based clustering algorithms. Both the identification and the utilization steps of the proposed method are enabled by a component-based clustering algorithm design. Experiments on 432 algorithms using gene expression datasets provide evidence that some internal measures could serve as surrogates for external indices proposed in the literature. Moreover, the results suggest that internal measures correlated with selected external indices can guide the algorithms toward significantly better cluster models.
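The abstract does not say which internal measures, external indices, or correlation statistic were used. As a minimal sketch of the identification step only, the code below scores a set of candidate partitions with one internal measure (mean silhouette width, computed from the data alone) and one external index (the Rand index, computed against reference labels). It then reports their Spearman rank correlation. An internal measure that ranks partitions much like the external index does would be a candidate surrogate. The choice of silhouette, Rand index and Spearman correlation, the synthetic Gaussian data, and the noisy "candidate partitions" (hypothetical stand-ins for the outputs of different clustering algorithms) are all illustrative assumptions, not the paper's setup.

```python
import math
import random
from itertools import combinations

# Assumption: silhouette, Rand index and Spearman correlation are illustrative
# choices; the paper's actual measures are not specified in the abstract.


def rand_index(labels_a, labels_b):
    """External index: fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def silhouette(points, labels):
    """Internal measure: mean silhouette width, using only the data and the partition."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    total = 0.0
    for i, p in enumerate(points):
        # Summed distance and point count from p to each cluster, excluding p itself.
        sums = {c: [0.0, 0] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]][0] += math.dist(p, q)
                sums[labels[j]][1] += 1
        if sums[labels[i]][1] == 0:
            continue  # singleton cluster: its silhouette counts as 0
        a = sums[labels[i]][0] / sums[labels[i]][1]
        b = min(s / cnt for c, (s, cnt) in sums.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant score list carries no ranking information
    return cov / (sx * sy)


def demo(seed=0):
    """Correlate silhouette with the Rand index over hypothetical candidate partitions."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
    points, truth = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(20):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            truth.append(label)
    # Hypothetical candidates: reference labels with a growing fraction reassigned at random.
    candidates = [
        [rng.randrange(len(centers)) if rng.random() < p else t for t in truth]
        for p in [i / 10 for i in range(10)]
    ]
    external = [rand_index(truth, c) for c in candidates]
    internal = [silhouette(points, c) for c in candidates]
    return spearman(internal, external)
```

Calling `demo()` returns one correlation value for one internal measure. Applied to real algorithm outputs, one could repeat this for each candidate internal measure and keep those whose correlation with the chosen external index stays consistently high. Such measures could then score algorithm configurations when no labels are available.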