
Determining the number of clusters using information entropy for mixed data

Authors: Liang, Jiye; Zhao, Xingwang; Li, Deyu; Cao, Fuyuan; Dang, Chuangyin

Affiliations: Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China; City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China

Publication: PATTERN RECOGNITION

Year, volume, issue: 2012, vol. 45, no. 6

Pages: 2251-2265

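Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]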

Funding: National Natural Science Foundation of China [71031006, 70971080, 60970014]; Special Prophase Project on National Key Basic Research and Development Program of China (973) [2011CB311805]; Foundation of Doctoral Program Research of Ministry of Education of China; Key Problems in Science and Technology Project of Shanxi [20110321027-01]

Keywords: Clustering; Mixed data; Number of clusters; Information entropy; Cluster validity index; k-Prototypes algorithm

Abstract: In cluster analysis, one of the most challenging and difficult problems is the determination of the number of clusters in a data set, which is a basic input parameter for most clustering algorithms. To solve this problem, many algorithms have been proposed for either numerical or categorical data sets. However, these algorithms are not very effective for a mixed data set containing both numerical attributes and categorical attributes. To overcome this deficiency, a generalized mechanism is presented in this paper by integrating Rényi entropy and complement entropy together. The mechanism is able to uniformly characterize within-cluster entropy and between-cluster entropy and to identify the worst cluster in a mixed data set. In order to evaluate the clustering results for mixed data, an effective cluster validity index is also defined in this paper. Furthermore, by introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set. The performance of the algorithm has been studied on several synthetic and real-world data sets. The comparisons with other clustering algorithms show that the proposed algorithm is more effective in detecting the optimal number of clusters and generates better clustering results. © 2011 Elsevier Ltd. All rights reserved.
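The abstract names two entropy measures without giving formulas. As a rough illustration only, the sketch below computes both in their commonly cited forms for a discrete distribution: Rényi entropy of order alpha, and complement entropy as the sum of p(1 - p) over the distinct values. The function names, the choice alpha = 2, and the toy data are assumptions made for this example; the way the paper combines the two measures for mixed data is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def value_probabilities(values):
    """Relative frequency of each distinct value in a non-empty sequence."""
    counts = Counter(values)
    total = len(values)
    return [count / total for count in counts.values()]


def renyi_entropy(probs, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def complement_entropy(probs):
    """Complement entropy: sum over values of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in probs)


# Hypothetical categorical attribute observed inside one cluster.
colors = ["red", "red", "blue", "blue"]
probs = value_probabilities(colors)
print(renyi_entropy(probs), complement_entropy(probs))
```

Under both measures a cluster whose objects all share one value scores zero, and the score grows as the values become more evenly spread, which is the sense in which an entropy can flag a poorly formed cluster.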
