Search results - Inner Mongolia University Library

Clustering techniques in data mining - A survey

IETE JOURNAL OF RESEARCH, 2001, vol. 47, no. 1-2, pp. 19-28

Authors: Pujari, AK; Rajesh, K; Reddy, DS. Univ Hyderabad, Dept Comp & Informat Sci, Hyderabad 500046, Andhra Pradesh, India

In the last few years there has been tremendous research interest in devising efficient data mining algorithms. Clustering is an essential component of data mining techniques. Interestingly, the special nature of data mining makes the classical clustering algorithms unsuitable. Its distinguishing characteristics are that the datasets are usually very large and need not be numeric, and hence importance should be given to efficient I/O operations instead of algorithmic complexity. As a result, a number of clustering algorithms have been proposed for data mining in the last few years. The present paper gives a brief overview of these algorithms. The first part of the paper discusses numerical clustering algorithms, which are classified into partitional clustering and hierarchical clustering. The second part discusses clustering algorithms for categorical data.

Keywords: clustering; hierarchical clustering; DBSCAN; partitional algorithms; density-reachable

A Mathematical Theory for Clustering in Metric Spaces

IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2016, vol. 3, no. 1, pp. 2-16

Authors: Chang, Cheng-Shang; Liao, Wanjiun; Chen, Yu-Sheng; Liou, Li-Heng. Natl Tsing Hua Univ, Inst Commun Engn, Hsinchu 300, Taiwan; Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan

Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering algorithms are still unsatisfactory. In particular, one of the fundamental challenges is to address the following question: What is a cluster in a set of data points? In this paper, we make an attempt to address such a question by considering a set of data points associated with a distance measure (metric). We first propose a new cohesion measure in terms of the distance measure. Using the cohesion measure, we define a cluster as a set of points that are cohesive to themselves. For such a definition, we show there are various equivalent statements that have intuitive explanations. We then consider the second question: How do we find clusters and good partitions of clusters under such a definition? For such a question, we propose a hierarchical agglomerative algorithm and a partitional algorithm. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a specific stopping criterion and it stops with a partition of clusters. Our partitional algorithm, called the K-sets algorithm in the paper, appears to be a new iterative algorithm. Unlike the Lloyd iteration that needs two-step minimization, our K-sets algorithm only takes one-step minimization. One of the most interesting findings of our paper is the duality result between a distance measure and a cohesion measure. Such a duality result leads to a dual K-sets algorithm for clustering a set of data points with a cohesion measure. The dual K-sets algorithm converges in the same way as a sequential version of the classical kernel K-means algorithm. The key difference is that a cohesion measure does not need to be positive semi-definite.

Keywords: clustering; hierarchical algorithms; partitional algorithms; convergence; K-sets; duality
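
The abstract above contrasts the K-sets algorithm with the Lloyd iteration and its two-step minimization. For orientation, the following is a minimal sketch of the classical Lloyd (K-means) iteration only, not of the K-sets algorithm, which the abstract does not specify. The function name `lloyd_kmeans`, its parameters, and the use of squared Euclidean distance are assumptions made for this illustration.

```python
import random


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Illustrative Lloyd iteration on a list of equal-length numeric tuples.

    Hypothetical helper: squared Euclidean distance and random initial
    centroids are assumptions, not details taken from the abstract.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct positions as initial centroids
    labels = None
    for _ in range(iters):
        # Step 1 (assignment): with centroids fixed, move each point
        # to the cluster of its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Step 2 (update): with assignments fixed, move each centroid
        # to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = lloyd_kmeans(data, k=2)
    print(labels, centroids)
```

Each pass alternates between the two minimizations: over assignments with centroids held fixed, then over centroids with assignments held fixed. According to the abstract, the K-sets algorithm replaces this pair with a single minimization step.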
