In today's digitized world, data are growing at a rapid rate and with high density. It is therefore necessary to manage the complexity of data efficiently and with less effort. In order to handle the complex data, a co...
ISBN: 9781605586731 (print)
We present a novel spectral-based algorithm for clustering categorical data that combines attribute relationship and dimension reduction techniques found in Principal Component Analysis (PCA) and Latent Semantic Indexing (LSI). The new algorithm uses data summaries that consist of attribute occurrence and co-occurrence frequencies to create a set of vectors each of which represents a cluster. We refer to these vectors as "candidate cluster representatives." The algorithm also uses spectral decomposition of the data summaries matrix to project and cluster the data objects in a reduced space. We refer to the algorithm as SCCADDS (Spectral-based clustering algorithm for CAtegorical Data using Data Summaries). SCCADDS differs from other spectral clustering algorithms in several key respects. First, the algorithm uses the attribute categories similarity matrix instead of the data object similarity matrix (as is the case with most spectral algorithms that find the normalized cut of a graph of nodes of data objects). SCCADDS scales well for large datasets since in most categorical clustering applications the number of attribute categories is small relative to the number of data objects. Second, non-recursive spectral-based clustering algorithms typically require K-means or some other iterative clustering method after the data objects have been projected into a reduced space. SCCADDS clusters the data objects directly by comparing them to candidate cluster representatives without the need for an iterative clustering method. Third, unlike standard spectral-based algorithms, the complexity of SCCADDS is linear in terms of the number of data objects. Results on datasets widely used to test categorical clustering algorithms show that SCCADDS produces clusters that are consistent with those produced by existing algorithms, while avoiding the computation of the spectra of large matrices and problems inherent in methods that employ the K-means type algorithms. Copyright 2009 ACM.
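The central idea in the abstract above, decomposing a small category-by-category summary matrix instead of an object-by-object similarity matrix, can be illustrated with a minimal sketch. This is not the SCCADDS procedure, whose details the abstract does not give. The helper names (`one_hot`, `centered_cooccurrence`, `leading_eigenvector`, `assign_two_clusters`), the centering step, the use of power iteration, and the choice of the leading eigenvector and its negation as the two candidate representatives are all assumptions made for illustration, and the sketch handles two clusters only.

```python
def one_hot(rows):
    """Encode categorical rows as a 0/1 matrix, one column per (attribute, category)."""
    columns = sorted({(j, v) for row in rows for j, v in enumerate(row)})
    index = {col: i for i, col in enumerate(columns)}
    X = [[0.0] * len(columns) for _ in rows]
    for r, row in enumerate(rows):
        for j, v in enumerate(row):
            X[r][index[(j, v)]] = 1.0
    return X


def centered_cooccurrence(X):
    """Return the centered data and the m x m category co-occurrence summary."""
    n, m = len(X), len(X[0])
    mean = [sum(row[c] for row in X) / n for c in range(m)]
    Xc = [[row[c] - mean[c] for c in range(m)] for row in X]
    # Cost is O(n * m^2): linear in the number of objects n.
    C = [[sum(row[a] * row[b] for row in Xc) for b in range(m)] for a in range(m)]
    return Xc, C


def leading_eigenvector(C, iterations=200):
    """Power iteration on the small m x m matrix; cost is independent of n."""
    m = len(C)
    v = [1.0 / (i + 1) for i in range(m)]  # fixed, non-uniform start vector
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate case: no variation between categories
            break
        v = [x / norm for x in w]
    return v


def assign_two_clusters(rows):
    """Label each object by comparing it with two assumed representatives, +v and -v."""
    Xc, C = centered_cooccurrence(one_hot(rows))
    v = leading_eigenvector(C)
    # One pass over the objects, no K-means-style iteration.
    return [0 if sum(x * y for x, y in zip(row, v)) >= 0 else 1 for row in Xc]


rows = [
    ("red", "circle"), ("red", "circle"), ("red", "circle"),
    ("blue", "square"), ("blue", "square"), ("blue", "square"),
]
labels = assign_two_clusters(rows)
print(labels)  # the first three objects share one label, the last three the other
```

In this sketch the eigen-computation touches only the m x m summary matrix, where m is the number of attribute categories, so the work that grows with the number of objects is limited to building the summary and the final assignment pass. Which label a group receives depends on the sign of the eigenvector, so only the grouping is meaningful.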
The article proposes an algorithm for clustering objects in subject areas such as education and the labour market. Such objects include competences, disciplines, specialties, vacancies, etc. The main problem in clust...
In this paper, we present the application of a clustering algorithm to exploit lexical and syntactic relationships occurring between natural language requirements. Our experiments conducted on a real-world data set hi...
Exploratory data analysis is becoming increasingly necessary as larger spatial datasets are managed in electro-magnetic media. Spatial clustering is a very important spatial data mining technique. So far, a lot of sp...
In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experime...
This paper presents a method for semantic classification of onomatopoetic words like "ひゅーひゅー (hum)" and "からんころん (clip clop)" which exist in every language, especially Japanese being rich ...
Clustering is the unsupervised classification of patterns into groups. A clustering algorithm partitions a data set into several groups such that similarity within a group is larger than among groups. The clustering pr...
By analyzing kernel clustering algorithms and rough set theory, this paper proposes a novel clustering algorithm for clustering analysis: a rough kernel k-means clustering algorithm with adaptive parameters. ...
In the online balanced graph repartitioning problem, one has to maintain a clustering of n nodes into ℓ clusters, each having k = n/ℓ nodes. During runtime, an online algorithm is given a stream of communication requests ...