This research aims to improve clustering accuracy on mixed-attribute data. To that end, the glowworm swarm optimisation (GSO) algorithm is introduced into the k-prototypes algorithm to form a new clustering algorithm. First, the GSO algorithm is improved using a good point set. The improved GSO algorithm is then used to search for density extreme points in the space of data objects, and the initial cluster centres of the k-prototypes algorithm are chosen from these points. In addition, a unified distance measure is designed for numeric and categorical data. On this basis, a new clustering algorithm (GSOkP) is designed. Finally, UCI datasets of numeric, categorical and mixed data are used to test GSOkP, and its effectiveness is analysed in terms of clustering accuracy through experimental comparison.
ISBN: 9781845648299; 9781845648282 (print)
In the era of big data, more effective clustering methods for hybrid data are needed in data mining and data preprocessing. As one of the typical clustering algorithms, k-prototypes can handle mixed-type datasets. However, two limitations remain: (1) the randomly selected initial points lack representativeness; (2) the dissimilarity definition is still rough. In this paper, we propose an improved algorithm called L.k-prototypes to remedy these two drawbacks. To evaluate its performance, we implemented the algorithm and ran experiments on a real dataset. The results showed that L.k-prototypes can improve both the accuracy and the efficiency of clustering.
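For context on the second limitation, the baseline dissimilarity in standard k-prototypes is commonly written as squared Euclidean distance on the numeric attributes plus a weight times the count of categorical mismatches. The sketch below shows that baseline only; it is not the L.k-prototypes measure, which the abstract does not specify. The function names and the `gamma` default are illustrative assumptions.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Baseline k-prototypes dissimilarity between a record and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching attributes, weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * mismatches


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the closest prototype.

    `prototypes` is a list of (numeric_values, categorical_values) pairs.
    """
    dists = [
        mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return dists.index(min(dists))


# Hypothetical toy prototypes, for illustration only.
prototypes = [
    ([0.0, 0.0], ["red", "small"]),
    ([5.0, 5.0], ["blue", "large"]),
]
```

The single weight `gamma` is what makes the baseline coarse: every categorical mismatch costs the same amount regardless of the attribute or the values involved.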
ISBN: 9781467383158 (print)
In many situations, data have mixed attributes. The k-prototypes algorithm is one of the principal algorithms for clustering this type of data object. To address the shortcomings of this algorithm, an improved algorithm is proposed that determines the initial points using a grouping and averaging method. We then test the improved algorithm on a real data set. The detailed results indicate that the improved algorithm has good stability and validity.
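The abstract does not spell out the grouping and averaging procedure. As one possible reading, and purely as an assumption, the sketch below orders records by the sum of their numeric values, splits them into k contiguous groups, and uses each group's numeric mean and categorical mode as an initial prototype. The function name and the ordering rule are hypothetical, not the authors' method.

```python
from collections import Counter


def group_average_init(num_rows, cat_rows, k):
    """Deterministic initial prototypes from k contiguous groups.

    Assumed reading of "grouping and averaging": sort records by the sum of
    their numeric attributes, cut the ordering into k groups, then average
    numeric columns and take the mode of categorical columns in each group.
    """
    n = len(num_rows)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")
    order = sorted(range(n), key=lambda i: sum(num_rows[i]))
    prototypes = []
    for g in range(k):
        idx = order[g * n // k:(g + 1) * n // k]
        num_mean = [
            sum(num_rows[i][j] for i in idx) / len(idx)
            for j in range(len(num_rows[0]))
        ]
        cat_mode = [
            Counter(cat_rows[i][j] for i in idx).most_common(1)[0][0]
            for j in range(len(cat_rows[0]))
        ]
        prototypes.append((num_mean, cat_mode))
    return prototypes
```

Because no random draw is involved, repeated runs on the same data start from the same prototypes, which is one way an initialisation of this kind could yield the stability the abstract reports.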
In cluster analysis, one of the most challenging problems is determining the number of clusters in a data set, which is a basic input parameter for most clustering algorithms. To solve this problem, many algorithms have been proposed for either numerical or categorical data sets. However, these algorithms are not very effective for a mixed data set containing both numerical and categorical attributes. To overcome this deficiency, a generalized mechanism is presented in this paper by integrating Rényi entropy and complement entropy. The mechanism is able to uniformly characterize within-cluster entropy and between-cluster entropy and to identify the worst cluster in a mixed data set. To evaluate the clustering results for mixed data, an effective cluster validity index is also defined in this paper. Furthermore, by introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set. The performance of the algorithm has been studied on several synthetic and real-world data sets. Comparisons with other clustering algorithms show that the proposed algorithm is more effective in detecting the optimal number of clusters and generates better clustering results. © 2011 Elsevier Ltd. All rights reserved.
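As a reminder of the two quantities named above, the sketch below evaluates the textbook definitions on a discrete sample: complement entropy as the sum of p(1 - p) over categories, and Rényi entropy of order alpha as log(sum of p^alpha) / (1 - alpha). The abstract does not say how the paper applies each entropy to numerical and categorical attributes, so this illustrates the definitions only; the function names are assumptions.

```python
import math
from collections import Counter


def category_probabilities(values):
    """Empirical probability of each distinct value in the sample."""
    counts = Counter(values)
    n = len(values)
    return [c / n for c in counts.values()]


def complement_entropy(values):
    """Complement entropy: sum over categories of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in category_probabilities(values))


def renyi_entropy(values, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    probs = category_probabilities(values)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
```

Both measures are zero for a sample with a single distinct value and grow as the values spread more evenly, which is the property a within-cluster entropy relies on.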
ISBN: 9781479965274 (print)
Every region in Indonesia has different potentials, which need to be analyzed for national development planning. This analysis can be accomplished by clustering Indonesian regional potential data collected from the PODES enumeration. These data consist of both numeric and categorical attributes. However, most clustering algorithms can be applied only to either numeric or categorical data. The k-prototypes algorithm, a clustering algorithm that can deal with mixed data types, has limitations such as its distance measure. Selecting distance measures properly is therefore important for improving its performance. This paper presents a comparison of distance measures for clustering mixed-attribute data. We applied the k-prototypes algorithm with several distance measures to the PODESII-DES,k dataset and used the Silhouette index for clustering evaluation. The results show that the best clustering is achieved by applying the Ratio on Mismatches distance to categorical attributes. For numeric attributes, no single distance measure performs best, since the performance of the numeric distance measures varies with each treatment.
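To make the evaluation setup concrete, the sketch below pairs a categorical distance with a plain Silhouette computation. "Ratio on Mismatches" is read here, as an assumption, as the fraction of categorical attributes on which two records differ; the paper's exact definition is not given in the abstract. The function names and the pluggable `dist` parameter are illustrative.

```python
def ratio_on_mismatches(a, b):
    """Fraction of attributes on which two categorical records differ."""
    return sum(1 for u, v in zip(a, b) if u != v) / len(a)


def silhouette_index(rows, labels, dist):
    """Mean silhouette value over all records for a given distance function.

    For each record: a = mean distance to its own cluster, b = smallest mean
    distance to any other cluster, s = (b - a) / max(a, b).
    Records in singleton clusters score 0, as does the degenerate case of a
    single cluster.
    """
    n = len(rows)
    scores = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(rows[i], rows[j]))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n
```

Passing a different `dist` function while keeping the labels fixed is one way to compare distance measures on equal footing, which mirrors the comparison the paper describes.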
ISBN: 9781509054671 (print)
Many factors can affect student achievement in distance learning settings. Internal factors such as age, gender, previous education level and engagement in online learning activities can play an important role in obtaining successful learning outcomes, as can external factors such as the region students come from and the learning environment they can access. Identifying the relationships between student characteristics and distance learning outcomes is a central issue in learning analytics. This paper presents a study that applies unsupervised learning to identify how students' demographic characteristics and their engagement in online learning activities affect their learning achievement. We use the k-prototypes clustering method to identify groups of students based on demographic characteristics and interactions with online learning environments, and we investigate the learning achievement of each group. Knowing which groups of students have successful or poor learning outcomes can help faculty design online courses that adapt to different students' needs. It can also help students select online courses that are appropriate for them.