The article presents immediate access to over fifty fundamental clustering algorithms. It also provides access to the clustering benchmark datasets previously published as the "Fundamental Clustering Problems Suite" (FCPS). The software library is named "FCPS"; it is available in R on CRAN and accessible from Python. The input and output of the clustering algorithms are standardized so that users can run a cluster analysis quickly. By combining mirrored-density plots (MD plots) with statistical testing, FCPS provides a tool for quickly investigating the cluster tendency before the cluster analysis itself. Common clustering challenges can be generated with an arbitrary sample size. Additionally, FCPS collects 26 indicators intended to estimate the number of clusters and provides an appropriate implementation of clustering accuracy for more than two clusters. © 2020 The Author(s). Published by Elsevier B.V.
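The idea of clustering accuracy for more than two clusters can be illustrated with a short sketch. Cluster ids are arbitrary, so accuracy is taken under the best one-to-one mapping from cluster ids to class ids. This is not the FCPS implementation: the function name `clustering_accuracy` is hypothetical, and the brute-force search over permutations is an assumption that is only practical for a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Share of points labeled correctly under the best one-to-one
    mapping of cluster ids to class ids (brute force, small k only)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    # Pad with None so that surplus clusters can be mapped to "no class".
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


truth = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]   # same partition, different ids
one_error = [1, 1, 0, 0, 0, 2]   # one point assigned to the wrong cluster
print(clustering_accuracy(truth, relabeled))
print(clustering_accuracy(truth, one_error))
```

For larger numbers of clusters, an assignment solver such as the Hungarian method would replace the permutation loop.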
The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. On the basis of the DIFI and China's Provincial Panel data, this study aims to test the poverty reduction effect of digital inclusive finance in three dimensions (income, education, and healthcare) and to examine the transmission mechanism of digital inclusive finance in poverty alleviation. The results indicate that digital inclusive finance exerts a poverty reduction effect in three dimensions: medical poverty, income poverty, and education poverty. Of these, the coverage breadth significantly affects the alleviation of medical poverty, the use depth significantly affects the alleviation of income poverty and education poverty, and the digitization level affects the alleviation of poverty in all three dimensions. The level of regional economic development plays an intermediary role in the poverty alleviation effect of digital inclusive finance. Compared with the western region, which is relatively backward in development, the poverty reduction effect of digital inclusive finance in the eastern region is more significant.
ISBN: (print) 9783030461508; 9783030461492
Existing cluster validity indices often possess a similar bias as the clustering algorithm they were introduced for, e.g. to determine the optimal number of clusters. We suggest an efficient and holistic assessment of the structure discovery capabilities of clustering algorithms based on three criteria. We determine the robustness or stability of cluster assignments and interpret it as the confidence of the clustering algorithm in its result. This information is then used to label the data and evaluate the consistency of the stability-assessment with the notion of a cluster as an area of dense and separated data. The resulting criteria of stability, structure and consistency provide interpretable means to judge the capabilities of clustering algorithms without the typical biases of prominent indices, including the judgment of a clustering tendency.
ISBN: (print) 9783030587994; 9783030587987
Autoclustering is a computational tool for the automatic generation of clustering algorithms. It combines and evaluates the main parts of density-based algorithms to generate more appropriate solutions for clustering a given dataset. Autoclustering uses the evolutionary technique of Estimation of Distribution Algorithms (EDA) to create the algorithms (individuals), and an adapted CLEST method (which originally determines the best number of groups for a dataset) to compute individual fitness using a decision-tree classifier. To improve the quality of the results generated by Autoclustering, and to avoid possible bias from the adoption of a classifier, this work proposes to increase the efficiency of the evaluation process by adding a quality metric based on a fusion of three cluster-quality indexes. The three indexes are Silhouette, Dunn, and Davies-Bouldin, which assess the intra- and inter-cluster situation, are based on distance, and are independent of how the groups were generated. The final score for a specific solution (algorithm + parameters) is the average of the normalized quality metric and the normalized fitness. The proposal produced solutions with higher cluster quality metrics, higher average fitness, and higher diversity of generated individuals (clustering algorithms) when compared with traditional Autoclustering.
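The scoring rule stated in this abstract can be sketched in a few lines. The abstract does not say how the three indexes are fused or normalized, so the choices here are assumptions: min-max normalization across the candidate solutions, inversion of Davies-Bouldin (where lower is better), and an unweighted mean as the fusion. The function names `min_max` and `final_scores` are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; constant input maps to 0.5 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(silhouette, dunn, davies_bouldin, fitness):
    """Average of normalized fused quality and normalized fitness,
    one score per candidate solution."""
    sil = min_max(silhouette)
    dun = min_max(dunn)
    # Davies-Bouldin is lower-is-better, so flip it after scaling.
    dbi = [1.0 - v for v in min_max(davies_bouldin)]
    # Assumed fusion: unweighted mean of the three normalized indexes.
    quality = [(a + b + c) / 3 for a, b, c in zip(sil, dun, dbi)]
    q_norm = min_max(quality)
    f_norm = min_max(fitness)
    return [(q + f) / 2 for q, f in zip(q_norm, f_norm)]


# Two hypothetical candidate solutions (values are made up for illustration).
scores = final_scores(
    silhouette=[0.2, 0.6],
    dunn=[0.1, 0.3],
    davies_bouldin=[1.5, 0.5],
    fitness=[0.4, 0.8],
)
print(scores)
```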
Clustering is the most widely used unsupervised machine learning technique, having extensive applications in statistical analysis. We have multiple clustering algorithms available in theory and many more implementatio...
ISBN: (print) 9781733632546
Recommender Systems have been used intensively in Information Systems in recent decades, facilitating the choice of items for each individual user based on their history. Clustering techniques have been used frequently in commercial and scientific domains for data mining tasks and visualization tools. However, the literature lacks secondary studies that analyze the use of clustering algorithms in Recommender Systems and their behavior in different aspects. In this work, we present a Systematic Literature Review (SLR) that discusses the different types of information systems that use clustering algorithms in Recommender Systems, which typically involve the three main recommendation approaches found in the literature: collaborative filtering, content-based filtering, and hybrid recommendation. Finally, we performed a quantitative analysis using K-means clustering to find patterns among clustering algorithms, recommendation approaches, and some of the datasets used in the publications.
Despite an increasing consensus regarding the significance of properly identifying the most suitable clustering method for a given problem, a surprising amount of educational research, including both educational data mining (EDM) and learning analytics (LA), neglects this critical task. This shortcoming could in many cases have a negative impact on the predictive power of both the EDM- and LA-based approaches. To address such issues, this work proposes an evaluation approach that automatically compares several clustering methods using multiple internal and external performance measures on 9 real-world educational datasets of different sizes, created from the University of Tartu's Moodle system, to produce two-way clustering. Moreover, to investigate the possible effect of normalization on the performance of the clustering algorithms, this work performs the same experiment on a normalized version of the datasets. Since such an exhaustive evaluation includes multiple criteria, the proposed approach employs a multiple criteria decision-making method (i.e., TOPSIS) to rank the most suitable methods for each dataset. Our results reveal that the proposed approach can automatically compare the performance of the clustering methods and accordingly recommend the most suitable method for each dataset. Furthermore, our results show that in both normalized and nonnormalized datasets of different sizes with 10 features, DBSCAN and k-medoids are the best clustering methods, whereas agglomerative and spectral methods appear to be among the most stable and highly performing clustering methods for such datasets with 15 features. Regarding datasets with more than 15 features, OPTICS is among the top-ranked algorithms among the nonnormalized datasets, and k-medoids is the best among the normalized datasets. Interestingly, our findings reveal that normalization may have a negative effect on the performance of certain methods, e.g., spectral clustering and OPTICS; however, it appears to m
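The TOPSIS ranking step mentioned in this abstract can be sketched as follows. This is a generic textbook form of TOPSIS (vector normalization, weighted distances to the ideal and anti-ideal points), not the authors' code; the function name `topsis`, the criteria, the weights, and all numbers are hypothetical.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative (higher is better).

    matrix:  rows are alternatives (e.g. clustering methods),
             columns are criteria (e.g. validity measures).
    weights: one weight per criterion.
    benefit: True where larger is better, False where smaller is better.
    """
    n_crit = len(weights)
    # Vector-normalize each column; guard against an all-zero column.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = sqrt(sum((v - i) ** 2 for v, i in zip(row, ideal)))
        d_neg = sqrt(sum((v - a) ** 2 for v, a in zip(row, anti)))
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)
    return scores


# Hypothetical example: three methods, two criteria
# (column 0: higher is better, column 1: lower is better).
methods = ["A", "B", "C"]
scores = topsis(
    matrix=[[0.9, 0.2], [0.5, 0.8], [0.7, 0.5]],
    weights=[0.5, 0.5],
    benefit=[True, False],
)
ranking = [m for _, m in sorted(zip(scores, methods), reverse=True)]
print(ranking)
```

Method A is best on both criteria and B is worst on both, so A ranks first and B last regardless of the weights.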
ISBN: (print) 9781538683804
Transmissions in the mmWave spectrum benefit from a priori knowledge of radio channel propagation models. This paper is concerned with one important task that helps provide a more accurate channel model, namely, the clustering of all multipath components arriving at the receiver. Our work focuses on directive transmissions in urban outdoor scenarios and shows the importance of correctly estimating the number of clusters for mmWave radio channels simulated with a software ray-tracer tool. We investigate the effectiveness of the k-means and k-power-means clustering algorithms in predicting the number of clusters through the use of cluster validity indices (CMIs) and score fusion techniques. Our investigation shows that clustering is a difficult task because the optimal number of clusters is not always given by a single CMI or by a combination of several CMIs. However, using score fusion methods, we find the optimal partitioning for the k-means algorithm based on the power and time of arrival of the multipath rays or based on their angle of arrival. When the k-power-means algorithm is used, the power of each arriving ray is the most important clustering factor, so the dominant received paths pull the other paths around them to form a cluster. Thus, the number of clusters is smaller, and the decision based on CMIs or score fusion factors is easier to make.
The major evolution of the semantic web has become exchanging data between applications in all domains of activities. Based on this vision, different applications in recent days, e.g. in the fields of community web po...
In order to handle the problem of linear separability in the early data clustering algorithms, Euclidean distance is being replaced with Kernel functions as measures of similarity. Another problem with the clustering ...