The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many resear...
Machine learning has become a core part of computing and has affected countless sectors through better implementations of existing systems. Machine learning algorithms use various methods to organize and learn from data, and clustering is one such method. Clustering, as the name suggests, forms clusters of data from a dataset based on the characteristics of the data. However, clustering datasets can be onerous, and the cost grows as the number of clusters or the number of data points increases. Parallelizing the algorithms is one way to reduce the time taken. Clustering algorithms can be parallelized by optimizing them to share the workload across multiple CPUs or multiple cores of a single CPU. This paper focuses on the performance analysis of parallelized clustering algorithms and other mainstream clustering algorithms. DBSCAN (Density-Based Spatial Clustering of Applications with Noise), K-Means, Mini-Batch K-Means, and Mean Shift are the algorithms chosen from different types of clustering to diversify the comparison. This paper provides a comparative analysis of the performance of the different clustering algorithms by controlling the environment to be either single- or multi-threaded.
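The workload sharing described in this abstract can be illustrated with a minimal sketch of one K-Means step, the assignment of points to their nearest centroid, split into chunks across workers. This is not the paper's implementation: the function names, the chunking scheme, and the use of Python's standard `ThreadPoolExecutor` are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centroids[k])),
    )


def assign_chunk(chunk, centroids):
    """Label every point in one chunk; this is the unit of work per worker."""
    return [nearest(p, centroids) for p in chunk]


def assign(points, centroids, workers=1):
    """Assignment step of K-Means, optionally split across several workers."""
    if workers <= 1 or not points:
        return assign_chunk(points, centroids)
    size = -(-len(points) // workers)  # ceiling division: points per chunk
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so labels line up with the input points.
        parts = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    return [label for part in parts for label in part]


points = [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.2)]
centroids = [(0, 0), (10, 10)]
single = assign(points, centroids, workers=1)
multi = assign(points, centroids, workers=3)
```

Both calls return the same labels; only the way the work is divided differs. In standard CPython, threads do not execute pure-Python code in parallel because of the global interpreter lock, so this sketch shows the partitioning rather than a speed-up. A measurable gain would need process-based workers or a library whose inner loops run in native code.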
High-dimensional data is interpreted with a considerable number of features, and new problems are presented in groups. The so-called "high dimension" is initially created to explain the common increase in ti...
The reliability of the smart grid is adversely affected by system uncertainties. Moreover, the steadily growing deployment of renewable distributed generation (DG) units increases the uncertainties of smart grids. Hence, it is essential to account for these uncertainties in the reliability evaluation of smart grids. Although Monte Carlo simulation (MCS) has received considerable attention in the literature, there is a research gap in using clustering algorithms to assess smart grids' reliability. This article aims to fill this gap by proposing a new reliability assessment method that uses various clustering algorithms. The accuracy and fast computation of the proposed method are highlighted as benefits, while optimal operation, optimal short-term planning, and repetitive problems should be studied. In this paper, the performance and accuracy of various classic (k-means, fuzzy c-means, and k-medoids) and metaheuristic (genetic algorithm, particle swarm optimization, differential evolution, harmony search, and artificial bee colony) clustering algorithms are studied. Comparing different scenario reduction algorithms within the proposed reliability evaluation method is one of the main contributions. The proposed method is applied to two realistic test systems. Test results indicate that the proposed method is adequately precise, while the required computation time is less than that of MCS-based approaches. Test results for both test systems imply that the expected energy not supplied (EENS) can be estimated with an error of less than 2.1% by applying the proposed method. The fuzzy c-means clustering algorithm yields the best accuracy among the studied classic and nonclassic (metaheuristic) algorithms.
The article presents immediate access to over fifty fundamental clustering algorithms. Additionally, access is provided to clustering benchmark datasets previously published as the "Fundamental Clustering Problems Suite" (FCPS). The software library is named "FCPS"; it is available in R on CRAN and accessible within Python. The input and output of clustering algorithms are standardized to enable swift execution of cluster analysis. By combining mirrored-density plots (MD plots) with statistical testing, FCPS provides a tool to investigate the cluster tendency quickly before the cluster analysis itself. Common clustering challenges can be generated with an arbitrary sample size. Additionally, FCPS brings together 26 indicators intended to estimate the number of clusters and provides an appropriate implementation of the clustering accuracy for more than two clusters. (C) 2020 The Author(s). Published by Elsevier B.V.
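The phrase "clustering accuracy for more than two clusters" points at a practical difficulty: cluster labels are arbitrary, so predicted labels must be matched to the reference labels before counting agreements. The sketch below shows one common way to do this, by trying every one-to-one relabelling and keeping the best. It is an illustration in Python, not the FCPS implementation; the function name `clustering_accuracy` is hypothetical and labels are assumed never to be `None`.

```python
from itertools import permutations


def clustering_accuracy(truth, predicted):
    """Fraction of points matching the reference labels, maximised over all
    one-to-one relabellings of the predicted clusters."""
    true_labels = sorted(set(truth))
    pred_labels = sorted(set(predicted))
    # Pad with None so surplus predicted clusters can map to "no reference".
    size = max(len(true_labels), len(pred_labels))
    targets = true_labels + [None] * (size - len(true_labels))
    best = 0
    for perm in permutations(targets, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(1 for t, p in zip(truth, predicted) if mapping[p] == t)
        best = max(best, hits)
    return best / len(truth)


truth = [0, 0, 1, 1, 2, 2]
same_partition = clustering_accuracy(truth, [2, 2, 0, 0, 1, 1])
one_mistake = clustering_accuracy(truth, [1, 1, 0, 0, 0, 2])
```

The exhaustive search grows factorially with the number of clusters, so it suits only small label sets; for larger ones the same matching is usually posed as an assignment problem and solved with the Hungarian algorithm.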
The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. On the basis of the DIFI and China's Provincial Panel data, this study aims to test the poverty reduction effect of digital inclusive finance in three dimensions (income, education, and healthcare) and to further examine the transmission mechanism of digital inclusive finance in poverty alleviation. The results indicate that digital inclusive finance exerts a poverty reduction effect in three dimensions: medical poverty, income poverty, and education poverty. Of these, the coverage breadth significantly affects the alleviation of medical poverty, the use depth significantly affects the alleviation of income poverty and education poverty, and the digitization level affects the alleviation of poverty in all three dimensions. The level of regional economic development plays an intermediary role in the poverty alleviation effect of digital inclusive finance. Compared with the western region, which is relatively backward in development, the poverty reduction effect of digital inclusive finance in the eastern region is more significant.
ISBN: (print) 9783030587994; 9783030587987
Autoclustering is a computational tool for the automatic generation of clustering algorithms, which combines and evaluates the main parts of density-based algorithms to generate more appropriate solutions for a given dataset for clustering tasks. Autoclustering uses the Estimation of Distribution Algorithms (EDA) evolutionary technique to create the algorithms (individuals), and the adapted CLEST method (which originally determines the best number of groups for a dataset) to compute individual fitness, using a decision-tree classifier. To improve the quality of the results generated by Autoclustering and to avoid possible bias from the adoption of a classifier, this work proposes to increase the efficiency of the evaluation process by adding a quality metric based on a fusion of three quality indexes of solution clusters. The three quality indexes are Silhouette, Dunn, and Davies-Bouldin, which assess the intra-cluster and inter-cluster situation, with algorithms based on distance and independent of the generation of the groups. The final score for a specific solution (algorithm + parameters) is the average of the normalized quality metric and the normalized fitness. The proposal produced solutions with higher cluster quality metrics, higher average fitness, and higher diversity of generated individuals (clustering algorithms) when compared with traditional Autoclustering.
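The scoring rule stated in this abstract (final score = average of normalized quality metric and normalized fitness) can be sketched as follows. The abstract does not say how the three indexes are fused or normalized, so the choices here are assumptions: min-max normalization across the candidate solutions, an unweighted mean of the three indexes, and inversion of Davies-Bouldin because lower values of that index are better. The function and key names are hypothetical, and the index values are taken as already computed.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to 0.5 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(candidates):
    """Score each candidate solution (algorithm + parameters).

    Each candidate is a dict with precomputed 'silhouette', 'dunn',
    'davies_bouldin' and 'fitness' values.
    """
    sil = min_max([c["silhouette"] for c in candidates])
    dunn = min_max([c["dunn"] for c in candidates])
    # Davies-Bouldin is better when lower, so flip it after normalizing.
    dbi = [1.0 - v for v in min_max([c["davies_bouldin"] for c in candidates])]
    # Assumed fusion: unweighted mean of the three normalized indexes.
    quality = [(s + d + b) / 3 for s, d, b in zip(sil, dunn, dbi)]
    fitness = min_max([c["fitness"] for c in candidates])
    # Stated in the abstract: average of normalized quality and fitness.
    return [(q + f) / 2 for q, f in zip(quality, fitness)]


candidates = [
    {"silhouette": 0.8, "dunn": 1.2, "davies_bouldin": 0.5, "fitness": 0.9},
    {"silhouette": 0.2, "dunn": 0.4, "davies_bouldin": 1.5, "fitness": 0.6},
    {"silhouette": 0.5, "dunn": 0.8, "davies_bouldin": 1.0, "fitness": 0.6},
]
scores = final_scores(candidates)
```

Because every term is rescaled relative to the other candidates, the scores rank solutions within one population and are not comparable across populations.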
ISBN: (print) 9783030461508; 9783030461492
Existing cluster validity indices often possess a similar bias as the clustering algorithm they were introduced for, e.g. to determine the optimal number of clusters. We suggest an efficient and holistic assessment of the structure discovery capabilities of clustering algorithms based on three criteria. We determine the robustness or stability of cluster assignments and interpret it as the confidence of the clustering algorithm in its result. This information is then used to label the data and evaluate the consistency of the stability-assessment with the notion of a cluster as an area of dense and separated data. The resulting criteria of stability, structure and consistency provide interpretable means to judge the capabilities of clustering algorithms without the typical biases of prominent indices, including the judgment of a clustering tendency.
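To make the idea of assignment stability concrete, the sketch below scores each data point by how consistently it is grouped with, or separated from, every other point across repeated clustering runs. This particular measure is an illustrative assumption and not the criterion defined in the paper; the function name `point_stability` is hypothetical, and the runs are assumed to label the same points in the same order.

```python
def point_stability(runs):
    """Per-point stability in [0, 1] from several label vectors.

    For each pair (i, j), take the fraction of runs in which the two points
    share a cluster. A pair that is always together or always apart counts
    as fully stable (1.0); a 50/50 split counts as fully unstable (0.0).
    """
    n = len(runs[0])
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            together = sum(1 for labels in runs if labels[i] == labels[j])
            total += abs(2.0 * together / len(runs) - 1.0)
        scores.append(total / (n - 1))
    return scores


# Same partition under different label names: every point is fully stable.
stable = point_stability([[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
# Point 1 switches clusters between the two runs.
mixed = point_stability([[0, 0, 1, 1], [0, 1, 1, 1]])
```

Comparing co-membership rather than raw labels keeps the measure independent of how each run happens to name its clusters.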
Clustering is the most widely used unsupervised machine learning technique, having extensive applications in statistical analysis. We have multiple clustering algorithms available in theory and many more implementatio...
ISBN: (print) 9781733632546
Recommender Systems have been used intensively in Information Systems in recent decades, facilitating the choice of items for each individual user based on that user's history. Clustering techniques have been used frequently in commercial and scientific domains for data mining tasks and visualization tools. However, there is a lack of secondary studies in the literature that analyze the use of clustering algorithms in Recommender Systems and their behavior in different respects. In this work, we present a Systematic Literature Review (SLR) that discusses the different types of information systems that use clustering algorithms in Recommender Systems, which typically involve the three main recommendation approaches found in the literature: collaborative filtering, content-based filtering, and hybrid recommendation. Finally, we performed a quantitative analysis using K-means clustering to find patterns among clustering algorithms, recommendation approaches, and some of the datasets used in the publications.