ISBN (electronic): 9798350372977
ISBN (print): 9798350372984
This study compares the clustering algorithms used by four outstanding Iraqi institutions in Anbar: Anbar University, Fallujah University, Al-Marf University, and Al-Huda University. The methods considered include meta-learning for clustering, self-supervised clustering, reinforcement learning for clustering, and deep clustering, among others. The analysis is based on three standard measures of cluster quality: the Silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index. The evaluations reveal distinct performance characteristics for each algorithm, which provides helpful guidance for algorithm selection in an educational setting. Meta-learning for clustering is the most effective method, performing well on all measures with a Silhouette score of 0.4655, a Calinski-Harabasz index of 189.8185, and a Davies-Bouldin index of 0.8202. This finding, supported by similar Silhouette scores and nearly equal Calinski-Harabasz and Davies-Bouldin indices, indicates that the remaining methods perform comparably under the experimental measurements. Deep clustering, in contrast, behaves differently, with a markedly lower Silhouette score of 0.0880, a Calinski-Harabasz index of 83.4026, and a Davies-Bouldin index of 3.2493, suggesting that its clustering performance is weaker than that of the other strategies studied.
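For context, the three quality measures named above are available in standard libraries; below is a minimal sketch (assuming scikit-learn, with synthetic blobs and plain k-means standing in for the study's institutional data and algorithms) of how they are computed and which direction counts as better:

# Minimal sketch: computing the three cluster-quality metrics named above.
# Synthetic data and KMeans are illustrative placeholders, not the study's
# institutional data or the algorithms it evaluates.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))          # higher is better, range [-1, 1]
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))   # higher is better
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))      # lower is better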
ISBN (print): 9781509043217
Big data poses a major challenge for data mining methods. Fortunately, rapid advances in affordable high-performance computing platforms such as the Graphics Processing Unit (GPU) have helped researchers reduce the execution time of many algorithms, including data mining algorithms. This paper discusses the use of the GPU's parallelism capabilities to improve the performance of two common clustering algorithms, K-Means (KM) and Fuzzy C-Means (FCM). Two main parallelism approaches are presented: pure and hybrid. These versions are tested under different settings, including two GPU-equipped machines (a laptop and a server). The results show excellent gains for the hybrid implementations compared with the pure parallel and sequential ones. On the laptop, the best gains of the hybrid implementations over the sequential ones are 113X for KM and 10.9X for FCM; on the server, the best gains are 13.5X for KM and 16.3X for FCM. Moreover, the paper explores a recent GPU memory management technique called Unified Memory (UM). With UM, the performance gain of the hybrid implementations decreases by 44% for the hybrid version of KM and 61% for FCM, while UM does introduce a small advantage for the pure parallel implementation.
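As a rough illustration of why KM maps well onto a GPU, the sketch below shows the data-parallel structure of its assignment and update steps in plain NumPy; this is only a structural sketch under that framing, not the paper's pure or hybrid CUDA implementation:

# Every point's distance to every centroid can be computed independently,
# which is the parallelism that GPU versions of K-Means exploit. NumPy
# broadcasting is used here only to expose that structure.
import numpy as np

def kmeans_assign(points, centroids):
    # (n, 1, d) - (1, k, d) -> (n, k, d): all pairwise differences at once.
    diff = points[:, None, :] - centroids[None, :, :]
    dist2 = np.einsum("nkd,nkd->nk", diff, diff)   # squared distances, shape (n, k)
    return dist2.argmin(axis=1)

def kmeans_update(points, labels, centroids):
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members):                           # keep the old centroid if a cluster empties
            new[j] = members.mean(axis=0)
    return new

rng = np.random.default_rng(0)
pts = rng.normal(size=(10_000, 8))
cent = pts[rng.choice(len(pts), size=16, replace=False)]
for _ in range(10):
    lab = kmeans_assign(pts, cent)
    cent = kmeans_update(pts, lab, cent)
lab = kmeans_assign(pts, cent)
print("final inertia:", ((pts - cent[lab]) ** 2).sum())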
Clustering algorithms offer several advantages over manual grouping processes. Yet the initial random guess of most clustering algorithms, along with noise and outliers, affects the reliability of their results. In this paper, new clustering performance measures (CPM) for assessing the reliability of a clustering algorithm are proposed. Two parameters are used to define the clustering performance measures: the first is a validation measure, which determines how well the algorithm works at a given set of parameter values, and the second is a repeatability measure, which studies the effect of initial conditions on cluster membership. Furthermore, these CPMs can be used to evaluate clustering algorithms. Two different types of real-world data are used for this evaluation procedure: the first is a communications signal data set representing one modulation scheme under noise conditions, and the second is a breast cancer data set.
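The abstract does not give the exact CPM formulas, so the sketch below only illustrates the two ingredients it names, with the silhouette score as a stand-in validation measure and the mean pairwise adjusted Rand index across random restarts as a stand-in repeatability measure (scikit-learn and synthetic data assumed):

# Stand-in validation and repeatability measures, not the paper's CPMs:
# validation = mean silhouette over restarts, repeatability = mean pairwise
# agreement (adjusted Rand index) between restarts with different seeds.
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=2.0, random_state=1)

labelings = [
    KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    for seed in range(10)                    # ten different random initializations
]

validation = np.mean([silhouette_score(X, lab) for lab in labelings])
repeatability = np.mean([
    adjusted_rand_score(a, b) for a, b in combinations(labelings, 2)
])
print(f"validation (mean silhouette):      {validation:.3f}")
print(f"repeatability (mean pairwise ARI): {repeatability:.3f}")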
This paper presents a case study in which two of the fuzzy clustering algorithms most commonly used in pattern recognition tasks are analyzed on a classification problem that involves a high degree of subjectivity. The problem consists of classifying seven types of wood defects called knots. The algorithms are the Abonyi-Szeifert modification of the Gath-Geva algorithm, GGAS, and the Gustafson-Kessel algorithm, GK. An improvement to the GK algorithm, GKM, is also proposed and analyzed. Besides the analysis of the algorithms, three different techniques are proposed to generate the design set of samples and the testing set of samples. Results of the study show that the GGAS and GKM algorithms achieve performance close to human performance.
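For readers unfamiliar with fuzzy clustering, the sketch below shows the membership update shared by the fuzzy c-means family to which GK and GGAS belong; it uses plain Euclidean distances and made-up data, whereas GK and GGAS substitute adaptive, covariance-based distances that are not implemented here:

# Membership update of the fuzzy c-means family: each sample receives a
# degree of membership in every cluster based on relative distances.
import numpy as np

def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    # Return an (n, c) membership matrix U whose rows sum to 1.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # placeholder features, not wood-knot data
centers = X[rng.choice(200, size=7, replace=False)]    # seven clusters, mirroring the seven knot types
U = fuzzy_memberships(X, centers)
assert np.allclose(U.sum(axis=1), 1.0)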
In the domain of software architecture recovery, classical clustering algorithms have been used to recover module views, while new ones have been proposed to tackle specific software architecture issues. Nonetheless, little information concerning their empirical evaluation in this context is presently available. This paper presents an empirical study that evaluates four clustering algorithms according to three previously proposed criteria: extremity of cluster distribution, authoritativeness, and stability, which were measured against consecutive releases of four different systems. Our results suggest that the k-means algorithm performs best in terms of authoritativeness and extremity and that the modularization quality algorithm produces more stable clusters. They also point out that fully automated clustering techniques alone cannot recover module views in a sensible way, but may provide a reasonable first step to speed up an expert-assisted architecture recovery process.
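As a small illustration of the stability criterion, one can compare the clusterings of two consecutive releases on the entities they share; the adjusted Rand index and the file-to-cluster assignments below are illustrative stand-ins, not the paper's actual metric or data:

# Hypothetical module assignments (file -> cluster id) for two releases.
from sklearn.metrics import adjusted_rand_score

release_1 = {"a.c": 0, "b.c": 0, "c.c": 1, "d.c": 1, "e.c": 2}
release_2 = {"a.c": 0, "b.c": 1, "c.c": 1, "d.c": 1, "f.c": 2}

shared = sorted(release_1.keys() & release_2.keys())   # entities present in both releases
stability = adjusted_rand_score(
    [release_1[f] for f in shared],
    [release_2[f] for f in shared],
)
print(f"stability over {len(shared)} shared files: {stability:.3f}")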
This paper deals with a new approach for complex systems modeling and control based on neural and fuzzy clustering algorithms. It aims to derive a base of local models describing the system over the whole operating domain. The implementation of this approach requires three main steps: 1) determination of the structure of the model base, in which the number of models is found using Rival Penalized Competitive Learning (RPCL) and the operating clusters are selected with the fuzzy K-means algorithm; 2) parametric model identification using the clustering results; and 3) determination of the global system control parameters, obtained by a fusion of local control parameters. The case of a second-order nonlinear system is studied to illustrate the efficiency of the proposed approach.
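A minimal sketch of the RPCL mechanism used in step 1 follows; the learning rates, threshold, and data are illustrative choices, not taken from the paper. The winning unit is pulled toward each sample while its closest rival is pushed away, so superfluous units drift out of the data and the surviving units indicate the number of operating clusters:

# Rival Penalized Competitive Learning: attract the winner, repel the rival.
import numpy as np

def rpcl(X, n_units=8, alpha_win=0.05, alpha_rival=0.002, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_units, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            d = np.linalg.norm(W - x, axis=1)
            win, rival = np.argsort(d)[:2]
            W[win] += alpha_win * (x - W[win])          # attract the winner
            W[rival] -= alpha_rival * (x - W[rival])    # penalize the rival
    return W

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(150, 2)) for c in ([0, 0], [3, 0], [0, 3])])
W = rpcl(X)
# Units that stayed inside the data are kept; units pushed far away are discarded.
keep = [w for w in W if np.min(np.linalg.norm(X - w, axis=1)) < 1.0]
print("estimated number of operating clusters:", len(keep))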
In this paper we attempt to define the major trade routes that trading vessels follow when travelling across the globe in a scalable, data-driven, unsupervised way. For this, we exploit a large volume of historical AIS data to estimate the location and connections of the major trade routes, with minimal reliance on other sources of information. We address the challenges posed by the volume of data by leveraging distributed computing techniques and present a novel MapReduce-based algorithmic approach capable of handling skewed and nonuniform geospatial data. In this direction, we calculate and compare the performance (execution time and compression ratio) and accuracy of several mature clustering algorithms and present preliminary results.
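To make the MapReduce pattern concrete, the toy single-machine sketch below maps hypothetical AIS position reports to coarse grid cells and reduces by cell to obtain traffic density, a crude first step toward locating busy routes; the paper's distributed, skew-aware algorithm is not reproduced here:

# Toy imitation of the map and reduce phases over made-up AIS records.
from collections import defaultdict

records = [   # (vessel id, latitude, longitude), hypothetical samples
    ("V1", 35.02, 23.91), ("V1", 35.11, 24.40), ("V2", 35.05, 23.88),
    ("V3", 36.50, 28.10), ("V2", 35.09, 24.45), ("V3", 36.55, 28.02),
]

def map_phase(record, cell_deg=0.5):
    _, lat, lon = record
    cell = (round(lat / cell_deg) * cell_deg, round(lon / cell_deg) * cell_deg)
    return cell, 1                       # emit a (key, value) pair per report

def reduce_phase(pairs):
    counts = defaultdict(int)
    for cell, value in pairs:
        counts[cell] += value            # sum the values for each key
    return counts

density = reduce_phase(map_phase(r) for r in records)
for cell, count in sorted(density.items(), key=lambda kv: -kv[1]):
    print(cell, count)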
ISBN (print): 9781728102436; 9789881476852
Speaker clustering is an important problem in speech processing tasks such as speaker diarization; however, its behavior in adverse acoustic environments lacks comprehensive study. To address this problem, we investigate its components separately. A speaker clustering system contains three components: a feature extraction front-end, a dimensionality reduction algorithm, and a clustering back-end. In this paper, we use the standard Gaussian mixture model based universal background model (GMM-UBM) as a front-end to extract high-dimensional supervectors, and compare three dimensionality reduction algorithms as well as two clustering algorithms. The three dimensionality reduction algorithms are principal component analysis (PCA), spectral clustering (SC), and the multilayer bootstrap network (MBN). The two clustering algorithms are k-means and agglomerative hierarchical clustering (AHC). We conducted an extensive experiment with both in-domain and out-of-domain settings on noisy versions of the NIST 2006 speaker recognition evaluation (SRE) and NIST 2008 SRE corpora. Experimental results in various noisy environments show that (i) the MBN-based systems perform best in most cases, while the SC-based systems outperform the PCA-based systems as well as the original supervector-based systems; and (ii) AHC is more robust than k-means.
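A minimal sketch of the back-end comparison is given below, with random vectors standing in for GMM-UBM supervectors and PCA as the only dimensionality reduction step; the SC and MBN front-ends and the NIST SRE data are not reproduced, and scikit-learn is assumed:

# Placeholder supervectors -> PCA -> k-means vs. agglomerative clustering.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_speakers, utts_per_speaker, dim = 10, 20, 512
true = np.repeat(np.arange(n_speakers), utts_per_speaker)
# One noisy point cloud per speaker stands in for GMM-UBM supervectors.
X = rng.normal(size=(n_speakers, dim))[true] + 0.5 * rng.normal(size=(len(true), dim))

Z = PCA(n_components=20, random_state=0).fit_transform(X)

for name, model in [
    ("k-means", KMeans(n_clusters=n_speakers, n_init=10, random_state=0)),
    ("AHC", AgglomerativeClustering(n_clusters=n_speakers)),
]:
    labels = model.fit_predict(Z)
    print(name, "ARI vs. true speakers:", round(adjusted_rand_score(true, labels), 3))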
ISBN (print): 9780898717013
In the kernel clustering problem we are given a (large) n × n symmetric positive semidefinite matrix A = (a_ij) with ∑_{i=1}^{n} ∑_{j=1}^{n} a_ij = 0 and a (small) k × k symmetric positive semidefinite matrix B = (b_ij). The goal is to find a partition {S_1, ..., S_k} of {1, ..., n} which maximizes ∑_{i=1}^{k} ∑_{j=1}^{k} (∑_{(p,q) ∈ S_i × S_j} a_pq) b_ij. We design a polynomial time approximation algorithm that achieves an approximation ratio of R(B)^2/C(B), where R(B) and C(B) are geometric parameters that depend only on the matrix B, defined as follows: if b_ij = 〈v_i, v_j〉 is the Gram matrix representation of B for some v_1, ..., v_k ∈ R^k, then R(B) is the minimum radius of a Euclidean ball containing the points {v_1, ..., v_k}.
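For a candidate partition, the objective above can be evaluated compactly: with Z the n × k cluster indicator matrix, the inner sums over S_i × S_j collapse to Z^T A Z, and the objective is the entrywise product of that matrix with B. A small NumPy sketch with arbitrary example matrices (not instances from the paper):

# Evaluate the kernel clustering objective for one candidate partition.
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3

# A: an n x n PSD matrix whose entries sum to zero (double-centered Gram matrix).
G = rng.normal(size=(n, 4))
H = np.eye(n) - np.ones((n, n)) / n
A = H @ (G @ G.T) @ H

# B: a small k x k PSD matrix.
V = rng.normal(size=(k, k))
B = V @ V.T

labels = rng.integers(0, k, size=n)      # a candidate partition {S_1, ..., S_k}
Z = np.eye(k)[labels]                    # indicator matrix, shape (n, k)

objective = np.sum((Z.T @ A @ Z) * B)    # sum_{i,j} b_ij * sum over (p,q) in S_i x S_j of a_pq
print("objective value for this partition:", objective)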
We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete ...