ISBN (print): 9781479959969
This paper provides a comparative study of several enhanced versions of the fuzzy c-means clustering algorithm, applied to histogram-based image color reduction. A common preprocessing step is performed before clustering, consisting of a preliminary color quantization, histogram extraction, and selection of the frequently occurring colors of the image. The selected colors are then clustered by the tested c-means algorithms. Clustering is followed by another common step, which creates the output image. Besides conventional hard c-means (HCM) and fuzzy c-means (FCM) clustering, the study includes the so-called generalized improved partition FCM algorithm and several versions of the suppressed FCM (s-FCM) in its conventional and generalized forms. Accuracy is measured as the average color difference between pixels of the input and output images, while efficiency is mostly characterized by the total runtime of the color reduction. Numerical evaluation found all enhanced FCM algorithms more accurate than FCM, and four of the seven enhanced algorithms faster than FCM. All tested algorithms can create reduced-color images of acceptable quality.
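The pipeline described in this abstract (quantize, build a histogram, cluster the frequent colors, map pixels to the result) can be sketched roughly as follows. This is a minimal illustration using plain FCM only, not any of the enhanced variants the paper compares. The function names, the 4-bit quantization, the `min_count` cut-off and the deterministic initialisation are all assumptions made for the example.

```python
import math

def quantize(pixel, bits=4):
    """Preliminary quantization: keep the top `bits` bits of each channel."""
    shift = 8 - bits
    return tuple((c >> shift) << shift for c in pixel)

def histogram(pixels, bits=4):
    """Count how often each quantized color occurs."""
    hist = {}
    for p in pixels:
        q = quantize(p, bits)
        hist[q] = hist.get(q, 0) + 1
    return hist

def weighted_fcm(colors, weights, c, m=2.0, iters=50):
    """Plain fuzzy c-means in which each color counts `weight` times."""
    c = min(c, len(colors))
    # Assumed initialisation: start from the c most frequent colors.
    order = sorted(range(len(colors)), key=lambda i: -weights[i])
    centers = [[float(v) for v in colors[i]] for i in order[:c]]
    exp = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in colors:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:
                # A color sitting exactly on a center belongs fully to it.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            else:
                row = [dj ** (-exp) for dj in d]
            total = sum(row)
            memberships.append([r / total for r in row])
        for j in range(c):
            coef = [w * memberships[i][j] ** m for i, w in enumerate(weights)]
            den = sum(coef)
            if den > 0:
                centers[j] = [
                    sum(coef[i] * colors[i][k] for i in range(len(colors))) / den
                    for k in range(3)
                ]
    return centers

def reduce_colors(pixels, c, bits=4, min_count=1):
    """Return the pixel list with every pixel replaced by its nearest palette color."""
    hist = histogram(pixels, bits)
    kept = [(col, n) for col, n in hist.items() if n >= min_count]
    colors = [col for col, _ in kept]
    weights = [n for _, n in kept]
    centers = weighted_fcm(colors, weights, c)
    palette = [tuple(round(v) for v in ctr) for ctr in centers]
    return [min(palette, key=lambda p: math.dist(p, px)) for px in pixels]
```

Clustering the histogram entries instead of the raw pixels keeps the cost of each iteration proportional to the number of distinct quantized colors, which is presumably why the paper shares this preprocessing across all tested algorithms.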
ISBN (print): 9781509035663
The main goal of clustering algorithms is to organize a given set of data patterns into groups (clusters), and their main strategy is to group patterns based on their similarity. However, some clustering algorithms also require, as an input parameter, either the number of clusters the induced clustering should have or a threshold value used to limit the number of induced clusters. Both the number of clusters and the threshold value are often unknown; however, it is well known that the results of clustering tasks can be very sensitive to them. This work presents a method for empirically estimating both values. The method is based on multiple runs of sequential clustering algorithms using increasing threshold values. Results from experiments conducted on several data domains from two repositories, UCI and KEEL, as well as on a few artificially created datasets, are presented, and a comparative analysis is carried out as evidence of the good estimates of both values given by the method.
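One way to picture "multiple runs of sequential clustering with increasing thresholds" is the sketch below. It uses a basic one-pass sequential scheme and then picks the cluster count that persists over the longest run of consecutive thresholds. That plateau rule is a common heuristic assumed here for illustration; the abstract does not say which selection rule the paper uses, and the names `bsas` and `estimate_k` are hypothetical.

```python
import math

def bsas(points, theta):
    """One-pass sequential clustering: open a new cluster whenever the
    nearest centroid is farther away than the threshold `theta`."""
    centroids, counts = [], []
    for p in points:
        if centroids:
            j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            if math.dist(p, centroids[j]) <= theta:
                counts[j] += 1
                # Running-mean update of the matched centroid.
                centroids[j] = [c + (x - c) / counts[j] for c, x in zip(centroids[j], p)]
                continue
        centroids.append([float(x) for x in p])
        counts.append(1)
    return centroids

def estimate_k(points, thetas):
    """Run `bsas` for each threshold (given in increasing order) and return
    (k, theta) for the cluster count with the longest run of thresholds.
    The trivial count k = 1 is skipped (assumption): a large enough
    threshold always merges everything."""
    ks = [len(bsas(points, t)) for t in thetas]
    best_k, best_len, best_start = None, 0, 0
    i = 0
    while i < len(ks):
        j = i
        while j + 1 < len(ks) and ks[j + 1] == ks[i]:
            j += 1
        run = j - i + 1
        if ks[i] > 1 and run > best_len:
            best_k, best_len, best_start = ks[i], run, i
        i = j + 1
    if best_k is None:
        return 1, thetas[-1]
    # Report the threshold in the middle of the winning plateau.
    return best_k, thetas[best_start + best_len // 2]
```

Sequential schemes of this kind depend on the order in which points are presented, so in practice the runs might be repeated over several shuffles of the data.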
ISBN (print): 9780769548807
Clustering algorithms are an important component of data mining technology, which has been applied widely in many applications, including those that operate on the Internet. Recently a new line of research, namely Web Intelligence, has emerged that demands advanced analytics and machine learning algorithms to support knowledge discovery, mainly in the Web environment. So-called Web Intelligence data are known to be dynamic and loosely structured and to consist of complex attributes. To deal with this challenge, standard clustering algorithms are improved and evolved with the optimization ability of swarm intelligence, a branch of nature-inspired computing. Some examples are PSO clustering (C-PSO) and clustering with Ant Colony Optimization. The objective of this paper is to investigate the possibilities of applying other nature-inspired optimization algorithms (such as Fireflies, Cuckoos, Bats and Wolves) to clustering Web Intelligence data. The efficacy of each new clustering algorithm is reported in this paper; in general, they outperformed C-PSO.
ISBN (print): 3540664904
A method for the initialisation step of clustering algorithms is presented. It is based on the concept of a cluster as a high-density region of points. The search space is modelled as a set of d-dimensional cells. A sample of points is chosen and each point is placed in the appropriate cell. Cells are iteratively split as the number of points they receive increases. The regions of the search space with a higher density of points are considered good candidates to contain the true centers of the clusters. Preliminary experimental results show the good quality of the estimated centroids compared with a random choice of points. The accuracy of the clusters obtained by running the K-Means algorithm with the two initialisation techniques (random starting centers chosen uniformly from the dataset, and centers found by our method) is evaluated, and K-Means is shown to produce better results with our initialisation method.
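The cell-splitting idea in this abstract can be illustrated with the sketch below. It is a simplified reading, not the authors' procedure: a cell is split into 2^d equal children once it holds more than `capacity` sampled points, and the k densest leaf cells supply the starting centers. The names `split_cells` and `density_seeds` and the parameters `capacity`, `max_depth` and `sample_size` are assumptions for the example.

```python
import random

def split_cells(points, lo, hi, capacity=8, depth=0, max_depth=6):
    """Recursively split the cell [lo, hi] into 2^d children while it holds
    more than `capacity` points. Returns non-empty leaves as (points, volume)."""
    d = len(lo)
    if len(points) <= capacity or depth == max_depth:
        vol = 1.0
        for a, b in zip(lo, hi):
            vol *= (b - a)
        # Guard against degenerate (zero-width) cells.
        return [(points, max(vol, 1e-12))] if points else []
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    children = {}
    for p in points:
        key = tuple(p[i] >= mid[i] for i in range(d))
        children.setdefault(key, []).append(p)
    leaves = []
    for key, pts in children.items():
        child_lo = [mid[i] if key[i] else lo[i] for i in range(d)]
        child_hi = [hi[i] if key[i] else mid[i] for i in range(d)]
        leaves += split_cells(pts, child_lo, child_hi, capacity, depth + 1, max_depth)
    return leaves

def density_seeds(data, k, sample_size=200, seed=0):
    """Return up to k starting centers taken from the densest leaf cells."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    d = len(sample[0])
    lo = [min(p[i] for p in sample) for i in range(d)]
    hi = [max(p[i] for p in sample) for i in range(d)]
    leaves = split_cells(sample, lo, hi)
    # Density = points per unit volume; densest cells first.
    leaves.sort(key=lambda cell: len(cell[0]) / cell[1], reverse=True)
    return [
        [sum(p[i] for p in pts) / len(pts) for i in range(d)]
        for pts, _ in leaves[:k]
    ]
```

A known weakness of this simplified version is that the densest leaves can be neighbours inside the same cluster, so a fuller implementation would probably merge adjacent dense cells before choosing seeds.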
ISBN (print): 9781479972593
The brain storm optimization (BSO) algorithm is a novel swarm intelligence algorithm inspired by the human brainstorming process in problem solving. Generally, the BSO algorithm has five main steps: initialization, evaluation, clustering, disruption and updating. Of these five steps, the clustering step is critical to BSO algorithms. Original BSO algorithms use k-means as the clustering algorithm, but k-means is easily affected by extreme values and is not fast enough. In this paper, a variation of the k-means clustering algorithm, called the k-medians clustering algorithm, is investigated as a replacement for k-means. In addition, one modification is applied to both clustering algorithms: the calculated cluster center is replaced with the individual closest to it. Experimental results show that the effectiveness of BSO does not change noticeably, while higher efficiency is obtained.
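The clustering step described here (k-medians, with each computed center snapped to the nearest actual individual) might look like the sketch below. It covers only the clustering step, not the rest of BSO. The L1 distance, the naive "first k points" initialisation and the function name are assumptions for the example.

```python
import statistics

def l1(a, b):
    """Manhattan distance, the usual companion of the median."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medians_snapped(points, k, iters=20):
    """k-medians in which each computed center is replaced by the
    individual (data point) of its cluster that lies closest to it."""
    centers = [list(p) for p in points[:k]]  # assumed naive initialisation
    d = len(points[0])
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: l1(p, centers[i]))
            groups[j].append(p)
        # Update step: coordinate-wise median, then snap to an individual.
        for j, g in enumerate(groups):
            if not g:
                continue
            med = [statistics.median(p[i] for p in g) for i in range(d)]
            centers[j] = list(min(g, key=lambda p: l1(p, med)))
    return centers, groups
```

Because the median ignores how far an extreme value lies from the rest, a single outlier in a cluster moves the center much less than it would under k-means, which matches the motivation given in the abstract.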
ISBN (print): 9781424435968
Three approaches to extracting clusters sequentially, so that the number of clusters need not be specified beforehand, are introduced, and four algorithms are developed. The first is derived from possibilistic clustering, while the second is a variation of mountain clustering that uses medoids as cluster representatives. Moreover, an algorithm based on the idea of noise clustering is developed. This last idea is also applied to the sequential extraction of regression models, which yields the fourth algorithm. We compare these algorithms using numerical examples.
ISBN (print): 9781450363860
This paper proposes a novel framework for speckle noise suppression and edge preservation using clustering algorithms in ultrasound images. The algorithms considered are K-means clustering, fuzzy C-means clustering, possibilistic C-means, fuzzy possibilistic C-means, and possibilistic fuzzy C-means clustering. This work presents an exhaustive comparative analysis of these clustering algorithms to assess their suitability for despeckling and identifies the best one. Two types of dataset are considered: medical ultrasound images of the thyroid, and synthetically modelled ultrasound images. The framework consists of several distinct phases: first, the edges of the image are identified using the Canny edge operator; then a clustering algorithm is applied to the high-frequency coefficients extracted using the wavelet transform; finally, the preserved edges are added back to the speckle-suppressed image. Thus, the proposed clustering method accomplishes both speckle suppression and edge preservation. This paper also presents a quantitative evaluation of the results to demonstrate the effectiveness of the clustering approach.
ISBN (print): 9781509028092
Community structure is a feature of complex networks that can be crucial for understanding their internal organization. This is particularly true for brain networks, as brain functioning is thought to be based on a modular organization. In recent decades, many clustering algorithms have been developed with the aim of identifying communities in networks of different kinds. However, there is still no agreement about which one is the most reliable, and testing and comparing these algorithms under a variety of conditions would be beneficial to potential users. In this study, we performed a comparative analysis of six different clustering algorithms, analyzing their performance on a ground truth consisting of simulated networks with properties spanning a wide range of conditions. Results show the effect of factors such as the noise level, the number of clusters, and the network dimension and density on the performance of the algorithms, and provide some guidelines for choosing the most appropriate algorithm under different conditions. The best performance under a wide range of conditions was obtained by the Louvain and Leicht & Newman algorithms, while Ronhovde and Infomap proved to be more appropriate in very noisy conditions. Finally, as a proof of concept, we applied the algorithms under examination to brain functional connectivity networks obtained from EEG signals recorded during a sustained movement of the right hand, obtaining a clustering of scalp electrodes that agrees with the results of the simulation study.
ISBN (print): 9781538674741
Many datasets, including social media data and bibliographic data, can be modeled as graphs. Clustering such graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals compared with existing methods.
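To make the "attributes as graph nodes" idea concrete, the sketch below computes personalized PageRank by plain power iteration on a tiny graph in which an attribute value is an ordinary node. It does not reproduce the paper's incremental PPR computation, its entropy-based weight updates, or the DBSCAN step. The graph, the node names and the restart probability `alpha` are hypothetical.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for PPR with restart probability `alpha`.
    `adj` maps each node to a dict {neighbor: edge weight}; every
    neighbor must itself be a key of `adj`."""
    nodes = list(adj)
    score = {n: 0.0 for n in nodes}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Dangling node: send its mass back to the source.
                nxt[source] += (1 - alpha) * score[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - alpha) * score[u] * w / total
        nxt[source] += alpha  # restart mass
        score = nxt
    return score

# Hypothetical star-schema graph: a, b, c are data nodes; "city=X" and
# "city=Y" are attribute nodes linked to the data nodes that share them.
graph = {
    "a": {"b": 1.0, "city=X": 1.0},
    "b": {"a": 1.0, "c": 1.0, "city=X": 1.0},
    "c": {"b": 1.0, "city=Y": 1.0},
    "city=X": {"a": 1.0, "b": 1.0},
    "city=Y": {"c": 1.0},
}
scores = personalized_pagerank(graph, "a")
```

In this toy graph, node `b` is reachable from `a` both through a structural edge and through the shared attribute node, so it receives a higher score than `c`. Raising or lowering the weight on the attribute edges shifts that balance, which is the lever the abstract's edge weight updates act on.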
ISBN (print): 9780898717013
In the kernel clustering problem we are given a (large) $n \times n$ symmetric positive semidefinite matrix $A = (a_{ij})$ with $\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} = 0$ and a (small) $k \times k$ symmetric positive semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $\{S_1, \ldots, S_k\}$ of $\{1, \ldots, n\}$ which maximizes $\sum_{i=1}^{k} \sum_{j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We design a polynomial time approximation algorithm that achieves an approximation ratio of $R(B)^2 / C(B)$, where $R(B)$ and $C(B)$ are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij} = \langle v_i, v_j \rangle$ is the Gram matrix representation of $B$ for some $v_1, \ldots, v_k \in \mathbb{R}^k$, then $R(B)$ is the minimum radius of a Euclidean ball containing the points $\{v_1, \ldots, v_k\}$. The parameter $C(B)$ is defined as the maximum, over all measurable partitions $\{A_1, \ldots, A_k\}$ of $\mathbb{R}^{k-1}$, of the quantity $\sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij} \langle z_i, z_j \rangle$, where for $i \in \{1, \ldots, k\}$ the vector $z_i \in \mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e., $z_i = \frac{1}{(2\pi)^{(k-1)/2}} \int_{A_i} x \, e^{-\|x\|_2^2 / 2} \, dx$. We also show that for every $\varepsilon > 0$, achieving an approximation guarantee of $(1 - \varepsilon) R(B)^2 / C(B)$ is Unique Games hard.