In this paper, a weight selection procedure for the W-k-means algorithm is proposed based on the statistical variation viewpoint. This approach addresses the W-k-means algorithm's problem that the clustering quality is greatly affected by the initial value of the weights. After statistics of the data are computed, the weights are designed to provide more information to the W-k-means algorithm so as to improve its precision. Furthermore, the corresponding computational complexity is analyzed as well. We compare the clustering results of the W-k-means algorithm under the different initialization methods. Results from color image segmentation illustrate that the proposed procedure produces better segmentation than random initialization according to Liu and Yang's (1994) evaluation function. (C) 2011 Elsevier Ltd. All rights reserved.
The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
Conventional algorithms fail to obtain satisfactory background segmentation results for underwater images. In this study, an improved k-means algorithm was developed for underwater image background segmentation to address the issue of improper k value determination and to minimize the impact of the initial centroid position of the grayscale image during the gray level quantization of the conventional k-means algorithm. A total of 100 underwater images taken by an underwater robot were sampled to test the algorithm with respect to background segmentation validity and time cost. The k value and initial centroid position of the grayscale image were optimized. The results were compared with those of three existing algorithms: the conventional k-means algorithm, the improved Otsu algorithm, and the Canny operator edge extraction method. The experimental results showed that the improved k-means underwater background segmentation algorithm could effectively segment the background of underwater images with a low color cast, low contrast, and blurred edges. Although its time cost was higher than that of the other three algorithms, it nonetheless proved more efficient than the time-consuming manual segmentation method. The algorithm proposed in this paper could potentially be used in underwater environments for underwater background segmentation.
Clustering is one of the most widely used knowledge discovery techniques for revealing structures in a dataset that can be extremely useful to the analyst. In iterative clustering algorithms, the procedure adopted for choosing initial cluster centers is extremely important, as it has a direct impact on the formation of the final clusters. Since clusters are separated groups in a feature space, it is desirable to select initial centers that are well separated. In this paper, we propose an algorithm to compute initial cluster centers for the k-means algorithm. The algorithm is applied to several datasets of different dimensions for illustrative purposes. It is observed that the newly proposed algorithm performs well in obtaining the initial cluster centers for the k-means algorithm. (C) 2011 Elsevier B.V. All rights reserved.
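The abstract does not describe how its algorithm picks well-separated centers, so the following is only a minimal sketch of one common way to do it: farthest-first (maximin) traversal. It is not the authors' method. The function name `farthest_first_centers` and the `seed` parameter are assumptions made for illustration.

```python
import math
import random


def farthest_first_centers(points, k, seed=0):
    """Pick k well-separated initial centers by farthest-first traversal.

    Illustrative stand-in only; the abstract's own algorithm is not specified.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Start from one randomly chosen data point.
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] = distance from points[i] to its closest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from every chosen center.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers
```

The returned centers would then be passed to k-means as its starting points. Because each new center maximizes the distance to the centers already chosen, this scheme tends to pick outliers, which is one reason more elaborate initialization procedures exist.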
Recently a modified k-means algorithm for vector quantization design has been proposed where the codevector updating step is as follows: new codevector = current codevector + scale factor × (new centroid - current codevector). This algorithm uses a fixed value for the scale factor. In this paper, we propose the use of a variable scale factor which is a function of the iteration number. For the vector quantization of image data, we show that it offers faster convergence than the modified k-means algorithm with a fixed scale factor, without affecting the optimality of the codebook.
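The update rule quoted above can be sketched in a few lines. The abstract says only that the scale factor is a function of the iteration number and does not give that function, so the schedule in `scale_factor` below (including its `start` and `decay` parameters) is a hypothetical placeholder, not the paper's formula.

```python
def update_codevector(current, centroid, scale):
    """new codevector = current + scale * (centroid - current)."""
    return [c + scale * (m - c) for c, m in zip(current, centroid)]


def scale_factor(iteration, start=1.8, decay=0.1):
    """Hypothetical schedule (an assumption, not taken from the paper).

    Starts at `start` and decays toward 1.0 as the iteration number grows.
    """
    return 1.0 + (start - 1.0) / (1.0 + decay * iteration)


# Usage: move one codevector toward its cell's new centroid at iteration 3.
codevector = update_codevector([0.0, 2.0], [1.0, 4.0], scale_factor(3))
```

With a scale factor of exactly 1 the update reduces to the standard k-means step, in which the codevector is replaced by the new centroid. Values above 1 overshoot the centroid.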
In this paper, we aim to compare empirically four initialization methods for the k-means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to draw up (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the k-means algorithm independently of any initial clustering and of any instance order when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods, as they make the k-means algorithm more effective and more independent of initial clustering and instance order. In addition, we compare the convergence speed of the k-means algorithm when using each of the four initialization methods. Our results suggest that the Kaufman initialization method induces in the k-means algorithm a more desirable behaviour with respect to convergence speed than the random initialization method. (C) 1999 Elsevier Science B.V. All rights reserved.
In the fierce competition of the electricity market, how to consolidate and develop customers is particularly important. Aiming to analyze the electricity consumption characteristics of customer groups, this paper used a k-means algorithm and optimized it. The number of clusters was determined by the Davies-Bouldin index (DBI). An improved Harris Hawks optimization (IHHO) algorithm was designed to realize the initial cluster center selection. Based on data such as electricity purchase and average electricity price, electricity customer groups were clustered using the IHHO-k-means algorithm. The IHHO-k-means algorithm achieved the best clustering effect on Iris, Wine, and Glass datasets compared with the traditional k-means and PSO-k-means algorithms. Taking Iris as an example, the optimal value of the IHHO-k-means algorithm was 96.538, with an accuracy rate of 0.932, precision and recall rates of 0.941 and 0.793, respectively, an F-measure of 0.861, and an area under the curve (AUC) value of 0.851. In the customer dataset, the number of clusters determined by DBI was 4. The power customers were divided into four groups with different characteristics of electricity consumption, and their electricity consumption behaviors were analyzed. The results prove the reliability of the IHHO-k-means algorithm in analyzing electricity consumption characteristics of customer groups, and it can be applied in practice.
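For readers unfamiliar with the Davies-Bouldin index used above to choose the number of clusters, the following is a minimal sketch of its standard definition with Euclidean distances. The function name `davies_bouldin` is an assumption, and the code does not reproduce the paper's IHHO step or its data. Lower values indicate more compact, better-separated clusters.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled clustering (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    # Centroid of each cluster, computed coordinate by coordinate.
    centroids = {lab: [sum(col) / len(pts) for col in zip(*pts)]
                 for lab, pts in clusters.items()}
    # Scatter: mean distance from a cluster's points to its centroid.
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Assumes no two centroids coincide (that would divide by zero).
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)
```

To pick the number of clusters, one would run the clustering for each candidate k, compute the index for each resulting labelling, and keep the k with the smallest value.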
The k-means algorithm for clustering is very much dependent on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values in the k-means algorithm. The results obtained are very encouraging: in most cases, on data sets having well-separated clusters, the proposed scheme reached a global minimum.
The k-means algorithm is commonly used with the Euclidean metric. While the use of Mahalanobis distances seems to be a straightforward extension of the algorithm, the initial estimation of covariance matrices can be complicated. We propose a novel approach for initializing covariance matrices. (C) 2013 Elsevier B.V. All rights reserved.
Correcting interferometric synthetic aperture radar (InSAR) interferograms using Global Navigation Satellite System (GNSS) data can effectively improve their accuracy. However, most of the existing correction methods utilize the difference between GNSS and InSAR data for surface fitting; these methods can effectively correct overall long-wavelength errors, but they are insufficient for multiple medium-wavelength errors in localized areas. Based on this, we propose a method for correcting InSAR interferograms using GNSS data and the k-means spatial clustering algorithm, which is capable of obtaining correction information with high accuracy, thus improving the overall and localized area error correction effects and contributing to obtaining high-precision InSAR deformation time series. In an application involving the Central Valley of Southern California (CVSC), the experimental results show that the proposed correction method can effectively compensate for the deficiency of surface fitting in capturing error details and suppress the effect of low-quality interferograms. At the nine GNSS validation sites that are not included in the modeling process, the errors in the ascending track 137A and descending track 144D are mostly less than 15 mm, and the average root mean square error values are 11.8 mm and 8.0 mm, respectively. Overall, the correction method not only realizes effective interferogram error correction, but also offers high accuracy, high efficiency, and ease of promotion, and it can effectively address large-scale and high-precision deformation monitoring scenarios.