Clustering is one of the most widely used knowledge discovery techniques for revealing structure in a dataset that can be extremely useful to the analyst. In iterative clustering algorithms, the procedure adopted for choosing initial cluster centers is extremely important, as it has a direct impact on the formation of the final clusters. Since clusters are separated groups in a feature space, it is desirable to select initial centers which are well separated. In this paper, we propose an algorithm to compute initial cluster centers for the k-means algorithm. The algorithm is applied to several datasets of different dimensions for illustrative purposes. It is observed that the newly proposed algorithm performs well in obtaining initial cluster centers for the k-means algorithm. (C) 2011 Elsevier B.V. All rights reserved.
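The abstract does not reproduce the paper's exact seeding procedure, but the idea of choosing well-separated initial centers can be sketched with a standard max-min (farthest-point) heuristic; the function name and the choice of the first center are illustrative assumptions, not the paper's method:

```python
import numpy as np

def farthest_point_centers(X, k):
    """Pick k well-separated initial centers for k-means: start from the
    point nearest the data mean, then repeatedly add the point farthest
    from all centers chosen so far (max-min heuristic)."""
    X = np.asarray(X, dtype=float)
    centers = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centers) < k:
        # distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

# Three well-separated blobs in 2-D: the heuristic lands one center per blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2))
               for m in ([0, 0], [5, 0], [0, 5])])
C = farthest_point_centers(X, 3)
```

A randomized relative of this deterministic scheme (sampling proportional to squared distance instead of taking the maximum) is the widely used k-means++ initialization.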
In the fierce competition of the electricity market, how to retain and develop customers is particularly important. To analyze the electricity consumption characteristics of customer groups, this paper used the k-means algorithm and optimized it. The number of clusters was determined by the Davies-Bouldin index (DBI). An improved Harris Hawks optimization (IHHO) algorithm was designed to perform the initial cluster center selection. Based on data such as electricity purchases and average electricity price, electricity customer groups were clustered using the IHHO-k-means algorithm. The IHHO-k-means algorithm achieved the best clustering effect on the Iris, Wine, and Glass datasets compared with the traditional k-means and PSO-k-means algorithms. Taking Iris as an example, the optimal value of the IHHO-k-means algorithm was 96.538, with an accuracy rate of 0.932, precision and recall rates of 0.941 and 0.793, respectively, an F-measure of 0.861, and an area under the curve (AUC) value of 0.851. In the customer dataset, the number of clusters determined by DBI was 4. The power customers were divided into four groups with different characteristics of electricity consumption, and their electricity consumption behaviors were analyzed. The results demonstrate the reliability of the IHHO-k-means algorithm in analyzing the electricity consumption characteristics of customer groups, and it can be applied in practice.
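The Davies-Bouldin index used above to pick the number of clusters has a simple closed form; a minimal NumPy sketch follows (the function name is ours — scikit-learn ships an equivalent `davies_bouldin_score`):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better): for each cluster i take the
    worst ratio (s_i + s_j) / d(c_i, c_j) over the other clusters j, where
    s_i is the mean distance of cluster i's points to its centroid c_i,
    then average those worst ratios over all clusters."""
    X = np.asarray(X, dtype=float)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
                        for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Two tight, far-apart blobs: the true 2-cluster labeling scores far
# lower (better) than an interleaved labeling of the same points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal([10, 10], 0.5, size=(50, 2))])
good = np.array([0] * 50 + [1] * 50)
bad = np.array([0, 1] * 50)
```

To choose the number of clusters, one clusters the data for a range of candidate k and keeps the k whose partition minimizes the index.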
Correcting interferometric synthetic aperture radar (InSAR) interferograms using Global Navigation Satellite System (GNSS) data can effectively improve their accuracy. However, most of the existing correction methods utilize the difference between GNSS and InSAR data for surface fitting; these methods can effectively correct overall long-wavelength errors, but they are insufficient for multiple medium-wavelength errors in localized areas. To address this, we propose a method for correcting InSAR interferograms using GNSS data and the k-means spatial clustering algorithm, which is capable of obtaining correction information with high accuracy, thus improving both overall and localized error correction and contributing to high-precision InSAR deformation time series. In an application involving the Central Valley of Southern California (CVSC), the experimental results show that the proposed correction method can effectively compensate for the deficiency of surface fitting in capturing error details and suppress the effect of low-quality interferograms. At the nine GNSS validation sites that are not included in the modeling process, the errors in ascending track 137A and descending track 144D are mostly less than 15 mm, and the average root mean square error values are 11.8 mm and 8.0 mm, respectively. Overall, the correction method not only realizes effective interferogram error correction, but also offers high accuracy, high efficiency, and ease of generalization, and can effectively address large-scale, high-precision deformation monitoring scenarios.
The k-means algorithm for clustering is very much dependent on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values for the k-means algorithm. The results obtained are very encouraging; in most cases, on data sets having well-separated clusters, the proposed scheme reached a global minimum.
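The abstract does not spell out the genetic operators, but the overall scheme — evolving candidate seed sets scored by the clustering objective — can be sketched as follows; the population size, truncation selection, and single-seed mutation are our own simplifying assumptions, not the paper's operators:

```python
import numpy as np

def sse(X, centers):
    """Within-cluster sum of squared errors for a given set of centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum()

def ga_seeds(X, k, pop=20, gens=30, seed=0):
    """Toy genetic search over seed sets: a chromosome is k row indices of
    X, its fitness is the (negated) SSE of the induced partition; keep the
    better half each generation and mutate one seed per survivor."""
    rng = np.random.default_rng(seed)
    n = len(X)
    popu = [rng.choice(n, size=k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=lambda idx: sse(X, X[idx]))
        elite = popu[:pop // 2]
        children = []
        for parent in elite:
            child = parent.copy()
            child[rng.integers(k)] = rng.integers(n)  # mutate one seed index
            children.append(child)
        popu = elite + children
    best = min(popu, key=lambda idx: sse(X, X[idx]))
    return X[best]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2))
               for m in ([0, 0], [5, 0], [0, 5])])
seeds = ga_seeds(X, 3)
```

The returned seeds would then start an ordinary k-means run; on well-separated blobs such as these, the evolved seed set typically places one seed in each cluster.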
This paper gives a comparative study of the k-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM model. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR). The paper gives a thorough analytic comparison of the two methods for the univariate case under both homoscedasticity and heteroscedasticity. Simulation results are given to compare the two methods for a range of sample sizes. The comparison, which is limited to two clusters, shows that the MM method has substantially lower EMCR, particularly when the mixing proportions are unbalanced. The two methods have asymptotically the same EMCR under homoscedasticity (resp., heteroscedasticity) when the mixing proportions of the two clusters are equal (resp., unequal), but for small samples the MM method sometimes performs slightly worse because of the errors in estimating unknown parameters. (C) 2007 Elsevier B.V. All rights reserved.
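For the univariate two-component case compared above, the EM iteration for the MLEs has a compact form; a minimal sketch (initialization by quartiles is our own choice, not the paper's):

```python
import numpy as np

def em_two_gaussians(x, iters=200):
    """EM for a univariate two-component Gaussian mixture. E-step: posterior
    responsibilities r[n, j] proportional to pi_j * N(x_n | mu_j, var_j).
    M-step: responsibility-weighted updates of the mixing proportions,
    means, and variances. Points are finally clustered by the maximum
    posterior (Bayes) rule."""
    x = np.asarray(x, dtype=float)
    pi = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75])
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)   # E-step: normalize responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)                    # M-step: weighted MLE updates
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, np.argmax(r, axis=1)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(8.0, 1.0, 300)])
pi, mu, var, labels = em_two_gaussians(x)
```

k-means corresponds to the limiting case of hard assignments with equal, spherical variances, which is one way to see why the soft MM method can win when the mixing proportions are unbalanced.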
The latest research in the field of image character recognition has led to various developments in modern technological works for the improvement of recognition rate and precision. This technology is significant in the fields of character recognition, business card recognition, document recognition, vehicle license plate recognition, etc., for smart city planning; thus its effectiveness should be improved. In order to improve the accuracy of image text recognition effectively, this article uses the Canny algorithm for edge detection of text and the k-means algorithm for cluster-based pixel recognition. This combination, together with maximally stable extremal regions and stroke-width optimization for image text, yields better results in terms of recognition rate, recall, precision, F-score and accuracy. The results show that the correct recognition rates are 88.3% and 72.4%, respectively, with an accuracy of 90.5% for the proposed method. This algorithm has a high image text recognition rate, can recognize images taken in complex environments, and has good noise removal capability. It is an optimal algorithm for image text recognition.
Clustering is considered one of the important methods in data mining. The performance of the k-means algorithm, one of the most common clustering methods, is highly sensitive to the initial cluster centers. Hence, selecting appropriate initial cluster centers for implementing the algorithm improves the resulting clustering. The present study aims to find suitable initial cluster centers for k-means. In fact, the initial cluster centers should be selected in such a way that clusters with high separation and high density can be obtained. Therefore, in this paper, finding initial cluster centers is treated as a multi-objective optimization problem: maximizing the distance between the initial cluster centers, as well as the neighbor density of the initial cluster centers. Solving this problem using the MOPSO algorithm provided a set of candidate initial cluster centers. Then, hesitant fuzzy sets were used to evaluate the clusters generated from the initial cluster centers by considering separation, cohesion, and the silhouette index. After that, the concept of informational energy of hesitant fuzzy sets was used to rank the non-dominated particles in the Pareto optimal set, and the initial cluster centers were selected for starting the k-means algorithm. The proposed HFSMOOk-means method was compared with several clustering algorithms using common and widely used criteria. The results indicated the successful performance of HFSMOOk-means on the majority of the datasets compared to the other algorithms.
The k-means algorithm is commonly used with the Euclidean metric. While the use of Mahalanobis distances seems to be a straightforward extension of the algorithm, the initial estimation of covariance matrices can be complicated. We propose a novel approach for initializing covariance matrices. (C) 2013 Elsevier B.V. All rights reserved.
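The covariance-initialization scheme itself is the paper's contribution and is not reproduced here, but the Mahalanobis assignment step it plugs into is standard; a sketch with one covariance matrix per cluster (the function name is ours):

```python
import numpy as np

def mahalanobis_assign(X, centers, covs):
    """Assign each point to the nearest center under the Mahalanobis
    distance d(x, c)^2 = (x - c)^T S^{-1} (x - c), with one covariance
    matrix S per cluster."""
    X = np.asarray(X, dtype=float)
    d2 = np.stack([
        np.einsum('nd,de,ne->n', X - c, np.linalg.inv(S), X - c)
        for c, S in zip(centers, covs)
    ], axis=1)
    return np.argmin(d2, axis=1)

# With identity covariances this reduces to ordinary Euclidean k-means
# assignment; an anisotropic covariance stretches "near" along its axes.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = [np.eye(2), np.eye(2)]
labels = mahalanobis_assign(
    np.array([[0.1, 0.0], [4.9, 5.2], [1.0, 1.0], [4.0, 4.0]]),
    centers, covs)
```

The difficulty the paper addresses is visible here: before any points are assigned to a cluster, there is no sample from which to estimate that cluster's `S`, so the matrices must be initialized by some other means.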
With the rapid development of computing, and especially the spread of "Internet+", cloud platforms, and similar technologies across industries in recent years, data of all types has grown enormously. These large volumes of data often contain very rich information, and traditional data retrieval, analysis, and management models can no longer meet our needs for data acquisition and management. Data mining technology has therefore become one of the solutions for quickly obtaining useful information in today's society. Effectively clustering large-scale data is one of the important research directions in data mining. The k-means algorithm is the simplest and most basic method for large-scale data clustering. It has the advantages of simple operation, fast speed, and good scalability on large data, but it also often exposes serious defects in data processing. In view of some defects of the traditional k-means algorithm, this paper improves and analyzes it in two respects.
The k-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved k-means algorithm based on a similarity matrix. The improved algorithm effectively avoids the random selection of initial center points, thereby providing effective initial points for the clustering process and reducing the fluctuation of clustering results that arises from initial point selection; thus, better clustering quality can be obtained. The experimental results also show that the F-measure of the improved k-means algorithm is greatly improved and the clustering results are more stable.