Clustering is one of the most widely used knowledge discovery techniques for revealing structure in a dataset that can be extremely useful to the analyst. In iterative clustering algorithms, the procedure adopted for choosing initial cluster centers is extremely important, as it has a direct impact on the formation of the final clusters. Since clusters are separated groups in a feature space, it is desirable to select initial centers which are well separated. In this paper, we propose an algorithm to compute initial cluster centers for the k-means algorithm. The algorithm is applied to several datasets of different dimensions for illustrative purposes. It is observed that the newly proposed algorithm performs well in obtaining initial cluster centers for the k-means algorithm. (C) 2011 Elsevier B.V. All rights reserved.
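The abstract does not reproduce the paper's exact seeding procedure, but the idea of choosing well-separated initial centers can be sketched with a standard max-min (farthest-point) heuristic; the function name and the choice of the first center are illustrative assumptions, not the paper's method:

```python
import numpy as np

def farthest_point_centers(X, k):
    """Pick k well-separated initial centers for k-means: start from the
    point nearest the data mean, then repeatedly add the point farthest
    from all centers chosen so far (max-min heuristic)."""
    X = np.asarray(X, dtype=float)
    centers = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centers) < k:
        # distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

# Three well-separated blobs in 2-D: the heuristic lands one center per blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2))
               for m in ([0, 0], [5, 0], [0, 5])])
C = farthest_point_centers(X, 3)
```

A randomized relative of this deterministic scheme (sampling proportional to squared distance instead of taking the maximum) is the widely used k-means++ initialization.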
In the fierce competition of the electricity market, how to retain and develop customers is particularly important. To analyze the electricity consumption characteristics of customer groups, this paper used the k-means algorithm and optimized it. The number of clusters was determined by the Davies-Bouldin index (DBI). An improved Harris Hawks optimization (IHHO) algorithm was designed to perform the initial cluster center selection. Based on data such as electricity purchases and average electricity price, electricity customer groups were clustered using the IHHO-k-means algorithm. The IHHO-k-means algorithm achieved the best clustering effect on the Iris, Wine, and Glass datasets compared with the traditional k-means and PSO-k-means algorithms. Taking Iris as an example, the optimal value of the IHHO-k-means algorithm was 96.538, with an accuracy rate of 0.932, precision and recall rates of 0.941 and 0.793, respectively, an F-measure of 0.861, and an area under the curve (AUC) value of 0.851. In the customer dataset, the number of clusters determined by DBI was 4. The power customers were divided into four groups with different characteristics of electricity consumption, and their electricity consumption behaviors were analyzed. The results demonstrate the reliability of the IHHO-k-means algorithm in analyzing the electricity consumption characteristics of customer groups, and it can be applied in practice.
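The Davies-Bouldin index used above to pick the number of clusters has a simple closed form; a minimal NumPy sketch follows (the function name is ours — scikit-learn ships an equivalent `davies_bouldin_score`):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better): for each cluster i take the
    worst ratio (s_i + s_j) / d(c_i, c_j) over the other clusters j, where
    s_i is the mean distance of cluster i's points to its centroid c_i,
    then average those worst ratios over all clusters."""
    X = np.asarray(X, dtype=float)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
                        for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Two tight, far-apart blobs: the true 2-cluster labeling scores far
# lower (better) than an interleaved labeling of the same points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal([10, 10], 0.5, size=(50, 2))])
good = np.array([0] * 50 + [1] * 50)
bad = np.array([0, 1] * 50)
```

To choose the number of clusters, one clusters the data for a range of candidate k and keeps the k whose partition minimizes the index.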
Correcting interferometric synthetic aperture radar (InSAR) interferograms using Global Navigation Satellite System (GNSS) data can effectively improve their accuracy. However, most of the existing correction methods utilize the difference between GNSS and InSAR data for surface fitting; these methods can effectively correct overall long-wavelength errors, but they are insufficient for multiple medium-wavelength errors in localized areas. To address this, we propose a method for correcting InSAR interferograms using GNSS data and the k-means spatial clustering algorithm, which is capable of obtaining correction information with high accuracy, thus improving both overall and localized error correction and contributing to high-precision InSAR deformation time series. In an application involving the Central Valley of Southern California (CVSC), the experimental results show that the proposed correction method can effectively compensate for the deficiency of surface fitting in capturing error details and suppress the effect of low-quality interferograms. At the nine GNSS validation sites that are not included in the modeling process, the errors in ascending track 137A and descending track 144D are mostly less than 15 mm, and the average root mean square error values are 11.8 mm and 8.0 mm, respectively. Overall, the correction method not only realizes effective interferogram error correction, but also offers high accuracy, high efficiency, and ease of generalization, and can effectively address large-scale, high-precision deformation monitoring scenarios.
The k-means algorithm for clustering is very much dependent on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values for the k-means algorithm. The results obtained are very encouraging; in most cases, on data sets having well-separated clusters, the proposed scheme reached a global minimum.
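The abstract does not spell out the genetic operators, but the overall scheme — evolving candidate seed sets scored by the clustering objective — can be sketched as follows; the population size, truncation selection, and single-seed mutation are our own simplifying assumptions, not the paper's operators:

```python
import numpy as np

def sse(X, centers):
    """Within-cluster sum of squared errors for a given set of centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum()

def ga_seeds(X, k, pop=20, gens=30, seed=0):
    """Toy genetic search over seed sets: a chromosome is k row indices of
    X, its fitness is the (negated) SSE of the induced partition; keep the
    better half each generation and mutate one seed per survivor."""
    rng = np.random.default_rng(seed)
    n = len(X)
    popu = [rng.choice(n, size=k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=lambda idx: sse(X, X[idx]))
        elite = popu[:pop // 2]
        children = []
        for parent in elite:
            child = parent.copy()
            child[rng.integers(k)] = rng.integers(n)  # mutate one seed index
            children.append(child)
        popu = elite + children
    best = min(popu, key=lambda idx: sse(X, X[idx]))
    return X[best]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2))
               for m in ([0, 0], [5, 0], [0, 5])])
seeds = ga_seeds(X, 3)
```

The returned seeds would then start an ordinary k-means run; on well-separated blobs such as these, the evolved seed set typically places one seed in each cluster.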
This paper gives a comparative study of the k-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM model. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR). The paper gives a thorough analytic comparison of the two methods for the univariate case under both homoscedasticity and heteroscedasticity. Simulation results are given to compare the two methods for a range of sample sizes. The comparison, which is limited to two clusters, shows that the MM method has substantially lower EMCR, particularly when the mixing proportions are unbalanced. The two methods have asymptotically the same EMCR under homoscedasticity (resp., heteroscedasticity) when the mixing proportions of the two clusters are equal (resp., unequal), but for small samples the MM method sometimes performs slightly worse because of the errors in estimating unknown parameters. (C) 2007 Elsevier B.V. All rights reserved.
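For the univariate two-component case compared above, the EM iteration for the MLEs has a compact form; a minimal sketch (initialization by quartiles is our own choice, not the paper's):

```python
import numpy as np

def em_two_gaussians(x, iters=200):
    """EM for a univariate two-component Gaussian mixture. E-step: posterior
    responsibilities r[n, j] proportional to pi_j * N(x_n | mu_j, var_j).
    M-step: responsibility-weighted updates of the mixing proportions,
    means, and variances. Points are finally clustered by the maximum
    posterior (Bayes) rule."""
    x = np.asarray(x, dtype=float)
    pi = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75])
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)   # E-step: normalize responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)                    # M-step: weighted MLE updates
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, np.argmax(r, axis=1)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(8.0, 1.0, 300)])
pi, mu, var, labels = em_two_gaussians(x)
```

k-means corresponds to the limiting case of hard assignments with equal, spherical variances, which is one way to see why the soft MM method can win when the mixing proportions are unbalanced.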
The latest research in the field of image character recognition has led to various developments in modern technological works for the improvement of recognition rate and precision. This technology is significant in the fields of character recognition, business card recognition, document recognition, vehicle license plate recognition, etc., for smart city planning; thus its effectiveness should be improved. In order to improve the accuracy of image text recognition effectively, this article uses the Canny algorithm for edge detection of text and the k-means algorithm for cluster-based pixel recognition. This combination, together with maximally stable extremal regions and stroke-width optimization for image text, yields better results in terms of recognition rate, recall, precision, F-score and accuracy. The results show that the correct recognition rates are 88.3% and 72.4%, respectively, with an accuracy of 90.5% for the proposed method. This algorithm has a high image text recognition rate, can recognize images taken in complex environments, and has good noise removal capability. It is an optimal algorithm for image text recognition.
Clustering is considered one of the important methods in data mining. The performance of the k-means algorithm, one of the most common clustering methods, is highly sensitive to the initial cluster centers. Hence, selecting appropriate initial cluster centers for implementing the algorithm improves the resulting clustering. The present study aims to find suitable initial cluster centers for k-means. In fact, the initial cluster centers should be selected in such a way that clusters with high separation and high density can be obtained. Therefore, in this paper, finding initial cluster centers is treated as a multi-objective optimization problem: maximizing the distance between the initial cluster centers, as well as the neighbor density of the initial cluster centers. Solving this problem using the MOPSO algorithm provided a set of candidate initial cluster centers. Then, hesitant fuzzy sets were used to evaluate the clusters generated from the initial cluster centers by considering separation, cohesion, and the silhouette index. After that, the concept of informational energy of hesitant fuzzy sets was used to rank the non-dominated particles in the Pareto optimal set, and the initial cluster centers were selected for starting the k-means algorithm. The proposed HFSMOOk-means method was compared with several clustering algorithms using common and widely used criteria. The results indicated the successful performance of HFSMOOk-means on the majority of the datasets compared to the other algorithms.
The k-means algorithm is commonly used with the Euclidean metric. While the use of Mahalanobis distances seems to be a straightforward extension of the algorithm, the initial estimation of covariance matrices can be complicated. We propose a novel approach for initializing covariance matrices. (C) 2013 Elsevier B.V. All rights reserved.
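The covariance-initialization scheme itself is the paper's contribution and is not reproduced here, but the Mahalanobis assignment step it plugs into is standard; a sketch with one covariance matrix per cluster (the function name is ours):

```python
import numpy as np

def mahalanobis_assign(X, centers, covs):
    """Assign each point to the nearest center under the Mahalanobis
    distance d(x, c)^2 = (x - c)^T S^{-1} (x - c), with one covariance
    matrix S per cluster."""
    X = np.asarray(X, dtype=float)
    d2 = np.stack([
        np.einsum('nd,de,ne->n', X - c, np.linalg.inv(S), X - c)
        for c, S in zip(centers, covs)
    ], axis=1)
    return np.argmin(d2, axis=1)

# With identity covariances this reduces to ordinary Euclidean k-means
# assignment; an anisotropic covariance stretches "near" along its axes.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = [np.eye(2), np.eye(2)]
labels = mahalanobis_assign(
    np.array([[0.1, 0.0], [4.9, 5.2], [1.0, 1.0], [4.0, 4.0]]),
    centers, covs)
```

The difficulty the paper addresses is visible here: before any points are assigned to a cluster, there is no sample from which to estimate that cluster's `S`, so the matrices must be initialized by some other means.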
With the rapid development of computing, and especially the spread of "Internet+", cloud platforms, and similar technologies across industries in recent years, data of all types has grown enormously. These large volumes of data often contain very rich information, and traditional data retrieval, analysis, and management models can no longer meet our needs for data acquisition and management. Data mining technology has therefore become one of the solutions for quickly obtaining useful information in today's society. Effectively clustering large-scale data is one of the important research directions in data mining. The k-means algorithm is the simplest and most basic method for large-scale data clustering. It has the advantages of simple operation, fast speed, and good scalability on large data, but it also often exposes serious defects in data processing. In view of some defects of the traditional k-means algorithm, this paper improves and analyzes it in two respects.
The k-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved k-means algorithm based on a similarity matrix. The improved algorithm effectively avoids the random selection of initial center points, thereby providing effective initial points for the clustering process and reducing the fluctuation of clustering results that arises from initial point selection; thus, better clustering quality can be obtained. The experimental results also show that the F-measure of the improved k-means algorithm is greatly improved and the clustering results are more stable.