A new density peak clustering (DPC) algorithm with adaptive cluster centers based on differential privacy is proposed to address three problems in clustering analysis: poor adaptability to high-dimensional data, the inability to determine cluster centers automatically, and privacy leakage. First, to improve adaptability to high-dimensional data, cosine distance is used to measure similarity in high-dimensional datasets. Then, to remove the subjectivity of cluster center selection, the weight (i - 1)/i is introduced from the perspective of the ranking graph, and the slope trend of the ranking graph is redefined so that cluster centers are selected adaptively. Finally, to address the privacy problem, Laplacian noise with an appropriate privacy budget is added to the core statistic of the algorithm (the local density) to balance privacy protection against algorithm effectiveness. Experimental results on both synthetic and UCI datasets show that the algorithm not only selects cluster centers automatically and addresses the privacy problem in clustering analysis, but also greatly improves the clustering evaluation indices, which demonstrates its effectiveness.
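The abstract names two concrete ingredients: cosine distance as the dissimilarity measure and Laplacian noise added to the local density. The following is a minimal sketch of those two ideas only, not the authors' implementation. The cut-off form of the density, the function names, and the default `sensitivity=1.0` are assumptions made for illustration; the abstract does not say how the sensitivity or the privacy budget is chosen.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def local_density(points, d_c):
    """Cut-off density (assumed form): count of other points closer than d_c."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_distance(points[i], points[j]) < d_c:
                rho[i] += 1
                rho[j] += 1
    return rho


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_density(rho, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to each density.

    sensitivity=1.0 is a placeholder assumption, not a value from the paper.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [r + laplace_noise(scale, rng) for r in rho]


if __name__ == "__main__":
    pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    rho = local_density(pts, d_c=0.1)
    print(rho)
    print(noisy_density(rho, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of a less reliable density ranking, which is the trade-off the abstract refers to.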
Given a large unlabeled set of complex data, grouping it into clusters efficiently and effectively remains a challenging problem. The density peaks clustering (DPC) algorithm is an emerging algorithm that identifies cluster centers based on a decision graph. DPC can recognize clusters effectively without the number of cluster centers being set in advance. However, the similarity between every pair of data points must be calculated to construct the decision graph, which results in high computational complexity. To overcome this issue, we propose a fast sparse search density peaks clustering (FSDPC) algorithm that enhances DPC by constructing the decision graph with fewer similarity calculations, so that cluster centers are identified quickly. In FSDPC, we design a novel sparse search strategy that measures similarity only between each data point and its nearest neighbors. FSDPC therefore improves the efficiency of DPC while maintaining satisfactory results. We also propose a novel random third-party data point method to search for the nearest neighbors, which introduces neither additional parameters nor high computational complexity. Experimental results on synthetic and real-world datasets indicate that the proposed algorithm consistently outperforms DPC and other state-of-the-art algorithms. (C) 2020 Elsevier Inc. All rights reserved.
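To make the cost that FSDPC targets concrete, here is a minimal sketch of the baseline DPC decision graph, which needs all pairwise distances. It is not the FSDPC sparse search, whose details the abstract does not give. The cut-off density, Euclidean distance, the tie handling, and the function names are assumptions made for illustration.

```python
import math


def pairwise_distances(points):
    """Full symmetric Euclidean distance matrix: n * (n - 1) / 2 evaluations."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def decision_graph(points, d_c):
    """Return (rho, delta) for every point.

    rho[i]   : number of other points closer than the cut-off d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               points with no denser point get their largest distance.
    """
    dist = pairwise_distances(points)
    n = len(points)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    rho, delta = decision_graph(pts, d_c=1.5)
    for p, r, d in zip(pts, rho, delta):
        print(p, r, round(d, 3))
```

Points that combine a high `rho` with a high `delta` are the candidate cluster centers in the decision graph. The nested loop in `pairwise_distances` is the quadratic step that a sparse neighbor search would try to avoid.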
The density peaks clustering (DPC) algorithm was proposed to identify cluster centers quickly by drawing a decision graph, without any prior knowledge. DPC also finds clusters of arbitrary shape with few parameters and no iteration. However, DPC has shortcomings that must be addressed before it can be widely applied. First, DPC is not suitable for manifold datasets, because a single cluster in such a dataset can contain multiple density peaks. Second, the cut-off distance parameter has a great influence on the algorithm, especially on small-scale datasets. Third, choosing centers from the decision graph can yield uncertain cluster centers, which leads to incorrect clustering. To address these issues, we propose a robust density peaks clustering algorithm with density-sensitive similarity (RDPC-DSS) to find accurate cluster centers on manifold datasets. Density-sensitive similarity reduces the influence of the parameters on the clustering results. In addition, a novel density clustering index (DCI) is designed to replace the decision graph and determine the number of cluster centers automatically. Extensive experimental results show that RDPC-DSS outperforms DPC and other state-of-the-art algorithms on manifold datasets. (C) 2020 Elsevier B.V. All rights reserved.
The density peaks clustering (DPC) algorithm is a novel algorithm that deals efficiently with complex data set structure by finding density peaks. It needs neither an iterative process nor many parameters. DPC uses the density-distance to find the density peaks. Unfortunately, it splits one cluster into several if that cluster contains multiple density peaks, and it is ineffective when data sets have relatively high dimensions. To overcome the first problem, we propose the FDPC algorithm, which is based on a novel merging strategy motivated by the support vector machine. First, after clustering with DPC, the strategy uses the support vectors to calculate the feedback values between every two clusters. Then, it merges clusters recursively according to the feedback values to obtain accurate clustering results. To address the second limitation, we introduce nonnegative matrix factorization into FDPC to preprocess high-dimensional data sets before clustering. Experimental results on real-world and artificial data sets demonstrate that our algorithm is robust and flexible, recognizes clusters of arbitrary shape effectively regardless of the dimension of the space, and outperforms DPC.
The density peaks clustering algorithm (DPC) was proposed in 2014 to deal with complex data set structure. DPC uses the density and the delta-distance to find the cluster centers. It detects outliers efficiently and finds clusters of arbitrary shape. Unfortunately, the distance between all pairs of data points must be calculated in the first step, which limits the running speed of DPC on large datasets. To address this issue, this paper introduces a novel grid-based approach, called the density peaks clustering algorithm based on grid (DPCG), which overcomes this efficiency problem. When calculating the local density, DPCG uses a grid to reduce the computation time of DPC. It requires neither the calculation of all pairwise distances nor many input parameters. Moreover, DPCG inherits all the merits of DPC. Experimental results on UCI data sets and artificial data show that DPCG is flexible and effective.
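The grid idea can be illustrated with a minimal sketch in which each point's local density is approximated by the number of points sharing its grid cell, so no pairwise distances are computed. This is an assumed simplification for illustration; the abstract does not specify how DPCG builds or uses its grid, and `grid_density` and `cell_size` are hypothetical names.

```python
import math
from collections import Counter


def grid_density(points, cell_size):
    """Approximate local density by the population of each point's grid cell.

    Each coordinate is mapped to a cell index with floor(x / cell_size).
    The cost is linear in the number of points.
    """
    cells = [tuple(math.floor(x / cell_size) for x in p) for p in points]
    counts = Counter(cells)
    return [counts[c] for c in cells]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (5.0, 5.0)]
    print(grid_density(pts, cell_size=1.0))
```

The cell size plays a role similar to the cut-off distance in DPC: a coarser grid smooths the density estimate, while a finer grid makes it more local.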
ISBN: (print) 9783030008284; 9783030008277
The density peaks clustering algorithm (DPC) relies on the local density and relative distance of the data points to find cluster centers. However, these attributes are calculated from Euclidean distance alone, and DPC performs poorly when the dataset has uneven density or high dimension. In addition, the parameter d_c considers only the global distribution of the dataset, so a small change in d_c has a great influence on the clustering of small-scale datasets. To address these drawbacks, this paper proposes a mass-based density peaks clustering algorithm (MDPC). MDPC introduces a mass-based similarity measure to calculate a new similarity matrix. The K-nearest neighbour information of the data is then obtained from the new similarity matrix, and MDPC redefines the local density based on this K-nearest neighbour information. Experimental results show that MDPC is superior to DPC, performs satisfactorily on datasets with uneven density and higher dimensions, and avoids the influence of d_c on small-scale datasets.
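A local density built from K-nearest neighbour information, with no cut-off d_c, can be sketched as follows. The exponential form used here is one common choice in KNN-based DPC variants and is an assumption; the abstract does not state MDPC's exact definition. The function takes a precomputed dissimilarity matrix so that any measure, including a mass-based one, could be plugged in.

```python
import math


def knn_density(dissim, k):
    """Local density from the k smallest dissimilarities of each point.

    Assumed form: rho_i = exp(-(1 / k) * sum of squared dissimilarities
    to the k nearest neighbours of point i).
    """
    rho = []
    for i, row in enumerate(dissim):
        # Exclude the point itself, then keep its k closest neighbours.
        neighbours = sorted(d for j, d in enumerate(row) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in neighbours) / k))
    return rho


if __name__ == "__main__":
    dissim = [
        [0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0],
    ]
    print(knn_density(dissim, k=2))
```

Because each density depends only on a point's own neighbourhood, the estimate adapts to regions of different density instead of relying on one global threshold.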