We present an algorithm for clustering high-dimensional streaming data. The algorithm incorporates dimension reduction into the stream clustering framework. When a new datum arrives, the algorithm performs dimension reduction to find a local projected subspace using an unsupervised LDA (Linear Discriminant Analysis)-based method. The resulting local subspace maximally separates the nearby micro-clusters with respect to the incoming point. The incoming point is then assigned to a micro-cluster in the projected space rather than in the full-dimensional space. The experimental results show that the proposed algorithm outperforms its counterpart streaming clustering algorithms. Moreover, when compared with traditional clustering algorithms that require the whole data set, the proposed algorithm shows comparable clustering performance with much less computation time for large data sets. (C) 2016 Elsevier Inc. All rights reserved.
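The idea of assigning a point in a projected space rather than the full space can be illustrated with a minimal sketch. It is not the paper's method: it assumes each micro-cluster is summarized by a mean and a per-dimension variance (a diagonal covariance), and it swaps in a plain two-class Fisher direction between the two nearest micro-clusters. The names `MicroCluster`, `fisher_direction`, and `assign_projected` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    mean: List[float]  # centroid of the micro-cluster
    var: List[float]   # per-dimension variance (diagonal covariance is an assumption)


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance in the full space."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fisher_direction(a: MicroCluster, b: MicroCluster, eps: float = 1e-9) -> List[float]:
    """Two-class Fisher direction w = Sw^-1 (mean_a - mean_b), with diagonal Sw."""
    return [(ma - mb) / (va + vb + eps)
            for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var)]


def project(x: Sequence[float], w: Sequence[float]) -> float:
    """Scalar projection of x onto the direction w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def assign_full_space(point: Sequence[float], clusters: List[MicroCluster]) -> int:
    """Baseline: index of the nearest centroid in the full space."""
    return min(range(len(clusters)), key=lambda i: sq_dist(point, clusters[i].mean))


def assign_projected(point: Sequence[float], clusters: List[MicroCluster]) -> int:
    """Pick the two nearest micro-clusters, then decide between them in 1-D."""
    if len(clusters) < 2:
        return 0
    order = sorted(range(len(clusters)), key=lambda i: sq_dist(point, clusters[i].mean))
    i, j = order[0], order[1]
    w = fisher_direction(clusters[i], clusters[j])
    p = project(point, w)
    d_i = abs(p - project(clusters[i].mean, w))
    d_j = abs(p - project(clusters[j].mean, w))
    return i if d_i <= d_j else j
```

In this sketch a high-variance dimension is down-weighted by the projection, so the projected assignment can differ from the full-space nearest centroid.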
We present an incremental graph-based clustering algorithm whose design was motivated by a need to extract and retain meaningful information from data streams produced by applications such as large-scale surveillance, network packet inspection and financial transaction monitoring. To this end, the method we propose utilises representative points both to incrementally cluster new data and to selectively retain important cluster information within a knowledge repository. The repository can subsequently be used to assist in the processing of new data, the archival of critical features for off-line analysis, and the identification of recurrent patterns. Crown Copyright (C) 2008 Published by Elsevier B.V. All rights reserved.
This paper presents a novel incremental density-based clustering framework using a one-pass scheme, named Fuzzy Incremental Density-based Clustering (FIDC). By employing one-pass clustering, in which each data point is processed once and then discarded, FIDC can process large datasets with less computation time and memory than its density-based clustering counterparts. Fuzzy local clustering is employed in the local cluster assignment process to reduce the clustering inconsistencies that arise from one-pass clustering. To improve the clustering performance and simplify parameter selection, a modified valley seeking algorithm is used to adaptively determine the outlier thresholds for generating the final clusters. FIDC can operate in both traditional and stream data clustering. The experimental results show that FIDC outperforms state-of-the-art algorithms in both clustering modes. (C) 2020 Elsevier Inc. All rights reserved.
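As a rough illustration of one-pass processing with fuzzy assignment, the sketch below sees each point once, updates running cluster centers, and then discards the point. It is a generic example and not FIDC itself: the standard fuzzy c-means membership formula, the fixed `radius` rule for opening a new cluster, and the class name `OnePassFuzzyClusterer` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def fuzzy_memberships(dists: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means style memberships from distances (fuzzifier m > 1)."""
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A point that coincides with one or more centers belongs fully to them.
        k = sum(zeros)
        return [1.0 / k if z else 0.0 for z in zeros]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


class OnePassFuzzyClusterer:
    """Hypothetical one-pass clusterer: each point is seen once and not stored."""

    def __init__(self, radius: float, m: float = 2.0):
        self.radius = radius              # assumed rule: open a new cluster beyond this distance
        self.m = m
        self.centers: List[List[float]] = []
        self.weights: List[float] = []    # accumulated membership mass per center

    def update(self, x: Sequence[float]) -> None:
        dists = [math.dist(x, c) for c in self.centers]
        if not dists or min(dists) > self.radius:
            self.centers.append(list(x))
            self.weights.append(1.0)
            return
        # Membership-weighted running mean: every center moves in proportion to u.
        for k, u in enumerate(fuzzy_memberships(dists, self.m)):
            self.weights[k] += u
            step = u / self.weights[k]
            self.centers[k] = [c + step * (xi - c) for c, xi in zip(self.centers[k], x)]
```

Because only centers and weights are kept, memory use depends on the number of clusters rather than the number of points processed.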
ISBN: (print) 9783319183565; 9783319183558
In this paper, we present a new approach for updating clusters incrementally. The proposed incremental approach preserves comprehensive statistical information about the clusters in the form of Gaussian Mixture Models (GMMs). As each GMM needs the number of Gaussian components as an input parameter, we propose a method that determines the number of components automatically by introducing the concept of core points. In the updating phase, instead of processing each new sample individually, we collect the new incoming samples and cluster them. By employing the concepts of core points and GMMs, we build a number of GMMs for the new samples and label the new GMMs based on their similarity to the existing GMMs. To measure the similarity among GMMs, we introduce a modified version of the Kullback-Leibler divergence as a distance function. For merging the current GMMs and the new GMMs, we propose a new merging mechanism in which the closest components in both GMMs are merged to create a new GMM. Since the GMM structure is a compact representation of clusters, there is no increase in time in either the clustering phase or the updating phase. We measured the accuracy of the clusters using different clustering validity metrics (DB, Dunn, SD and purity), and the results show that our algorithm outperforms other incremental clustering algorithms in terms of the quality of the final clusters.
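The matching and merging of Gaussian components can be sketched for the one-dimensional case. The abstract does not specify the modified Kullback-Leibler distance or the exact merging rule, so this example substitutes the symmetrized closed-form KL divergence between two univariate Gaussians and a moment-preserving merge. `Component`, `sym_kl`, `merge`, and `closest` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float  # mixing weight (or sample count) of the component
    mean: float
    var: float     # variance, must be positive


def kl_gauss(p: Component, q: Component) -> float:
    """Closed-form KL(p || q) for univariate Gaussians."""
    return 0.5 * (math.log(q.var / p.var)
                  + (p.var + (p.mean - q.mean) ** 2) / q.var
                  - 1.0)


def sym_kl(p: Component, q: Component) -> float:
    """Symmetrized KL, used here as a stand-in distance (an assumption)."""
    return kl_gauss(p, q) + kl_gauss(q, p)


def merge(p: Component, q: Component) -> Component:
    """Moment-preserving merge: keeps total weight, mean, and variance."""
    w = p.weight + q.weight
    mean = (p.weight * p.mean + q.weight * q.mean) / w
    var = (p.weight * (p.var + (p.mean - mean) ** 2)
           + q.weight * (q.var + (q.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def closest(new: Component, existing: List[Component]) -> int:
    """Index of the existing component nearest to `new` under sym_kl."""
    return min(range(len(existing)), key=lambda k: sym_kl(new, existing[k]))
```

If the weights are mixing proportions rather than counts, they would need to be renormalized to sum to one after merging.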
Streaming data arrives continually and is characterized by fast, massive, dynamic evolution and instability. Unlike traditional static data clustering, streaming data clustering algorithms need to consider concept drift, outlier handling, and the identification and updating of dynamic clustering patterns, among other issues. DENCLUE is one of the most classical density-based clustering algorithms; it adopts nonparametric estimation and uses a finite number of samples to infer the distribution of the overall data. However, the basic DENCLUE algorithm suffers from the problem that the Kernel Density Estimation (KDE) window width and density threshold parameter are difficult to choose, so it cannot be directly applied to streaming data clustering. Therefore, in this paper, we propose a dual-strategy improved DENCLUE streaming data clustering method based on KDE optimization and two-stage clustering, which takes into account the concept drift problem in streaming data. First, a density threshold parameter optimization method based on KDE is proposed to address the challenges of selecting the KDE window width and density threshold in the traditional DENCLUE algorithm. Second, a two-stage clustering and merging method is designed to improve the performance of traditional DENCLUE clustering. The experimental results show that our algorithm outperforms the traditional CluStream and DenStream algorithms on datasets with arbitrary shapes and sizes, and performs well on streaming data clustering.
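The two parameters named above, the KDE window width and the density threshold, can be made concrete with a small one-dimensional sketch. It does not reproduce the paper's optimization: the bandwidth uses the common normal-reference rule of thumb as a stand-in, and the threshold `xi` is supplied by hand. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def rule_of_thumb_bandwidth(samples: Sequence[float]) -> float:
    """Normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5) (a stand-in choice)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def gaussian_kde(x: float, samples: Sequence[float], h: float) -> float:
    """Gaussian kernel density estimate at x with window width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def low_density_points(samples: Sequence[float], h: float, xi: float) -> List[float]:
    """Samples whose estimated density falls below the threshold xi."""
    return [s for s in samples if gaussian_kde(s, samples, h) < xi]
```

In this sketch a larger `h` smooths the estimate and tends to merge nearby modes, while a larger `xi` marks more points as low density, which is why the two parameters are hard to choose independently.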