ISBN (print): 9781467391665
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously, and at large scale, so real-time data stream clustering has become a hot topic in current data stream mining research. Some existing data stream clustering algorithms cannot effectively handle high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, a grid- and density-based algorithm for clustering data streams in a parallel distributed environment [4]. The algorithm uses a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data streams in real time. The Map-Reduce framework is used for parallel cluster analysis of the data streams.
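The grid-and-density idea can be illustrated with a small Python sketch. It is not PGDC-Stream itself. The cell width, decay factor, and the two thresholds (`CELL_SIZE`, `DECAY`, `DENSE`, `SPARSE`) are hypothetical. The "map" and "reduce" steps are plain local functions standing in for a real Map-Reduce job. Points are mapped to grid cells, cell densities decay over time, sparse cells are periodically removed as noise, and adjacent dense cells are joined into clusters of arbitrary shape.

```python
from collections import Counter, deque

# Hypothetical parameters, not taken from the paper.
CELL_SIZE = 1.0   # grid width along every dimension
DECAY = 0.5       # factor applied to every cell density once per time step
DENSE = 2.0       # a cell at or above this density can belong to a cluster
SPARSE = 0.8      # a cell below this density is treated as noise


def cell_of(point):
    """Integer grid coordinates of a d-dimensional point (floor division)."""
    return tuple(int(x // CELL_SIZE) for x in point)


def map_cells(points):
    """'Map' step: count the points of one partition per grid cell."""
    return Counter(cell_of(p) for p in points)


def reduce_cells(partials):
    """'Reduce' step: merge per-partition counts into global cell counts."""
    total = Counter()
    for counts in partials:
        total.update(counts)
    return total


def update(grid, counts):
    """Advance one time step: decay stored densities, then add the new batch."""
    for cell in grid:
        grid[cell] *= DECAY
    for cell, n in counts.items():
        grid[cell] = grid.get(cell, 0.0) + n


def remove_noise(grid):
    """Periodic inspection: drop cells whose decayed density fell below SPARSE."""
    for cell in [c for c, d in grid.items() if d < SPARSE]:
        del grid[cell]


def neighbors(cell):
    """Cells that differ by +/-1 in exactly one dimension."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def clusters(grid):
    """Connected components of dense cells, so clusters can take any shape."""
    dense = {c for c, d in grid.items() if d >= DENSE}
    seen, result = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for nb in neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(sorted(component))
    return sorted(result)
```

For example, the partitions of one batch can be folded into the grid with `update(grid, reduce_cells(map_cells(p) for p in partitions))`. Calling `remove_noise(grid)` every few steps mirrors the periodic noise inspection.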
ISBN (print): 9783319155548; 9783319155531
In this paper, an improved K-medoids algorithm based on a specific P system is proposed, which extends the application of membrane computing. Traditional K-medoids clustering results vary according to the initial centers, which are selected randomly. To overcome this defect, we improve the algorithm by selecting the k initial centers based on the density parameter of the data points. P systems are well suited to clustering problems because of their high parallelism and lower computational time complexity. A specific P system is constructed to realize the improved K-medoids algorithm and form clusters. Computation of the designed system yields one possible clustering result in a non-deterministic and maximally parallel way. Example verification shows that the approach can improve clustering quality.
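Below is a sequential Python sketch of density-based initialisation followed by K-medoids iteration. It does not model the P system or its maximally parallel, non-deterministic computation. The abstract does not define the density parameter or how later centres are chosen. So both the density measure (the number of neighbours within a hypothetical `radius`) and the selection rule (density times distance to the nearest chosen centre) are assumptions.

```python
import math


def initial_medoids(points, k, radius):
    """Pick k starting medoids from the data density instead of at random.

    Density of a point = number of other points within `radius` (an assumption).
    The densest point is chosen first; each later pick maximises
    (density + 1) * distance to the nearest medoid already chosen.
    """
    n = len(points)
    dens = [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius)
            for i in range(n)]
    chosen = [max(range(n), key=lambda i: dens[i])]
    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: (dens[i] + 1)
                          * min(math.dist(points[i], points[c]) for c in chosen)))
    return chosen


def k_medoids(points, k, radius, max_iter=100):
    """Alternate nearest-medoid assignment and medoid update until stable."""
    medoids = initial_medoids(points, k, radius)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda m: math.dist(p, points[medoids[m]]))
                  for p in points]
        new = []
        for m in range(k):
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # keep an empty cluster's medoid unchanged
                new.append(medoids[m])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda c: sum(math.dist(points[c], points[i])
                                                      for i in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, labels
```

Because the starting medoids come from the data density, repeated runs on the same data give the same result. The random initialisation criticised in the abstract does not have that property.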
Using TF·IDF and Kleinberg algorithms, this paper constructs a knowledge map of Chinese ESG literature from 2018 to 2023, which is included in the core journals of China Academic Journal *** map visualizes the pu...
ISBN (print): 9781479989409
The content-based Publish/Subscribe communication paradigm offers a new approach to disseminating messages in a network, where the message content determines the recipients. Many applications used on AANETs, a subclass of VANETs, could be more efficient using this paradigm. Many Publish/Subscribe systems suitable for VANETs have been developed; however, they are not efficient for some AANET applications. A promising approach is to build a Publish/Subscribe system over a cluster structure to reduce control overhead and offer good scalability. However, the efficiency of this approach strongly depends on the performance of the clustering algorithm. The aim of this article is to propose a new clustering method, named CAPS, which will be the basis for a future content-based Publish/Subscribe system for AANETs. To validate our approach, a simulation model has been developed. Our algorithm has been compared with other solutions in a modeled AANET context based on real air traffic traces. We show that CAPS gives better stability than the other solutions while keeping the number of cluster groups low.
ISBN (print): 9781509027088
To address load imbalance and low energy efficiency in existing clustering algorithms for underwater sensor networks, a novel globally optimal clustering algorithm with low complexity and parallel processing is proposed. The algorithm is based on the basic idea of particle swarm optimization (PSO). After the sensor nodes receive an initial binary encoding, each particle's code is adjusted by mutation to match the ideal number of cluster heads. During iteration, new particles are generated by random recombination of the surviving nodes. To select the optimal particle, the particle fitness function considers three optimization objectives: cluster head energy, cluster head load, and cluster range. Simulation results show that the proposed algorithm can effectively improve load balance and prolong network lifetime.
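A minimal Python sketch of the binary encoding, the mutation-based repair to a fixed number of cluster heads, and a fitness function built from the three objectives named above. Several parts are assumptions for illustration:

- the weights `W_ENERGY`, `W_LOAD`, and `W_RANGE`;
- how each objective is measured;
- the survivor-recombination loop in the hypothetical `search` function, which treats "surviving" as the fitter half of the particles.

This is a PSO-inspired discrete search, not a standard PSO velocity update and not the paper's exact procedure.

```python
import math
import random
import statistics

# Hypothetical objective weights; the paper's weighting is not given.
W_ENERGY, W_LOAD, W_RANGE = 1.0, 1.0, 0.1


def repair(bits, k, rng):
    """Mutate a binary particle (1 = cluster head) until it has exactly k heads."""
    bits = list(bits)
    while sum(bits) != k:
        target = 1 if sum(bits) > k else 0  # turn a head off, or a member on
        idx = rng.choice([i for i, b in enumerate(bits) if b == target])
        bits[idx] = 1 - target
    return bits


def fitness(bits, pos, energy):
    """Higher is better: reward head energy, penalise load imbalance and range."""
    heads = [i for i, b in enumerate(bits) if b]
    load = {h: 0 for h in heads}
    total_range = 0.0
    for i, p in enumerate(pos):
        if bits[i]:
            continue
        nearest = min(heads, key=lambda h: math.dist(p, pos[h]))
        load[nearest] += 1
        total_range += math.dist(p, pos[nearest])
    ch_energy = statistics.mean([energy[h] for h in heads]) / statistics.mean(energy)
    imbalance = statistics.pstdev(list(load.values()))
    return W_ENERGY * ch_energy - W_LOAD * imbalance - W_RANGE * total_range


def search(pos, energy, k, pop_size=20, iters=50, seed=0):
    """Keep the fitter half each round; refill by randomly recombining survivors."""
    rng = random.Random(seed)

    def score(bits):
        return fitness(bits, pos, energy)

    pop = [repair([rng.randint(0, 1) for _ in pos], k, rng) for _ in range(pop_size)]
    for _ in range(iters):
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform recombination
            children.append(repair(child, k, rng))
        pop = survivors + children
    return max(pop, key=score)
```

Every candidate passes through `repair`, so the search only ever compares layouts with the ideal number of cluster heads.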
ISBN (print): 9781467368506
The STC (Suffix Tree Clustering) algorithm clusters documents based on shared phrases and runs in linear time. To address shortcomings of the existing STC algorithm, such as the quality of the clustering results and the screening of cluster labels, this paper improves STC in three places: the selection of base clusters, the similarity formula used to merge base clusters, and the scoring function for cluster labels. Entropy is used as the evaluation criterion for the clustering results. Experiments show that, compared with the original algorithm, the improved algorithm produces better clusters and cluster labels that are more readable, descriptive, and distinguishable.
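The abstract does not give the paper's improved base-cluster selection, merge formula, or label scoring. The Python sketch below therefore shows two standard pieces instead: the original STC merge rule, which links two base clusters when their document overlap exceeds 50% of each, and the usual size-weighted entropy measure. Base clusters are found by naive word n-gram enumeration. This is a hypothetical stand-in for the linear-time suffix tree.

```python
import math
from collections import Counter, defaultdict


def base_clusters(docs, max_len=3, min_docs=2):
    """Naive stand-in for the suffix tree: word n-grams shared by >= min_docs docs."""
    phrases = defaultdict(set)
    for d, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrases[" ".join(words[i:i + n])].add(d)
    return {p: ds for p, ds in phrases.items() if len(ds) >= min_docs}


def similar(a, b, threshold=0.5):
    """Original STC rule: the overlap must exceed `threshold` of both doc sets."""
    common = len(a & b)
    return common / len(a) > threshold and common / len(b) > threshold


def merge(bases):
    """Final clusters = connected components of the base-cluster similarity graph."""
    items = list(bases.items())
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similar(items[i][1], items[j][1]):
                parent[find(i)] = find(j)
    merged = {}
    for i, (phrase, docs) in enumerate(items):
        members, labels = merged.setdefault(find(i), (set(), []))
        members |= docs          # union of documents in the component
        labels.append(phrase)    # candidate cluster labels
    return list(merged.values())


def entropy(clusters, truth):
    """Size-weighted entropy (bits) of clusters w.r.t. true classes; lower is better."""
    n = sum(len(docs) for docs in clusters)
    total = 0.0
    for docs in clusters:
        counts = Counter(truth[d] for d in docs)
        h = -sum(c / len(docs) * math.log2(c / len(docs)) for c in counts.values())
        total += len(docs) / n * h
    return total
```

Each merged cluster keeps all of its phrases as candidate labels. That list is the input a label scoring function, such as the improved one described in the abstract, would rank.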
Outlier detection is an important research area in machine learning and data science. The presence of outliers in a dataset limits its usefulness in real-life scenarios. Because the challenges vary, researchers strive to find a general method that is useful across different datasets. In this paper, we propose an unsupervised outlier detection technique that uses an ensemble of three clustering algorithms: K-means, K-means++, and Fuzzy C-means. We also propose a unique way to deal with clustered outliers. The outcomes of the three clustering algorithms are combined intelligently to accumulate all of their complementary information. To combine the decisions of the hard and soft clustering algorithms, we propose a novel probability-based technique that assigns a membership value to each data point for the hard clustering algorithms. Three cluster validity indices, which measure the goodness of a clustering, are used as evaluation metrics. The validity indices improve significantly after the outliers are removed, which indicates that outlier removal yields tighter clusters. The method is evaluated on eight datasets, three of which are comparatively large. Source code of this work is available at: https://***/biswarup9/Outlier-Detection-Using-an-Ensemble-of-clustering-algorithms-.
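Here is a hedged Python sketch of one way to combine hard and soft clusterings for outlier scoring. The abstract does not give the paper's probability-based membership technique. As a stand-in, the sketch uses the standard fuzzy C-means membership formula to turn hard-clustering centres into membership values. Other assumptions:

- cluster centres from K-means, K-means++, and FCM are precomputed and passed in;
- the scoring rule is hypothetical;
- the `factor` threshold is hypothetical.

The sketch does not address clustered outliers.

```python
import math
import statistics


def memberships(point, centers, m=2.0):
    """FCM-style membership of one point in each cluster (values sum to 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # the point coincides with one or more centres
        return [1.0 / zeros if x == 0.0 else 0.0 for x in d]
    inv = [x ** (-2.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def outlier_scores(points, center_sets, m=2.0):
    """Average over clusterings of (membership-weighted distance / its median)."""
    scores = [0.0] * len(points)
    for centers in center_sets:
        expected = [sum(u * math.dist(p, c)
                        for u, c in zip(memberships(p, centers, m), centers))
                    for p in points]
        med = statistics.median(expected) or 1e-12  # guard against a zero median
        for i, e in enumerate(expected):
            scores[i] += e / med / len(center_sets)
    return scores


def detect_outliers(points, center_sets, factor=3.0, m=2.0):
    """Indices whose combined score exceeds `factor` (a hypothetical threshold)."""
    return [i for i, s in enumerate(outlier_scores(points, center_sets, m)) if s > factor]
```

Each clustering's centres are handled the same way, whether they came from a hard or a soft algorithm. So the three decisions reduce to comparable per-point scores that can simply be averaged.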
ISBN (print): 9781424492701
Accurate and fast MRI tissue recognition is a crucial preprocessing step for real-time 3D tissue modeling and medical diagnosis. This paper proposes an information de-correlated clustering algorithm, implemented with a variational level set method, for fast tissue segmentation. The key idea is to add a local correlation term between the original image and the piecewise constants to the variational framework. Minimizing this correlation then yields de-correlated piecewise regions. First, a continuous bounded variational domain describing the image is introduced, and a probabilistic image restoration model is assumed to correct the distortion. Second, regional mutual information is introduced to measure the correlation between the piecewise regions and the original image. Finally, the piecewise constants, as a de-correlated description of the image, are solved by numerical approximation and level set evolution. The converged piecewise constants automatically cluster the image domain into discriminative regions. The segmentation results show that the algorithm performs well in computation time, accuracy, convergence, and clustering capability.
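To make "regional mutual information" concrete, the Python sketch below measures the mutual information between a piecewise-constant labelling and quantised image intensities. The labelling comes from a simple two-phase alternating fit: the data term of the Chan-Vese model with no length penalty. It is not the paper's level set evolution and does not include the paper's de-correlation term. The bin count is a hypothetical choice.

```python
import math
from collections import Counter


def assign(img, c1, c2):
    """Label each pixel with the nearer of the two region constants."""
    return [[0 if abs(v - c1) <= abs(v - c2) else 1 for v in row] for row in img]


def piecewise_constant(img, iters=50):
    """Two-phase fit alternating region means and nearest-constant labels."""
    flat = [v for row in img for v in row]
    c1, c2 = min(flat), max(flat)
    for _ in range(iters):
        labels = assign(img, c1, c2)
        regions = ([], [])
        for row, lrow in zip(img, labels):
            for v, lab in zip(row, lrow):
                regions[lab].append(v)
        # Keep the old constant if a region happens to be empty.
        new = tuple(sum(r) / len(r) if r else c for r, c in zip(regions, (c1, c2)))
        if new == (c1, c2):
            break
        c1, c2 = new
    return assign(img, c1, c2), (c1, c2)


def mutual_information(img, labels, bins=8):
    """I(L; B) in bits between region labels L and quantised intensity bins B."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    width = (hi - lo) / bins or 1.0
    pairs = [(lab, min(int((v - lo) / width), bins - 1))
             for row, lrow in zip(img, labels) for v, lab in zip(row, lrow)]
    n = len(pairs)
    joint = Counter(pairs)
    p_label = Counter(lab for lab, _ in pairs)
    p_bin = Counter(b for _, b in pairs)
    return sum(c / n * math.log2(c * n / (p_label[lab] * p_bin[b]))
               for (lab, b), c in joint.items())
```

A labelling that ignores the intensities gives zero mutual information. A labelling that cleanly separates two intensity levels gives one bit.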
ISBN (print): 9781450339674
One of the key tasks in mobility data analysis is the study of users' individual mobility with reference to their personal locations, i.e., the places or areas where they stop to perform any kind of activity. Correctly discovering such personal locations is therefore a very important problem, yet it is not well addressed in the literature. In this work we propose a robust, efficient, statistically well-founded, and parameter-free personal location detection process. The algorithm, called TOSCA (TwO-Steps parameter free clustering algorithm), combines two clustering strategies and applies statistical tests to drive the selection of the needed parameters. The proposed solution is tested against a large set of competitors on several datasets, both synthetic and real. The empirical results show that it adapts automatically to different contexts, yielding good accuracy and efficiency.
ISBN (print): 9781467372114
Social Network Services (SNS) have been growing explosively and generate huge amounts of data every day, so mining useful information from the big data generated by social networks is a meaningful task. In this paper, we study the relationships and behavior of social network users and then put forward a model that combines a clustering algorithm with a Factorization Machine (FM) for SNS friend recommendation. The clustering algorithm groups users, which makes it easier to identify their characteristics and interests, and the FM effectively addresses the data sparsity problem. We trained the model with a Markov Chain Monte Carlo (MCMC) algorithm and verified it on a real Tencent Weibo dataset, showing better computational efficiency and accuracy in recommending friends.
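A second-order Factorization Machine scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The Python sketch below computes that prediction with the well-known O(kn) rewrite of the pairwise term and checks it against the naive double loop. The feature layout in `encode` (one-hot user, one-hot candidate friend, one-hot cluster of the user) is a hypothetical way to feed cluster membership into the model. The MCMC training used in the paper is not shown.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    The pairwise term is computed in O(k n) via
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model with the explicit O(k n^2) double loop, for checking."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
                   for i in range(len(x)) for j in range(i + 1, len(x)))
    return linear + pairwise


def encode(user, candidate, cluster, n_users, n_clusters):
    """Hypothetical layout: one-hot user | one-hot candidate | one-hot user cluster."""
    x = [0.0] * (2 * n_users + n_clusters)
    x[user] = 1.0
    x[n_users + candidate] = 1.0
    x[2 * n_users + cluster] = 1.0
    return x
```

Because every user in a cluster shares that cluster's feature, a sparse user with few observed interactions still gets a learned interaction term through the cluster embedding. That is one plausible reading of how the clustering step helps with sparsity.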