ISBN: 9781479985630 (print)
The purpose of a data clustering algorithm is to form clusters (groups) of data points such that intra-cluster similarity is high and inter-cluster similarity is low. Clustering methods fall into several types, including hierarchical, partitioning, grid-based and density-based methods. Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. A hierarchical clustering method can be thought of as a set of ordinary (flat) clustering methods organized in a tree structure. These methods construct the clusters by recursively partitioning the objects in either a top-down or bottom-up fashion. In this paper we present a new hierarchical clustering algorithm using Euclidean distance. To validate this method we performed experiments with low-dimensional artificial datasets and a high-dimensional fMRI dataset. Finally, the results of our method are compared with those of some existing clustering methods.
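The bottom-up construction described above can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify; it is plain single-linkage agglomerative clustering with Euclidean distance, and the function name `agglomerative` and the sample points are assumptions made for illustration.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: smallest Euclidean distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x; y > x, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
result = agglomerative(points, 3)
```

Recording each merge instead of stopping at `n_clusters` would yield the full tree (dendrogram) that the abstract refers to.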
To improve the clustering quality of fast global K-means and reduce its time complexity, this paper proposes a highly efficient fast global K-means algorithm. The point of maximal density in the data set is chosen as the first initial cluster center. When searching for the next initial cluster center, the algorithm first excludes a certain number of points around the existing cluster centers, which narrows the selection range. It then applies a theorem based on the triangle inequality to reduce the amount of computation, and chooses as the next initial cluster center the sample that contributes most to reducing the error sum of squares and lies far from the existing cluster centers. The modified fast global K-means algorithm then reassigns samples to clusters: it computes the weighted distance from each sample to every cluster center and assigns the sample to the cluster for which the weighted distance is minimal. The modified algorithm selects more reasonable initial cluster centers, obtains more objective and accurate clustering results, and shortens the clustering time. The experimental results show that the algorithm is valid.
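The center-selection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: density is approximated by counting points within a hypothetical `radius`, the same radius defines the exclusion zone around existing centers, the gain formula is the error-reduction bound commonly used in fast global K-means, and the triangle-inequality pruning and weighted-distance reassignment are omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pick_initial_centers(points, k, radius):
    r2 = radius ** 2
    n = len(points)
    # Assumed density estimate: number of points within `radius` (self included).
    density = [sum(1 for q in points if sq_dist(p, q) <= r2) for p in points]
    centers = [points[max(range(n), key=density.__getitem__)]]

    while len(centers) < k:
        # Squared distance from every point to its nearest existing center.
        nearest = [min(sq_dist(p, c) for c in centers) for p in points]
        # Exclude points that lie close to an existing center.
        candidates = [i for i in range(n) if nearest[i] > r2]
        if not candidates:
            break

        def gain(i):
            # Bound on the drop in error sum of squares if point i became a center.
            return sum(max(nearest[j] - sq_dist(points[i], points[j]), 0.0)
                       for j in range(n))

        centers.append(points[max(candidates, key=gain)])
    return centers

# Hypothetical data: a dense group near the origin and a smaller group far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = pick_initial_centers(points, k=2, radius=1.0)
```

Restricting the candidates before evaluating `gain` is what narrows the search; the paper additionally prunes distance computations with the triangle inequality.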
Clustering algorithms based on the minimum spanning tree (MST) can identify clusters of different shapes. However, MST clustering has a high computational cost because multiple rounds of traversal are required during the construction and edge-breaking phases. To address this issue, an algorithm based on a density-peak fusion idea is proposed; it uses kd-trees to obtain neighborhoods and designates leader nodes to reduce the time cost of the minimum spanning tree. The main idea is that nodes with higher density than their surrounding nodes replace the full set of nodes that would otherwise participate in forming the minimum spanning tree. Selecting density peaks in this way reduces the number of points required for the initial construction and subsequent traversal of the minimum spanning tree, thereby reducing its complexity and speeding up clustering. The algorithm has been tested on various synthetic and real datasets, where it achieved significant improvements in speed and clustering performance. Compared with the most advanced MST-based clustering algorithm, it is at least 30% faster and performs about 5% better.
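For orientation, the baseline that the paper accelerates can be sketched as below: build a Euclidean MST, break its longest edges, and read off the connected components. This is a generic illustration, not the proposed method; the kd-tree neighborhoods and density-peak leader nodes are not implemented, and the function names and sample points are assumptions.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Break the k-1 longest MST edges and label the remaining components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
labels = mst_clusters(points, k=2)
```

Prim's algorithm on the complete graph costs O(n^2) distance evaluations, which is why building the tree over a smaller set of leader nodes, as the abstract describes, can reduce running time.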
Time series anomaly detection is an important field of data science. Statistical, distance-based, clustering-based, or density-based approaches can detect anomalies. Generally, distance-based methods are relatively straightforward, but their effectiveness depends on how well they handle the distribution of data points. To address this challenge, a preprocessing step is used to convert the underlying time series into a more useful format. In this paper, a novel clustering-based representation of time series is proposed. This representation is then used to compute anomaly scores and detect anomalies. Experimental studies on synthetic and real datasets show that the proposed method outperforms other methods by up to 75% on five standard performance metrics.
Objective: Understanding country-level nutrition intake is crucial to global nutritional policies that aim to reduce disparities and relevant disease burdens. Still, few studies have used clustering techniques to analyse the recent Global Dietary Database (GDD). This study aims to extend an existing multivariate time series (MTS) clustering algorithm to allow for greater customisability and to provide the first cluster analysis of the GDD to explore temporal trends in country-level nutrition profiles (1990-2018). Design: Trends in sugar-sweetened beverage intake and nutritional deficiency were explored using the newly developed programme 'MTSclust'. Time series clustering algorithms are different from simple clustering approaches in their ability to appreciate temporal ***: Nutritional and demographical data from 176 countries were analysed from the ***: Population representative samples of the 176 in the ***: In a three-class test specific to the domain, the MTSclust programme achieved a mean accuracy of 71.5 % (adjusted Rand Index [ARI] = 0.381) while the mean accuracy of a popular algorithm, DTWclust, was 58 % (ARI = 0.224). The clustering of nutritional deficiency and sugar-sweetened beverage intake identified several common trends among countries and found that these did not change by demographics. MTS clustering demonstrated a global convergence towards a Western ***: While global nutrition trends are associated with geography, demographic variables such as sex and age are less influential to the trends of certain nutrition intake. The literature could be further supplemented by applying outcome-guided methods to explore how these trends link to disease burdens.
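The adjusted Rand Index reported alongside accuracy can be computed from the contingency table of two labelings. The sketch below uses the standard pair-counting formula; the function name and the toy labelings are illustrative assumptions, not data from the study, and at least two items are assumed.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(truth)  # assumed n >= 2
    # Pairs of items placed together in both partitions.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # largest attainable agreement
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)

# Same grouping under different label names: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# Groupings that never agree on any pair: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index subtracts the agreement expected by chance, it is typically lower than raw accuracy for the same result and can be negative.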
ISBN: 9781467377249 (print)
With the continuous and rapid development of online questionnaire surveys, the low response rate has plagued operating *** solve this problem, this paper proposes an effective user invitation model based on our improved clustering algorithm, which analyzes large-scale historical user behavior characteristic data, including users' quality data, users' preference data and users' similarity *** experiments with large-scale data from an online survey company have been conducted to validate the feasibility and effectiveness of our proposed *** results demonstrate that the questionnaire response rate is increased and that our approach can be easily deployed in real-world online survey applications for effective personalized survey recommendation.
The goals of wireless sensor networks (WSNs) are to sense and collect data and to transmit the information to a sink. Because the sensor nodes are typically battery powered, the main challenges in WSNs are to optimise energy consumption and to prolong the network lifetime. This paper proposes a centralised clustering algorithm, termed the minimum distance clustering algorithm based on an improved differential evolution (MD-IDE). The new algorithm combines the advantages of simulated annealing and differential evolution to determine the cluster heads (CHs) that minimise the communication distance of the WSN. Simulation results demonstrate that MD-IDE outperforms other well-known protocols, including the low-energy adaptive clustering hierarchy (LEACH) and LEACH-C algorithms, in reducing the communication distance of the WSN and hence its energy consumption.
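The quantity being minimised can be made concrete with a small sketch. The abstract does not give the exact objective, so the cost below (each node's distance to its nearest cluster head, plus each head's distance to the sink) is an assumption, and an exhaustive search over head sets stands in for the simulated-annealing and differential-evolution search of MD-IDE. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def communication_distance(nodes, heads, sink):
    """Assumed cost: node -> nearest head, plus head -> sink."""
    member_cost = sum(min(math.dist(node, nodes[h]) for h in heads)
                      for node in nodes)
    head_cost = sum(math.dist(nodes[h], sink) for h in heads)
    return member_cost + head_cost

def best_heads(nodes, k, sink):
    """Brute-force stand-in for the evolutionary search; small networks only."""
    return min(combinations(range(len(nodes)), k),
               key=lambda hs: communication_distance(nodes, hs, sink))

# Hypothetical layout: two groups of three nodes on a line, sink above the middle.
nodes = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
sink = (6, 5)
heads = best_heads(nodes, k=2, sink=sink)
```

The number of candidate head sets grows combinatorially with network size, which is the usual motivation for a metaheuristic search in place of enumeration.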
ISBN: 9781467391672 (print)
With the emergence of big data and cloud computing, data streams arrive rapidly, continuously and at large scale, and real-time data stream clustering analysis has become a hot topic in current data stream mining research. Some existing data stream clustering algorithms cannot deal effectively with high-dimensional data streams, cannot find clusters of arbitrary shape in real time, and cannot remove noise points in a timely manner. To address these issues, this paper proposes PGDC-Stream, an algorithm based on grid and density for clustering data streams in a parallel distributed environment [4]. The algorithm adopts a density threshold function to handle noise points, inspecting and removing them periodically. It can also find clusters of arbitrary shape in large-scale data flows in real time. The Map-Reduce framework is used for parallel cluster analysis of the data streams.
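The grid-and-density idea can be sketched on a static batch of 2-D points as follows. This is a simplified, single-machine illustration, not PGDC-Stream itself: a fixed `min_points` threshold stands in for the paper's density threshold function, there is no periodic noise inspection or stream decay, and the Map-Reduce parallelism is omitted. All names and data are assumptions.

```python
from collections import Counter

def grid_density_clusters(points, cell_size, min_points):
    """Label dense grid cells; adjacent dense cells share a cluster label."""
    # Map every 2-D point to the integer coordinates of its grid cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    # Cells below the threshold are treated as noise and dropped.
    dense = {cell for cell, c in counts.items() if c >= min_points}

    labels = {}
    next_label = 0
    for start in dense:
        if start in labels:
            continue
        # Flood-fill over the 8-neighbourhood to collect one connected cluster.
        labels[start] = next_label
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        stack.append(nb)
        next_label += 1
    return labels

# Hypothetical data: one cluster spanning two cells, one compact cluster, one noise point.
points = [(0.1, 0.1), (0.2, 0.8), (1.1, 0.5), (1.9, 0.2),
          (5.5, 5.5), (5.6, 5.1), (9.0, 0.3)]
labels = grid_density_clusters(points, cell_size=1.0, min_points=2)
```

Because clusters are unions of adjacent dense cells, their shape is not restricted to be convex, and the per-cell counts are the kind of summary that can be computed independently per data partition and then merged.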
As a mainstream research direction in the field of image segmentation, medical image segmentation plays a key role in the quantification of lesions, three-dimensional reconstruction, region of interest extraction and so *** with natural images, medical images have a variety of ***, the emphasis of information which is conveyed by images of different modes is quite *** it is time-consuming and inefficient to manually segment medical images only by professional and experienced ***, large quantities of automated medical image segmentation methods have been ***, until now, researchers have not developed a universal method for all types of medical image *** paper reviews the literature on segmentation techniques that have produced major breakthroughs in recent *** the large quantities of medical image segmentation methods, this paper mainly discusses two categories of medical image segmentation *** is the improved strategies based on traditional clustering *** other is the research progress of the improved image segmentation network structure model based on *** power of technology proves that the performance of the deep learning-based method is significantly better than that of the traditional *** paper discussed both advantages and disadvantages of different algorithms and detailed how these methods can be used for the segmentation of lesions or other organs and tissues, as well as possible technical trends for future work.
Deep embedded clustering (DEC) is a representative clustering algorithm that leverages deep-learning frameworks. DEC jointly learns low-dimensional feature representations and optimizes the clustering objective, but it only works with numerical data. In practice, however, real-world data to be clustered includes not only numerical features but also categorical features, which DEC cannot handle. In addition, if the difference between the soft assignment and the target values is large, DEC applications may suffer from convergence problems. In this study, to overcome these limitations, we propose a deep embedded clustering framework that can utilize mixed data and increases convergence stability using soft-target updates, a concept borrowed from an improved deep Q-learning algorithm used in reinforcement learning. To evaluate the performance of the framework, we used various benchmark datasets composed of mixed data and empirically demonstrated that our approach outperformed existing clustering algorithms on most standard metrics. To the best of our knowledge, our work achieves state-of-the-art performance among its contemporaries in this field.
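The soft assignment and target values mentioned above can be sketched as follows. The first two functions follow the commonly cited DEC formulation (a Student's t kernel for the soft assignment, and a squared, frequency-normalised target). The third function is only an assumption about how a soft-target update might be applied: it blends old and new targets with a hypothetical rate `tau`, in the style of target-network updates in reinforcement learning; the abstract does not state the exact rule.

```python
def soft_assign(z, centroids, alpha=1.0):
    """q[i][j]: Student's t similarity of embedding z[i] to centroid j, row-normalised."""
    q = []
    for zi in z:
        w = [(1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha)
             ** (-(alpha + 1.0) / 2.0) for mu in centroids]
        total = sum(w)
        q.append([x / total for x in w])
    return q

def target_distribution(q):
    """p[i][j] proportional to q[i][j]^2 / f[j], with f[j] the soft cluster frequency."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([x / total for x in w])
    return p

def soft_update(old_p, new_p, tau=0.1):
    """Assumed soft-target rule: move the targets a fraction tau toward the new values."""
    return [[(1.0 - tau) * o + tau * n for o, n in zip(ro, rn)]
            for ro, rn in zip(old_p, new_p)]

# Hypothetical 2-D embeddings around two centroids.
z = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
q = soft_assign(z, centroids)
p = target_distribution(q)
blended = soft_update(q, p, tau=0.1)
```

Moving the targets gradually keeps the gap between `q` and the training target small from one update to the next, which is the stability concern the abstract raises.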