ISBN: 9781728167206 (print)
Clustering is one of the most important data analysis tasks. It is used to organize data points into groups, or clusters. Each cluster contains similar instances that are dissimilar to instances belonging to other clusters. Clustering is used in multiple disciplines and plays an integral role in a wide variety of applications. This paper presents a comparative study of three common density-based clustering algorithms, namely DBSCAN, OPTICS, and Mean-shift. The results are supported by an experimental evaluation using twelve datasets.
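To make the density-based idea concrete, here is a minimal pure-Python sketch of DBSCAN, one of the three algorithms named in the abstract above. It is an illustration under assumptions, not the paper's implementation: the function name `dbscan`, the parameter names `eps` and `min_pts`, and the toy points are all chosen here, and the brute-force neighbor search is only suitable for small inputs.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search: indices of all points within eps of point i,
        # including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point, keep growing
        cluster += 1
    return labels


# Hypothetical toy data: two dense squares and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
```

With these assumed settings the two squares form separate clusters and the isolated point is labeled as noise. Unlike centroid-based methods, the number of clusters is not given in advance; it follows from `eps` and `min_pts`.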
Most models concerned with real-world applications can be improved by structuring data and incorporating knowledge about the domain. In our problem of radio electrical wave dying down prediction for mobile communication, a geographic database can be divided into contextual subsets, each representing a homogeneous domain where a predictive model performs better. More precisely, by clustering the input space, a predictive model (here a multilayer perceptron) can be trained on each subspace. Various unsupervised clustering algorithms were evaluated (Kohonen's maps, Desieno's algorithm 1988, neural gas, growing neural gas, Buhmann's algorithm 1992) to obtain classes homogeneous enough to decrease the predictive error of the radio electrical wave prediction.
ISBN: 9798350363852 (digital)
ISBN: 9798350363869 (print)
The implementation of zero-touch network and service management solutions in software-defined optical networks requires processing detailed information about optical components. However, such data may be processed by third parties, so confidentiality issues may arise because providers are not willing to unveil their detailed information. This study proposes schemes based on dataset scrambling and unsupervised machine learning algorithms for soft-failure detection in optical networks. A key aspect of the proposed scheme is the preservation of data confidentiality, which in this context means safeguarding the detailed information of optical components while still enabling effective failure detection. The performance of six different clustering algorithms has been experimentally evaluated in a laboratory testbed. The results reveal that certain algorithms, while working in a confidentiality-preserving scheme, perform very well in clustering different states of the network (i.e., working and faulty states).
This paper proposes two new incremental fuzzy c-medoids clustering algorithms for very large datasets. These algorithms are tailored to work with continuous data streams, where the data is not necessarily all available at once or cannot fit in main memory. Some fuzzy algorithms already propose solutions to manage large datasets in a similar way but are generally limited to spatial datasets to avoid the complexity of medoid computation. Our methods keep the advantages of the fuzzy approaches and add the capability to handle large relational datasets by treating the continuous input stream as a set of data chunks that are processed sequentially. Two distinct models are proposed to aggregate the information discovered from each data chunk and produce the final partition of the dataset. Our new algorithms are compared to state-of-the-art fuzzy clustering algorithms on artificial and real datasets. Experiments show that our new approaches perform comparably to, if not better than, existing algorithms while adding the capability to handle relational data to better match the needs of real-world applications.
ISBN: 9798350362879 (digital)
ISBN: 9798350362886 (print)
Clustering is a technique that groups data into clusters based on the similarity of their characteristics. In images, these can be visual features such as color, texture, shape, and intensity. This is unsupervised learning, since the data is clustered without any prior knowledge of labels or categories. MRI (Magnetic Resonance Imaging) tests capture medical images using magnetic fields and radio waves. These images are used for diagnostic purposes to detect and analyze the internal structure of organs and tissues in the human body. Clustering the planar sections of these medical scans can help detect various types of diseases or conditions. This paper presents a comparative study evaluating the efficiency of several clustering algorithms: K-Means, Agglomerative, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies). The aim is to find the most appropriate algorithm for effectively identifying and clustering the different planar sections of brain tumour MRI scans.
Texture analysis has been used effectively in the area of terrain classification. The widely used co-occurrence features have been reported to be the most effective for this application. Since the number of co-occurrence features is very high, a terrain classifier based on them must deal with the high-dimensionality problem. This paper addresses that problem by employing a conventional linear discriminant classifier and clustering algorithms based on ANNs (Artificial Neural Networks). The implemented linear discriminant classifier is based on dimensionality reduction using the FST (Foley-Sammon transform), and its result is compared with that of the ANN clustering algorithm FCM (Fuzzy C-means). Experimental results show that the overall classification accuracy using the clustering algorithm is good, especially for some particular classes.
Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. We provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to test 8 selection methods, and find that the average similarity is the best method in divisive clustering and the minmax linkage is the best in agglomerative clustering. Cluster balance is a key factor to achieve good performance. We also introduce the concept of objective function saturation and clustering target distance to effectively assess the quality of clustering.
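The agglomerative direction described above (repeatedly merging the two closest clusters) can be sketched in a few lines of Python. This is a generic illustration that uses plain average linkage on Euclidean distance as the merge criterion; it does not reproduce the paper's eight selection methods or its minmax linkage, and the names `average_linkage`, `agglomerative`, and the toy data are assumptions made here.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean pairwise Euclidean distance between two clusters of points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one singleton per point
    while len(clusters) > n_clusters:
        # Selection step: pick the pair (i, j), i < j, with the smallest
        # average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical toy data: two tight pairs and one distant point.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
toy_clusters = agglomerative(toy_points, n_clusters=3)
```

The selection step is the only line that changes when a different linkage is swapped in, which is why the choice of selection method can be studied in isolation. This naive version recomputes all pairwise linkages at every step and is meant for small examples only.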
Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering can be considered one of the most important unsupervised learning techniques; like every other problem of this kind, it deals with finding structure in a collection of unlabelled data. Clustering is either soft or hard. Hard clustering refers to basic partitioning algorithms in which each object belongs to only one cluster. Soft clustering allows a data object to belong to more than one cluster, according to its membership values. This paper reviews three soft clustering techniques: Fuzzy C-Means, Rough C-Means, and Rough Fuzzy C-Means. Cluster validity indices are calculated for a synthetic dataset and a real dataset after applying these algorithms, and the best soft clustering algorithm is identified through experimental analysis.
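The difference between hard and soft assignment can be illustrated with the standard Fuzzy C-Means membership update, in which the membership of a point in cluster k is 1 / sum_j (d_k / d_j)^(2/(m-1)), where d_k is the distance to center k and m > 1 is the fuzzifier. The sketch below computes only this membership step for fixed centers; the function name `fcm_memberships`, the default `m=2.0`, and the toy values are assumptions for illustration, and the rough variants reviewed in the paper are not covered.

```python
from math import dist


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the membership of point i in cluster k.

    Each row sums to 1. The fuzzifier m must be greater than 1; values
    close to 1 approach hard assignment.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: split full
            # membership among those centers to avoid dividing by zero.
            zeros = d.count(0.0)
            memberships.append([1.0 / zeros if dk == 0.0 else 0.0 for dk in d])
            continue
        memberships.append(
            [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]
        )
    return memberships


# Hypothetical toy data: two centers on a line and three points between them.
toy_centers = [(0.0, 0.0), (4.0, 0.0)]
toy_points = [(1.0, 0.0), (2.0, 0.0), (0.0, 0.0)]
toy_u = fcm_memberships(toy_points, toy_centers)
```

A hard partition can be recovered from the result by taking, for each point, the cluster with the largest membership; the soft values additionally show how ambiguous a point is, such as the midpoint that belongs equally to both clusters.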
Given the increasing demand for blood smear analysis in the Hematology department of Oran (Algeria) Hospital and worldwide, several methods in the literature are directed at automating this important task. This paper presents the state of the art of the clustering methods used.
Semi-supervised clustering with constraints has been widely studied, but there are few studies on constrained agglomerative hierarchical algorithms. We have shown modified kernel algorithms for agglomerative hierarchical clustering, but they have the drawback that the modified kernels are not positive definite in general. In this paper we consider another approach to agglomerative hierarchical algorithms with pairwise constraints, in which clusters are merged with penalties. The centroid method and the Ward method, with and without a kernel, are considered. Typical numerical examples show the effectiveness of the proposed algorithms in generating clusters with nonlinear cluster boundaries. We also compare the results with those of COP K-means, showing that the proposed algorithms outperform it.