In recent years, graph-based data clustering algorithms have become popular because they perform connectivity-based rather than centroid-based partitioning. Minimum spanning tree (MST)-based clustering methods are graph-based algorithms that can recognize clusters of arbitrary shape by eliminating inconsistent edges from the MST. In all MST-based data clustering algorithms, defining inconsistent edges is the main problem to be addressed. Under ideal conditions, the longest edges in the MST are considered inconsistent. In real-world tasks, however, outliers often exist, which makes the longest edges inaccurate indicators of cluster separation. In this paper, we propose a new data clustering algorithm using the MST and a critical distance method. The proposed algorithm addresses the main issue of MST-based data clustering, namely identifying and removing inconsistent edges to obtain clusters even when the dataset contains some outliers. It begins by constructing the MST over a given weighted graph based on Euclidean distance and then splits the graph into clusters by eliminating inconsistent edges, using the critical distance as a threshold. The main contribution of this work is the integration of the advantages of both the MST and the critical distance methodology to obtain optimal clusters. Experimental results on different datasets show that the proposed clustering algorithm yields better overall performance than the most common data clustering algorithms. On the Liver and Tumor datasets, for example, the proposed algorithm outperforms all other clustering algorithms, with clustering accuracies of 0.579 and 0.660, respectively.
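The general idea in this abstract (build an MST on Euclidean distances, then cut edges longer than a threshold) can be illustrated with a minimal sketch. The abstract does not define the critical distance, so the threshold used here (mean MST edge length plus `k` standard deviations) is only an assumed stand-in, and the names `prim_mst`, `mst_clusters` and `k` are hypothetical, not the authors' method.

```python
import math
from statistics import mean, stdev


def prim_mst(points):
    """Return the MST edges (weight, i, j) of the complete Euclidean graph."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k=1.0):
    """Cut MST edges longer than an ASSUMED threshold and label the components."""
    edges = prim_mst(points)
    weights = [w for w, _, _ in edges]
    # Assumption: mean + k * std stands in for the paper's critical distance.
    threshold = mean(weights) + k * stdev(weights) if len(weights) > 1 else math.inf

    root = list(range(len(points)))  # union-find forest over point indices

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for w, i, j in edges:
        if w <= threshold:           # keep only the "consistent" edges
            root[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(mst_clusters(points))
```

With this kind of threshold, an isolated outlier ends up in its own singleton cluster rather than dictating where the main clusters are split. How the paper actually treats outliers is not stated in the abstract.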
ISBN: (print) 9781614993308; 9781614993292
Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and in turn that they could share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. Their performance is studied on time series expression data, using the Dynamic Time Warping distance to measure similarity between gene expression profiles. Four cluster validation measures are used to evaluate the clustering algorithms: Connectivity and the Silhouette Index for estimating the quality of clusters, the Jaccard Index for evaluating the stability of a clustering method, and the Rand Index for assessing accuracy. The results are analyzed with Friedman's test and the Nemenyi post-hoc test. K-means is shown to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.
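Two of the measures named in this abstract are compact enough to sketch. The following is a minimal illustration of the classic Dynamic Time Warping distance (with absolute difference as the local cost, which is an assumption) and of the unadjusted Rand Index between two labelings. The function names `dtw_distance` and `rand_index` are hypothetical, and the paper's exact variants are not given in the abstract.

```python
import math
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Two profiles with the same shape but different timing have zero DTW distance.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# Relabeling clusters does not change the Rand Index.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

DTW suits expression time series because it tolerates shifts and stretches along the time axis that a pointwise Euclidean distance would penalize.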
ISBN: (print) 9781457721977
Comprehensive situational awareness is paramount to the effectiveness of the proprietary navigational and higher-level functions of intelligent vehicles. In this paper, we present a hierarchical road understanding system for intelligent vehicles that uses sensor fusion to capture the road topography and the presence of objects. The proposed system consists of three modules that run in parallel. Module one classifies the road environment into four categories: the reachable region, the drivable region, the obstacle region and the unknown region. In module two, an efficient graph-based clustering algorithm is applied to the obstacle region to generate a list of object hypotheses, whose characteristics are used for coarse identification. In module three, for the object hypotheses in front of the vehicle, particular objects of interest, including vehicles, pedestrians, motorcycles and bicycles, are identified using a multi-class object detector with deformable part-based models and tracked using particle filters. In the experiments, data from various typical but challenging road scenarios were acquired with a Velodyne sensor and a monocular camera, and the results demonstrate the effectiveness of the proposed system.
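The abstract does not say which graph-based clustering algorithm module two uses. As a purely illustrative sketch, assume the obstacle region is a set of occupied grid cells and that neighbouring cells (8-connectivity) are joined by graph edges, so that each connected component becomes one object hypothesis. The grid representation and the name `cluster_obstacle_cells` are assumptions, not details from the paper.

```python
from collections import deque


def cluster_obstacle_cells(cells):
    """Group occupied (row, col) cells into 8-connected components via BFS."""
    remaining = set(cells)
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        component = [seed]
        while queue:
            r, c = queue.popleft()
            # Visit the 8 neighbours; the centre cell was already removed.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        queue.append(nb)
                        component.append(nb)
        clusters.append(sorted(component))
    return sorted(clusters)


# Hypothetical occupied cells: two separate obstacles.
cells = [(0, 0), (0, 1), (1, 1), (5, 5), (5, 6)]
print(cluster_obstacle_cells(cells))
```

Each returned component could then supply simple characteristics, such as cell count or bounding box, of the kind the abstract mentions for coarse identification.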