Graph embedding is an important dimensionality reduction method for high-dimensional data. In this paper, a neighborhood graph embedding algorithm is proposed and applied to data clustering. Different from the tradit...
Clustering is currently one of the most crucial techniques for dealing with the massive amount of heterogeneous information on the web, which is beyond human capacity to digest. Recent studies have shown that ...
ISBN: (print) 9781595938039
Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in large text corpora. Relations are identified by clustering the frequently co-occurring pairs of entities in such a way that pairs occurring in similar contexts end up belonging to the same clusters. In this paper we compare several clustering setups, some of them novel and others already tried. The setups include feature extraction and selection methods and clustering algorithms. In order to do the comparison, we develop a clustering evaluation metric, specifically adapted for the relation identification task. Our experiments demonstrate significant superiority of single-linkage hierarchical clustering with the novel threshold selection technique over the other tested clustering algorithms. Also, the experiments indicate that for successful relation identification it is important to use rich complex features of two kinds: features that test both relation slots together ("relation features"), and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving over the previous work. Copyright 2007 ACM.
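As a rough illustration of the setup described in this abstract, the sketch below clusters entity pairs by their context features with single-linkage clustering cut at a distance threshold. It is not the authors' implementation: the feature names, the toy entity pairs, the cosine distance, and the fixed threshold of 0.5 are all assumptions made for the example, and the paper's novel threshold selection technique is not reproduced here.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def single_linkage(items: dict, threshold: float) -> list:
    """Cut a single-linkage dendrogram at `threshold`.

    This is equivalent to taking the connected components of the graph
    that links every two items whose distance is at most `threshold`.
    """
    names = list(items)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_distance(items[a], items[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical entity pairs. "rel:" features test both slots together,
# "ent1:" / "ent2:" features test one slot each (names are illustrative).
pairs = {
    ("CompanyA", "CompanyB"): Counter({"rel:acquired": 2, "ent1:ORG": 1, "ent2:ORG": 1}),
    ("CompanyC", "CompanyD"): Counter({"rel:acquired": 1, "rel:bought": 1, "ent1:ORG": 1, "ent2:ORG": 1}),
    ("PersonE", "CityF"): Counter({"rel:born_in": 2, "ent1:PERSON": 1, "ent2:LOC": 1}),
}

clusters = single_linkage(pairs, threshold=0.5)  # threshold is an assumed value
for cluster in clusters:
    print(sorted(cluster))
```

With these toy features, the two acquisition-like pairs share most of their context features and fall into one cluster, while the person/location pair stays on its own. In practice the threshold would have to be chosen from the data rather than fixed by hand.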
The problem of clustering data has been driven by a demand from various disciplines engaged in exploratory data analysis, such as medicine taxonomy, customer relationship management and so on. However, most of the alg...
Consensus clustering is a stability-based algorithm with a prediction power far better than other internal measures. Unfortunately, this method is reported to be slow and hard to scale. We prese...
Due to resource constraints in Wireless Sensor Networks (WSNs), this paper contributes a distributed clustering algorithm suitable for large-scale Voronoi cell-based WSNs with sensors randomly deployed according to ...
Spatial indexing is an important research topic in the field of spatial databases and plays a key role in efficient spatial data retrieval and querying. In this paper, a new hierarchical clustering algorithm ...
In this paper we introduce an efficient clustering algorithm embedded in a novel approach for solving the problem of fault identification in large telecommunication networks. Our algorithm is especially designed for ...
Considering the complementarity between classification and clustering algorithms, we propose a new feature selection method based on fuzzy Interactive Self-Organizing Data Algorithm (ISODATA). A formula for co...
Spectral clustering has recently become one of the most popular modern clustering algorithms for traditional data. However, applying this clustering method to geostatistical data produces spatially scattered...