ISBN: 9781424413799 (print)
One of the fundamental challenges of clustering is how to evaluate, without auxiliary information, to what extent the obtained clusters fit the natural partitions of the data set. A common approach for evaluating clustering results is to use validity indices. We propose a new validity index, Conn-Index, for prototype-based clustering. Conn-Index is applicable to data sets with a wide variety of cluster characteristics (different shapes, sizes, densities, overlaps). We construct Conn-Index from the inter- and intra-cluster connectivities of prototypes, which are obtained from a weighted Delaunay triangulation called the "connectivity matrix" [1], whose weights reflect the data distribution. We compare the performance of Conn-Index to commonly used indices on synthetic and real data sets.
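A minimal sketch of the connectivity matrix, assuming the common construction in which CONN(i, j) counts the data points whose best-matching and second-best-matching prototypes are i and j. The helper `intra_inter_connectivity` is hypothetical: it only shows how connectivity splits into intra- and inter-cluster parts and is not the Conn-Index formula itself.

```python
import math


def connectivity_matrix(data, prototypes):
    """Assumed construction: CONN[i][j] counts the data points whose nearest
    (best-matching) prototype is i and second-nearest is j, or vice versa."""
    k = len(prototypes)
    conn = [[0] * k for _ in range(k)]
    for x in data:
        # Rank prototypes by Euclidean distance to x (ties keep index order).
        order = sorted(range(k), key=lambda p: math.dist(x, prototypes[p]))
        bmu, second = order[0], order[1]
        conn[bmu][second] += 1
        conn[second][bmu] += 1
    return conn


def intra_inter_connectivity(conn, labels):
    """Hypothetical helper: total connectivity strength between prototypes
    in the same cluster (intra) and in different clusters (inter)."""
    intra = inter = 0
    for i in range(len(conn)):
        for j in range(i + 1, len(conn)):
            if labels[i] == labels[j]:
                intra += conn[i][j]
            else:
                inter += conn[i][j]
    return intra, inter


# Usage: two well-separated groups, two prototypes per group.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
prototypes = [(0, 0), (1, 1), (10, 10), (11, 11)]
conn = connectivity_matrix(data, prototypes)
print(intra_inter_connectivity(conn, labels=[0, 0, 1, 1]))  # (8, 0)
```

With a good clustering of the prototypes, almost all connectivity is intra-cluster; mislabeling the prototypes (e.g. `[0, 1, 0, 1]`) moves it to the inter-cluster side.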
ISBN: 9781538677476 (print)
Computer-Aided Design (CAD) tools are a major factor in enhancing the quality of Field Programmable Gate Arrays (FPGAs) and in exploiting their architectural resources to their full potential, since they can be developed to satisfy application constraints such as area, speed and energy while meeting time-to-market requirements. In this paper, we explore the impact of the T-VPack and First Choice (FC) clustering algorithms on the performance of Multilevel Switch Box (MS) FPGAs with Long Wires (LWs). The performance of an FPGA is highly sensitive to how Logic Blocks (LBs) are mapped onto the FPGA architecture. This work shows that FC improves power consumption, area, critical path delay and energy compared to T-VPack.
We discuss one of the shortcomings of the standard K-means algorithm: its tendency to converge to a local rather than a global optimum. This is often addressed by restarting the algorithm from different random initialisations; in this paper, however, we attack the problem by amending the algorithm's performance function so that it incorporates global information. We do this in three different ways and show on artificial data sets that the resulting algorithms are less initialisation-dependent than the standard K-means algorithm. We also show how to create a family of topology-preserving manifolds using these algorithms together with an underlying constraint on the positioning of the prototypes.
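For contrast, the random-restart workaround mentioned above can be sketched as follows: run Lloyd's K-means from several random initialisations and keep the run with the lowest sum of squared errors (SSE). This is the baseline the paper aims to improve on, not its amended performance function; the function names and parameters are illustrative assumptions.

```python
import math
import random


def kmeans(data, k, rng, max_iter=100):
    """One run of Lloyd's K-means, initialised with k random data points.
    Returns (centers, sum of squared errors)."""
    centers = rng.sample(data, k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: math.dist(x, centers[c]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(pts) for coord in zip(*pts)) if pts else centers[j]
            for j, pts in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged, possibly only to a local optimum
        centers = new_centers
    sse = sum(min(math.dist(x, c) ** 2 for c in centers) for x in data)
    return centers, sse


def kmeans_restarts(data, k, restarts=10, seed=0):
    """Random-restart baseline: keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = [kmeans(data, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])


data = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
centers, sse = kmeans_restarts(data, 3, restarts=50)
print(sorted(centers), sse)
```

A single run that happens to start with two centers in the same group can get stuck with a much larger SSE; restarts only reduce this risk by brute force, which is the initialisation dependence the paper targets.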
ISBN: 9781479945627 (print)
In this paper, we propose new ways of obtaining cluster centers given interval memberships, where each membership expresses the pertinence of an object to the prototypes of all clusters and is computed using interval-valued distances (IMV). We perform a comparative analysis of the three approaches proposed in this paper on seven interval-based datasets (four synthetic and three real). This analysis shows that the proposed approaches achieve better performance than all the other interval-based methods analyzed.
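To make the idea of memberships derived from interval distances concrete, the sketch below uses a Hausdorff-type distance between interval-valued objects and turns it into fuzzy-C-means-style memberships. Both the distance and the membership formula are assumptions chosen for illustration; the paper's IMV approaches yield interval memberships and are not reproduced here.

```python
def interval_distance(x, v):
    """Assumed distance between interval-valued vectors (lists of (low, high)
    pairs): sum over features of the Hausdorff distance between intervals."""
    return sum(max(abs(a - c), abs(b - d)) for (a, b), (c, d) in zip(x, v))


def fuzzy_memberships(x, prototypes, m=2.0):
    """FCM-style membership of object x in each prototype (assumed formula):
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))."""
    d = [interval_distance(x, v) for v in prototypes]
    zeros = d.count(0)
    if zeros:  # x coincides with one or more prototypes: share membership
        return [1.0 / zeros if di == 0 else 0.0 for di in d]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]


# Usage: one-feature interval object [1, 2] against prototypes [0, 1] and [3, 5].
print(fuzzy_memberships([(1, 2)], [[(0, 1)], [(3, 5)]]))  # roughly [0.9, 0.1]
```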
ISBN: 9783540929567 (print)
This article provides a simple and general way of defining the recovery rate of clustering algorithms, using a given family of old clusters to evaluate the performance of an algorithm that calculates a family of new clusters. Assuming simulated data (i.e., known old clusters), the recovery rate is calculated using either a proposed exact (but slow) algorithm or a proposed approximate algorithm with feasible run time.
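The trade-off between an exact but slow and an approximate recovery rate can be sketched under a hypothetical definition: the fraction of objects lying in the overlap of one-to-one matched old/new cluster pairs, maximized over matchings. The exact version tries every matching (factorial time); the approximate one matches greedily by largest overlap. This definition is an assumption for illustration and not necessarily the one proposed in the article.

```python
from itertools import permutations


def overlap_matrix(old, new):
    """old, new: lists of sets of object ids (old clusters are disjoint)."""
    return [[len(a & b) for b in new] for a in old]


def recovery_rate_exact(old, new):
    """Best one-to-one matching over all permutations (slow: factorial time).
    Assumes len(new) >= len(old)."""
    ov = overlap_matrix(old, new)
    best = max(
        sum(ov[i][p[i]] for i in range(len(old)))
        for p in permutations(range(len(new)), len(old))
    )
    return best / sum(len(a) for a in old)


def recovery_rate_greedy(old, new):
    """Approximate: repeatedly take the largest remaining overlap."""
    ov = overlap_matrix(old, new)
    pairs = sorted(
        ((ov[i][j], i, j) for i in range(len(old)) for j in range(len(new))),
        reverse=True,
    )
    used_old, used_new, total = set(), set(), 0
    for size, i, j in pairs:
        if i not in used_old and j not in used_new:
            used_old.add(i)
            used_new.add(j)
            total += size
    return total / sum(len(a) for a in old)


old = [{1, 2, 3, 4, 5}, {6, 7}]
new = [{1, 2, 3, 6, 7}, {4, 5}]
print(recovery_rate_exact(old, new), recovery_rate_greedy(old, new))
```

In this example the greedy matcher locks in the single largest overlap first and ends with a lower rate (3/7) than the exact search (4/7), illustrating why an approximate algorithm can only bound the exact value.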
ISBN: 9789984440712 (print)
Methods of data analysis and automatic processing are treated as knowledge discovery. In many cases it is necessary to classify data in some way or to find regularities in them. That is why the notion of similarity is becoming more and more important in the context of intelligent data processing systems. It is frequently required to ascertain how the data are interrelated, how various data differ from or agree with each other, and what measure should be used to compare them. In clustering algorithms, the detection of similarity depends heavily on an accurate choice of metric and on the correct operation of the clustering algorithm.
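A small illustration of why the choice of metric matters: on the same points, Euclidean, Manhattan and cosine distance can disagree about which stored object is most similar to a query. The points below are made up purely for illustration.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query, candidates, metric):
    """Index of the candidate most similar to query under the given metric."""
    return min(range(len(candidates)), key=lambda i: metric(query, candidates[i]))


# Euclidean prefers (2, 2), Manhattan prefers (3, 0).
print(nearest((0, 0), [(3, 0), (2, 2)], euclidean), nearest((0, 0), [(3, 0), (2, 2)], manhattan))
# Euclidean prefers the nearby (1, 0); cosine prefers (4, 4), which has the same direction.
print(nearest((1, 1), [(4, 4), (1, 0)], euclidean), nearest((1, 1), [(4, 4), (1, 0)], cosine_distance))
```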
ISBN: 9781424458424 (print)
This paper presents a software-radio-based receiver architecture for identifying linear, two-dimensional modulation techniques via clustering algorithms. Identification of digital modulation schemes is of great importance in 3G and 4G cellular mobile systems, and it can be formulated as a pattern recognition problem using the vector-space representation of digitally modulated signals. The proposed system starts with a preprocessing stage that includes Digital Down Conversion (DDC), symbol rate estimation, baseband filtering, synchronization and normalization. This is followed by the identification stage, which uses clustering algorithms, together with cluster validity measures where needed, to identify the digital modulation scheme in use. Three clustering approaches were compared: the K-means clustering algorithm with the Dunn index as a validity measure, the Fuzzy C-means clustering algorithm with a minimum hard tendency validity measure, and density-based clustering. Simulation results for the three approaches are presented in the presence of additive white Gaussian noise (AWGN).
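The Dunn index used as a validity measure with K-means is commonly defined as the smallest distance between points of different clusters divided by the largest intra-cluster diameter, with higher values indicating compact, well-separated clusters. A minimal sketch, assuming Euclidean distance on 2-D points such as received constellation samples; how the paper applies it in the identifier stage is not reproduced here.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """clusters: list of (at least two) lists of points.
    Minimum inter-cluster point distance / maximum intra-cluster diameter."""
    min_inter = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diam = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    return math.inf if max_diam == 0 else min_inter / max_diam


print(dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]))  # 5.0
```

In a modulation-identification setting, one plausible use (an assumption here) is to run K-means for several candidate cluster counts, for example different constellation sizes, and keep the count with the highest Dunn index.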
The authors propose a new approach to developing clustering algorithms based on parametric optimization models, combining search algorithms with variable randomized neighborhoods and greedy agglomerative heuristic procedures.
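A greedy agglomerative heuristic of the kind mentioned can be sketched in a p-median setting: start from more candidate centers than needed and repeatedly drop the center whose removal increases the total distance the least, until k centers remain. This is a simplified, assumed illustration; the authors' combination with variable randomized neighborhood search is not reproduced.

```python
import math


def cost(data, centers):
    """Sum of distances from each point to its nearest center."""
    return sum(min(math.dist(x, c) for c in centers) for x in data)


def greedy_agglomerative(data, initial_centers, k):
    """Remove centers one at a time, each time choosing the removal that
    yields the smallest resulting cost, until only k centers remain."""
    centers = list(initial_centers)
    while len(centers) > k:
        best_i = min(
            range(len(centers)),
            key=lambda i: cost(data, centers[:i] + centers[i + 1:]),
        )
        centers.pop(best_i)
    return centers


data = [(0, 0), (0, 1), (10, 0), (10, 1)]
result = greedy_agglomerative(data, initial_centers=data, k=2)
print(result, cost(data, result))
```

Starting from every point as a candidate center is just one simple choice; in a search framework the initial set would typically come from a randomized neighborhood of the current solution.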
ISBN: 9783540744672 (print)
We study the influence of different clustering algorithms on cluster evolution monitoring in data streams. Capturing and interpreting cluster change yields indicators of the evolution of the underlying population. For text stream monitoring, the clusters can be summarized into topics, so that cluster monitoring provides insights into the data and into the decline of thematic subjects over time. However, such insights should always be taken with a grain of salt: the quality of the clusters has a decisive impact on the observed changes. In the simplest case, cluster change across the stream may be due to the low quality of the original cluster rather than to a drift in the population belonging to it. We present our framework ThemeFinder for monitoring topic evolution in streams and compare the influence of cluster quality for two very different clustering algorithms. After evaluating different clustering algorithms with external and internal quality measures, we use the center-based bisecting k-means algorithm and the density-based DBScan algorithm. Our results show that this influence is relatively high, and that the results of one clustering algorithm allow conclusions to be drawn about the evaluation of the other. Our experiments were performed on a subarchive of the ACM library.
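Bisecting k-means, the center-based algorithm named above, repeatedly splits one cluster into two with 2-means until k clusters exist. A minimal sketch on numeric vectors, assuming the largest cluster is split at each step and each 2-means is seeded deterministically (first point and the point farthest from it); practical implementations usually try several random splits and keep the best. This is not the ThemeFinder code.

```python
import math


def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def two_means(points, iters=20):
    """Split points in two with Lloyd's algorithm. Deterministic seeding
    (points[0] and the point farthest from it) is an assumption made here."""
    a = points[0]
    b = max(points, key=lambda p: math.dist(p, a))
    for _ in range(iters):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not left or not right:
            break  # degenerate split (e.g. all points identical)
        a, b = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        largest = max(clusters, key=len)
        left, right = two_means(largest)
        if not left or not right:
            break  # the largest cluster cannot be split further
        clusters.remove(largest)
        clusters.extend([left, right])
    return clusters


pts = [(0, 0), (0, 1), (5, 0), (5, 1), (30, 0), (30, 1)]
print(bisecting_kmeans(pts, 3))
```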
ISBN: 9781467357586; 9781467357593 (print)
One of the most important parameters to be studied in Wireless Sensor Networks (WSNs) is their lifetime. Two typical data mining processes that help reduce the energy consumption of WSNs are clustering and data summarization. Several energy-aware, communication-aware, coverage-aware, data dissemination and data aggregation/sensor fusion protocols and algorithms have been designed specifically for WSNs to reduce power consumption. One of the primary goals of node clustering in WSNs is in-network preprocessing, which aims to obtain qualified information while limiting the energy consumed. A clustering algorithm is composed of three parts: electing the cluster head (CH), selecting the cluster membership, and transferring data from members to ***, which relays only one aggregated or compressed data packet to the base station or sink. In this paper, a brief comparative study is made of different research proposals that suggest different cluster head selection approaches for data aggregation. The algorithms under study are the Data relay K-means clustering algorithm, Fuzzy C-means clustering algorithms and the Voronoi-based Genetic clustering algorithm. Significant factors for evaluating and comparing these algorithms are defined, analyzed and summarized. It is assumed that the sensor nodes are randomly distributed and not mobile, and that the coordinates of the base station (BS) and the dimensions of the sensor field are known.
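The three parts of a clustering algorithm listed above can be sketched for a K-means-based scheme under the same assumptions (static, randomly placed nodes with known coordinates). As illustrative assumptions: nodes are grouped by K-means on their coordinates, the member closest to each centroid is elected CH, members send readings to their CH, and each CH forwards one aggregated value (the mean) to the base station. This does not reproduce any specific protocol compared in the paper.

```python
import math


def kmeans_labels(coords, k, iters=50):
    """Plain K-means on node coordinates; seeds with the first k nodes
    (deterministic, assumed for illustration)."""
    centers = list(coords[:k])
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in coords]
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def elect_cluster_heads(coords, labels, centers):
    """Part 1: the member node closest to each centroid becomes the CH."""
    heads = {}
    for j, c in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            heads[j] = min(members, key=lambda i: math.dist(coords[i], c))
    return heads


def aggregate_at_heads(readings, labels, heads):
    """Parts 2-3: members send readings to their CH; each CH relays one
    aggregated packet (here the mean reading), keyed by the CH's node index."""
    packets = {}
    for j, head in heads.items():
        vals = [r for r, lab in zip(readings, labels) if lab == j]
        packets[head] = sum(vals) / len(vals)
    return packets


coords = [(0, 0), (10, 10), (1, 0), (0, 1), (11, 10), (10, 11)]
labels, centers = kmeans_labels(coords, 2)
heads = elect_cluster_heads(coords, labels, centers)
print(heads, aggregate_at_heads([1, 10, 2, 3, 20, 30], labels, heads))
```

Sending one aggregated packet per cluster instead of one packet per node is what reduces long-range transmissions to the sink; the energy model itself is left out of this sketch.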