In this paper, the design and implementation of the Single Linkage (SLINK) and Complete Linkage (CLINK) clustering algorithms on a Field Programmable Gate Array (FPGA) are presented. This research extends the authors' previous work, in which novel systolic arrays for implementing the SLINK hierarchical clustering algorithm were presented. As is well known, hierarchical clustering algorithms find applications in many engineering areas, including pattern classification and image processing. However, executing these algorithms requires considerable CPU time, which makes them unsuitable for real-time applications. Our motivation for this research, as in our earlier publications, is to reduce the CPU time taken to execute these algorithms. In the present work, we focus on implementing the proposed systolic arrays on a reconfigurable architecture, namely a Xilinx FPGA. Because the FPGA is reconfigurable, both the SLINK and CLINK algorithms can be implemented on the same device.
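The two linkage criteria that SLINK and CLINK compute can be illustrated with a naive agglomerative loop. The Python sketch below is only an illustration under stated assumptions: the function name `agglomerate` is hypothetical, it takes a precomputed symmetric distance matrix, and it runs in O(n³) time. It is neither Sibson's SLINK nor Defays' CLINK recurrence, and it says nothing about the authors' systolic-array or FPGA design; it only shows that the two methods differ in whether the distance to a merged cluster is updated with `min` (single linkage) or `max` (complete linkage).

```python
def agglomerate(dist, linkage="single"):
    """Naive agglomerative clustering over a symmetric distance matrix.

    Returns the merge sequence as (kept_id, absorbed_id, height) tuples.
    linkage="single" keeps the minimum inter-cluster distance (SLINK criterion);
    linkage="complete" keeps the maximum (CLINK criterion).
    """
    combine = {"single": min, "complete": max}[linkage]
    n = len(dist)
    active = set(range(n))
    # Pairwise cluster distances keyed by (i, j) with i < j.
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    while len(active) > 1:
        # Closest pair of active clusters; a < b by construction of the keys.
        (a, b), h = min(d.items(), key=lambda kv: kv[1])
        merges.append((a, b, h))
        active.remove(b)
        del d[(a, b)]
        # Lance-Williams update: distance from the merged cluster a to every other k.
        for k in active - {a}:
            key_a = (min(a, k), max(a, k))
            key_b = (min(b, k), max(b, k))
            d[key_a] = combine(d[key_a], d.pop(key_b))
    return merges
```

For points at 0, 1, 3 and 7 on a line, single linkage merges at heights 1, 2, 4, while complete linkage merges at heights 1, 3, 7.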
ISBN (print): 0769509967
Clustering is one of the important techniques in data mining. The objective of clustering is to group objects into clusters such that objects within a cluster are more similar to each other than to objects in different clusters. The similarity between two objects is defined by a distance function, e.g., the Euclidean distance, which satisfies the triangle inequality. Distance calculation is computationally very expensive, and many algorithms have been proposed to address this problem. This paper considers the gradual clustering problem. In practice, we noticed that users often begin clustering on a small number of attributes, e.g., two. If the result is partially satisfying, the user continues clustering on a larger number of attributes, e.g., ten. We refer to this problem as the gradual clustering problem. In fact, gradual clustering can be considered vertically incremental clustering. Approaches are proposed to solve this problem. The main idea is to reduce the number of distance calculations by using the triangle inequality. Our method first stores in an index the distances between a representative object and the objects in n-dimensional space. These pre-computed distances are then used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm with an associated M-Tree as the index tree. However, the principles of our idea can also be integrated with other tree structures, such as the MVP-Tree and R*-Tree, and with other clustering algorithms.
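The pruning idea can be sketched as follows, assuming Euclidean distance and a single representative object `r`. Adding attributes never decreases a Euclidean distance, so the distance over the first n attributes is a lower bound for the distance over n+m attributes, and the triangle inequality then gives a cheap lower bound from the stored distances to `r`. This Python sketch uses a flat list instead of the M-Tree index and shows only the ε-neighborhood query that DBSCAN relies on; `reference_distances` and `eps_neighbors` are hypothetical names, not the authors' code.

```python
import math


def reference_distances(points, rep, n):
    """Pre-compute distances in the first n dimensions between each point and a representative."""
    return [math.dist(p[:n], rep[:n]) for p in points]


def eps_neighbors(points, q, eps, ref):
    """Range query in the full (n+m)-dimensional space with triangle-inequality pruning.

    Because adding dimensions never decreases Euclidean distance,
        d_{n+m}(p, q) >= d_n(p, q) >= |d_n(p, r) - d_n(q, r)|.
    If that lower bound already exceeds eps, p cannot be a neighbor of q.
    Returns (neighbor indices, number of full distance calculations performed).
    """
    neighbors, computed = [], 0
    for i, p in enumerate(points):
        if abs(ref[i] - ref[q]) > eps:
            continue  # pruned without a full (n+m)-dimensional distance calculation
        computed += 1
        if math.dist(p, points[q]) <= eps:
            neighbors.append(i)
    return neighbors, computed
```

The stored distances are computed once in the n-dimensional space and reused when further attributes are added, which is where the savings in gradual clustering would come from.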
Authors: Jorge M. Santos, Frederico Morais
Affiliations: INEB, Biomedical Engineering Institute; ISEP, School of Engineering, Polytechnic of Porto
ISBN (print): 9781479926060
Clustering algorithms are widely used on biomedical data. They aim to extract important information that can help improve living conditions by supporting specialized technicians in the decision process. Clustering algorithms based on information-theoretic concepts claim that, by using higher-order statistics, they can extract more information from the data and therefore provide much better results. In this work we try to verify this claim by comparing the performance of some entropic clustering algorithms against more conventional ones. The results of the experiments are not conclusive, but they seem to indicate that this kind of entropic algorithm may provide some improvement when clustering biomedical data.
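As one concrete example of an information-theoretic quantity that such algorithms can build on, the sketch below estimates Rényi's quadratic entropy of a set of points with a Gaussian Parzen window. The abstract does not say which estimator the compared algorithms use, so the choice of estimator, the kernel width `sigma`, and the function name are assumptions. A tight group of points yields a lower entropy than a spread-out one, which is the property an entropic clustering criterion can exploit.

```python
import math


def renyi_quadratic_entropy(points, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy, H2 = -log V.

    V (the "information potential") is the mean over all pairs (i, j) of a
    Gaussian kernel with covariance 2 * sigma^2 * I evaluated at x_i - x_j.
    """
    n, d = len(points), len(points[0])
    norm = (4 * math.pi * sigma ** 2) ** (-d / 2)  # normalizing constant of N(0, 2*sigma^2*I)
    v = 0.0
    for xi in points:
        for xj in points:
            sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
            v += norm * math.exp(-sq / (4 * sigma ** 2))
    return -math.log(v / n ** 2)
```

Because every pair of points contributes, the estimate reflects the whole shape of the distribution rather than only its mean and variance.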
Results from existing clustering algorithms used for segmentation are highly sensitive to features, which limits their generalization. Shape is one important attribute of an object. Methods for detecting and separating objects using fuzzy ring-shaped clustering (FKR) and elliptic ring-shaped clustering (FKE) already exist in the literature. Not all real objects, however, are ring-shaped or elliptical, so to address this issue, this paper introduces a new shape-based algorithm, called fuzzy image segmentation combining ring and elliptic shaped clustering algorithms (FCRE), which merges the initial segmentation results produced by FKR and FKE. Unclassified pixels are then distributed using connectedness and fuzzy c-means (FCM) on a combination of pixel intensity and normalized pixel location. Both qualitative and quantitative analyses of the results for a variety of images show the superiority of the proposed FCRE algorithm over both FKR and FKE.
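The step that distributes unclassified pixels can be sketched with the standard FCM membership formula applied to a feature vector combining intensity with normalized row and column position. This Python sketch assumes a grayscale image given as nested lists, omits the connectedness step, and introduces a hypothetical weight `alpha` on the location features; the function names are illustrative only, and the actual FCRE feature weighting may differ.

```python
import math


def pixel_feature(img, r, c, alpha=0.5):
    """Intensity scaled to [0, 1] plus row/column normalized to [0, 1] and weighted by alpha."""
    h, w = len(img), len(img[0])
    return (img[r][c] / 255.0, alpha * r / max(h - 1, 1), alpha * c / max(w - 1, 1))


def fcm_memberships(x, centers, m=2.0):
    """Standard FCM memberships: u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1))."""
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:  # x coincides with a center: crisp membership
        k0 = d.index(0.0)
        return [1.0 if k == k0 else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def assign_unclassified(img, labels, alpha=0.5, m=2.0):
    """Give each pixel labelled None the segment in which it has the highest FCM membership."""
    feats = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab is not None:
                feats.setdefault(lab, []).append(pixel_feature(img, r, c, alpha))
    segs = sorted(feats)
    # Segment centers: mean feature vector of the already-labelled pixels.
    centers = [tuple(sum(ch) / len(feats[s]) for ch in zip(*feats[s])) for s in segs]
    out = [row[:] for row in labels]
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab is None:
                u = fcm_memberships(pixel_feature(img, r, c, alpha), centers, m)
                out[r][c] = segs[max(range(len(segs)), key=u.__getitem__)]
    return out
```

Including normalized location in the feature vector means that, among segments of similar intensity, an unclassified pixel tends to join the spatially nearer one.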
The authors present modifications to Kohonen autoassociative maps that increase their efficiency for clustering and decrease their sensitivity to initial conditions. A new update rule is described for classification by similarity. Test results are presented comparing different algorithms. The new neural network algorithm was applied to the problem of preplacement of VLSI cells, improving both the quality of the solution and the computational time.
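The abstract does not give the new update rule, so the sketch below shows only the standard Kohonen update that such modifications start from: find the best-matching unit, then pull every prototype toward the input in proportion to a neighborhood function. It assumes a one-dimensional map with a Gaussian neighborhood over map index; `som_step`, `lr`, and `sigma` are hypothetical names.

```python
import math


def som_step(weights, x, lr, sigma):
    """One standard Kohonen update on a 1-D map; modifies weights in place.

    The best-matching unit (BMU) is the prototype nearest to x. Every unit j
    then moves toward x by lr * h(j) * (x - w_j), where h is a Gaussian
    neighborhood centred on the BMU's map index. Returns the BMU index.
    """
    win = min(range(len(weights)), key=lambda j: math.dist(weights[j], x))
    for j, w in enumerate(weights):
        h = math.exp(-((j - win) ** 2) / (2 * sigma ** 2))
        weights[j] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return win
```

In a full training run, `lr` and `sigma` are usually decreased over time; the dependence of the final map on the initial `weights` is the sensitivity to initial conditions that the paper aims to reduce.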
This paper presents the results of several partitional clustering algorithms applied to the segmentation of color images in the RGB space. The more information the algorithm uses and the more flexible its distance measure, the better the results. The algorithms selected for this work are K-means, FCM, GK-B, and GKPFCM. GKPFCM gives the best results when all the algorithms are applied to the segmentation of two images, one of bananas and the other of tomatoes, each at different stages of ripeness. The results are interesting because they make it possible to identify the objects, determine their degree of ripeness, and estimate the amount and proportion of ripe objects to support decision-making.
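K-means, the simplest of the four methods compared, can be sketched on RGB pixels as below. The function `kmeans_rgb` is hypothetical, pixels are assumed to be (R, G, B) tuples, and the deterministic farthest-point initialization is an assumption made to keep the example reproducible. FCM, GK-B, and GKPFCM replace the hard assignment with fuzzy (and, for the Gustafson-Kessel variants, covariance-adapted) memberships and are not shown.

```python
import math


def kmeans_rgb(pixels, k, iters=50):
    """Lloyd's k-means on (R, G, B) tuples; returns (centers, labels)."""
    # Deterministic farthest-point initialization (an assumption for this sketch).
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(math.dist(p, c) for c in centers)))

    def nearest(p):
        # Index of the closest current center in Euclidean RGB distance.
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p)].append(p)
        # Move each center to the mean color of its group (keep it if the group is empty).
        new = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers, [nearest(p) for p in pixels]
```

The plain Euclidean distance in RGB is exactly the inflexible distance measure the abstract contrasts with the Gustafson-Kessel variants.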
In the last few years, the number of available screening compounds has grown rapidly due to recent developments in high-throughput screening for drug discovery. Chemical vendors provide millions of compounds for drug lead identification; however, these compounds are highly redundant. Clustering, which groups similar compounds into families, can be used to analyze such redundancy. One of the most widely used approaches is cluster-based compound selection, which involves subdividing a set of compounds into clusters and choosing one compound or a small number of compounds from each cluster. However, little research has applied the overlapping clustering methods fuzzy c-means (FCM) and fuzzy c-varieties (FCV) to compound selection. Therefore, these two clustering algorithms are implemented, and their performance is analyzed and compared based on the effectiveness of the clustering results in terms of mean intercluster molecular dissimilarity (MIMDS). The analysis shows that, in terms of MIMDS, FCV performs better than FCM because it produces more uniform results.
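A minimal fuzzy c-means loop, alternating the weighted-mean center update with the membership update, is sketched below in Python. It assumes each compound is already encoded as a numeric descriptor vector (the encoding used in the paper is not given), uses random initial memberships with a fixed seed, and implements neither FCV, whose prototypes are linear varieties rather than points, nor the MIMDS measure. The name `fcm` and its parameters are hypothetical.

```python
import math
import random


def fcm(data, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Fuzzy c-means. Returns (centers, U), where U[i][k] is item i's membership in cluster k."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the data weighted by membership^m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * data[i][t] for i in range(n)) / sw for t in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_u = []
        for x in data:
            d = [max(math.dist(x, v), 1e-12) for v in centers]  # guard against zero distance
            new_u.append([1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d])
        shift = max(abs(a - b) for r1, r2 in zip(U, new_u) for a, b in zip(r1, r2))
        U = new_u
        if shift < tol:
            break
    return centers, U
```

Because memberships are graded, a compound can belong partly to several clusters, which is what makes these methods overlapping; when a hard assignment is needed, the cluster with the largest membership is taken.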
Clustering is an unsupervised data mining technique. It groups similar objects together and separates dissimilar ones. In the clustering process, each object in the data set is assigned a class label using a distance measure. This paper describes the problems faced in practice when clustering algorithms are implemented. It also considers the most widely used tools that are readily available and provide support functions that ease programming. Once algorithms have been implemented, they also need to be tested for validity. Several validation indices for assessing performance and accuracy are also discussed.
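As an example of the kind of validation index mentioned above, the sketch below computes the mean silhouette coefficient, one widely used internal index. It is not necessarily one of the indices covered in the paper; the function name is hypothetical, it uses Euclidean distance, and it assumes at least two clusters.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 indicate compact, well-separated clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

For the points 0, 1, 10, 11 split into {0, 1} and {10, 11}, the score is about 0.90; a poor split such as {0, 10} and {1, 11} would score far lower.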
Certain fuzzy clustering algorithms involve dimensionality reduction techniques, such as principal component analysis (PCA), probabilistic principal component analysis (PPCA), and t-factor analysis (t-FA). Other fuzzification techniques have been applied to fuzzy clustering without dimensionality reduction. In this study, eleven fuzzy clustering algorithms are proposed based on five dimensionality reduction methods (PCA, PPCA, t-distribution-based PPCA, FA, and t-FA) and three fuzzification techniques (Bezdek-type, Kullback-Leibler divergence regularization, and q-divergence regularization). Numerical experiments on an artificial dataset show that some of the proposed methods outperform the conventional methods in clustering accuracy.
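The difference between two of the fuzzification techniques named above shows up in the membership formula. Assuming squared Euclidean distances `d2` from one point to each cluster center, and omitting the dimensionality-reduction and covariance terms of the proposed methods, the Bezdek-type and KL-divergence-regularized memberships can be sketched as follows. The function names and the default equal priors `pi` are hypothetical, and the q-divergence variant is not shown.

```python
import math


def bezdek_memberships(d2, m=2.0):
    """Bezdek-type FCM memberships from squared distances d2[k] to each center."""
    if 0.0 in d2:  # point sits exactly on a center
        k0 = d2.index(0.0)
        return [1.0 if k == k0 else 0.0 for k in range(len(d2))]
    p = 1.0 / (m - 1.0)  # (d_k/d_j)^(2/(m-1)) written with squared distances
    return [1.0 / sum((dk / dj) ** p for dj in d2) for dk in d2]


def kl_memberships(d2, lam=1.0, pi=None):
    """KL-divergence-regularized memberships: u_k proportional to pi_k * exp(-d2_k / lam)."""
    pi = pi or [1.0 / len(d2)] * len(d2)
    lo = min(d2)  # shift for numerical stability; cancels after normalizing
    w = [p * math.exp(-(d - lo) / lam) for p, d in zip(pi, d2)]
    s = sum(w)
    return [x / s for x in w]
```

With `d2 = [1, 4]`, the Bezdek-type rule with m = 2 gives memberships 0.8 and 0.2, whereas the KL-regularized rule with λ = 1 and equal priors gives about 0.95 and 0.05; a smaller λ makes the KL memberships crisper.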
ISBN (print): 0780344499
Identification of aircraft from high range resolution (HRR) radar range profiles requires a database of information capturing the variability of the individual range profiles as a function of viewing aspect. This database can be a collection of individual signatures or a collection of average signatures distributed over the region of viewing aspect of interest. An efficient database captures the intrinsic variability of the HRR signatures without either the excessive redundancy (over-characterization) typical of single-signature databases or the loss of information (under-characterization) common when averaging arbitrary groups of signatures. Identifying "natural" clusters of similar HRR signatures provides a means of creating efficient databases of either individual signatures or signature templates. Using k-means and the Kohonen self-organizing feature net, we identify the natural clustering of the HRR radar range profiles into groups of similar signatures based on the match quality metric (Euclidean distance) used within a vector quantizer (VQ) classification algorithm. This greatly reduces the redundancy in such databases while retaining classification performance. Such clusters can be useful in template-based algorithms, where groups of signatures are averaged to produce a template. Instead of basing the group of signatures to be averaged on arbitrary regions of viewing aspect, the averages are taken over the signatures contained in the natural clusters that have been identified. The benefits of applying natural cluster identification to individual-signature HRR data preparation are reduced algorithm memory and computational requirements, with a consequent decrease in the time required to perform identification calculations. When applied to template databases, the benefit is improved identification performance. This paper describes the techniques used for identifying HRR signature clusters and describes the statistical proper...
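Building templates from natural clusters and classifying with a Euclidean nearest-template rule can be sketched as below. The cluster assignments are assumed to be given (for example from k-means or a Kohonen map, as in the abstract); the data layout, the target names, and the functions `build_templates` and `classify` are hypothetical.

```python
import math


def build_templates(profiles, clusters):
    """Average the range profiles in each (target, cluster) group into one template.

    profiles: {target: [profile vectors]}
    clusters: {target: [cluster id for each profile, in the same order]}
    Returns a list of (target, template vector) pairs.
    """
    templates = []
    for target, vecs in profiles.items():
        groups = {}
        for vec, cid in zip(vecs, clusters[target]):
            groups.setdefault(cid, []).append(vec)
        for members in groups.values():
            mean = [sum(col) / len(members) for col in zip(*members)]
            templates.append((target, mean))
    return templates


def classify(profile, templates):
    """Vector-quantizer decision: the target of the nearest template in Euclidean distance."""
    target, _ = min(templates, key=lambda t: math.dist(profile, t[1]))
    return target
```

Averaging within natural clusters rather than fixed aspect bins keeps dissimilar signatures in separate templates, so one target can contribute several templates.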