ISBN: (print) 9783030587994; 9783030587987
Autoclustering is a computational tool for the automatic generation of clustering algorithms. It combines and evaluates the main parts of density-based algorithms to generate more appropriate solutions for clustering a given dataset. Autoclustering uses the Estimation of Distribution Algorithm (EDA) evolutionary technique to create the algorithms (individuals), and an adapted version of the CLEST method (which originally determines the best number of groups for a dataset) to compute individual fitness using a decision-tree classifier. To improve the quality of the results generated by Autoclustering and to avoid possible bias from the adoption of a classifier, this work proposes to increase the efficiency of the evaluation process by adding a quality metric based on a fusion of three cluster quality indexes. The three indexes are Silhouette, Dunn, and Davies-Bouldin, which assess intra-cluster and inter-cluster conditions using distance-based computations that are independent of how the groups were generated. The final score for a specific solution (algorithm + parameters) is the average of the normalized quality metric and the normalized fitness. Compared with the traditional Autoclustering, the proposal produced solutions with higher cluster quality metrics, higher average fitness, and higher diversity of generated individuals (clustering algorithms).
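The scoring rule described in this abstract can be sketched as follows. The abstract states only that the final score is the average of the normalized quality metric and the normalized fitness. The way the three indexes are mapped to [0, 1] and fused here (a plain mean after simple rescaling) is an assumption made for illustration, and the function names `fuse_quality` and `final_score` are hypothetical.

```python
def fuse_quality(silhouette, dunn, davies_bouldin):
    """Fuse three cluster quality indexes into one value in [0, 1].

    Assumption: each index is rescaled so that higher is better and the
    three rescaled values are averaged. The abstract does not give the
    actual fusion rule.
    """
    sil_n = (silhouette + 1.0) / 2.0        # Silhouette lies in [-1, 1]
    dunn_n = dunn / (1.0 + dunn)            # Dunn >= 0, higher is better
    db_n = 1.0 / (1.0 + davies_bouldin)     # Davies-Bouldin >= 0, lower is better
    return (sil_n + dunn_n + db_n) / 3.0


def final_score(normalized_quality, normalized_fitness):
    """Average of normalized quality and normalized fitness, as stated in the abstract."""
    return (normalized_quality + normalized_fitness) / 2.0


if __name__ == "__main__":
    quality = fuse_quality(silhouette=0.4, dunn=0.8, davies_bouldin=0.6)
    print(final_score(quality, normalized_fitness=0.7))
```

Rescaling Davies-Bouldin with a reciprocal keeps all three terms pointing in the same direction, so a plain mean is meaningful; other fusions would be equally plausible.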
ISBN: (print) 9781457713033
Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. Recent studies have demonstrated the effectiveness of the hard c-means (k-means) clustering algorithm in this domain. Other studies reported similar findings for the fuzzy c-means algorithm. Interestingly, none of these studies directly compared the two types of c-means algorithms. In this study, we implement fast and exact variants of the hard and fuzzy c-means algorithms with several initialization schemes and then compare the resulting quantizers on a diverse set of images. The results demonstrate that fuzzy c-means is significantly slower than hard c-means, and that with respect to output quality the former algorithm is neither objectively nor subjectively superior to the latter.
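To make the hard/fuzzy distinction concrete, here is a minimal sketch of the per-pixel assignment step in each algorithm, using the textbook formulas rather than the fast variants implemented in the study. The function names and the fuzzifier default `m=2.0` are assumptions for illustration.

```python
import math


def hard_assign(pixel, centers):
    """Hard c-means: the pixel belongs entirely to its nearest center."""
    dists = [math.dist(pixel, c) for c in centers]
    return dists.index(min(dists))


def fuzzy_memberships(pixel, centers, m=2.0):
    """Fuzzy c-means: the pixel gets a membership degree for every center.

    Uses the standard update u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
    """
    dists = [math.dist(pixel, c) for c in centers]
    if 0.0 in dists:
        # The pixel coincides with one or more centers: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


if __name__ == "__main__":
    palette = [(0, 0, 0), (255, 255, 255)]
    print(hard_assign((40, 40, 40), palette))
    print(fuzzy_memberships((40, 40, 40), palette))
```

The fuzzy step computes a ratio against every center for every pixel, which gives some intuition for the extra cost compared with a single nearest-center lookup.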
ISBN: (print) 9781538626979; 9781538626962
Vector quantization (VQ) is a useful technology for communication terminals with both small payloads and limited computational abilities. Clustering algorithms are mainly used to design a codebook (CB) for VQ. In most previous studies that estimated the performance of clustering algorithms, the learning images used to design the CB were the same as the test images used to examine performance. However, practical use of VQ requires encoding and decoding unseen images. The performance of the clustering algorithms therefore has to be estimated under the condition that the learning images differ from the test images. Such a comparative study is indispensable for examining the effectiveness of clustering algorithms in practical VQ. We selected four widely used clustering algorithms and estimated their performance in designing a CB for practical use. A set of computational experiments showed that there was only a marginal difference in the performance of the clustering algorithms.
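The evaluation setting described here, where a codebook designed on learning images is applied to unseen test vectors, can be sketched as below. The codebook is taken as given (any clustering algorithm could have produced it). The function names and the use of mean squared error as the distortion measure are assumptions for illustration.

```python
import math


def nearest_index(vec, codebook):
    """Index of the codeword closest to vec (Euclidean distance)."""
    dists = [math.dist(vec, cw) for cw in codebook]
    return dists.index(min(dists))


def encode(vectors, codebook):
    """Replace each vector by the index of its nearest codeword."""
    return [nearest_index(v, codebook) for v in vectors]


def decode(indices, codebook):
    """Look each index up in the codebook to reconstruct the vectors."""
    return [codebook[i] for i in indices]


def mse(original, reconstructed):
    """Mean squared error over all vector components."""
    total = sum((a - b) ** 2
                for o, r in zip(original, reconstructed)
                for a, b in zip(o, r))
    count = sum(len(o) for o in original)
    return total / count


if __name__ == "__main__":
    codebook = [(0, 0), (10, 10)]        # assumed to be designed on learning images
    test_vectors = [(1, 1), (9, 9)]      # blocks from images not used for design
    indices = encode(test_vectors, codebook)
    print(indices, mse(test_vectors, decode(indices, codebook)))
```

Only the indices need to be transmitted, which is why VQ suits terminals with small payloads; the distortion on held-out vectors is the quantity the practical setting cares about.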
ISBN: (print) 9781424455379
Currently, a large number of clustering algorithms are available for data mining. However, people who know little about data mining find it difficult to select an appropriate clustering algorithm. To solve this problem, in this paper we first comprehensively analyze a number of clustering algorithms, then summarize their evaluation criteria and apply the so-called fuzzy comprehensive evaluation to build a smart comprehensive evaluation of clustering algorithms. Finally, we propose a smart choice of a specific data mining algorithm to help users who lack the corresponding expertise.
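Fuzzy comprehensive evaluation, in its common weighted-average form, combines a weight vector over criteria with a membership matrix that rates each criterion against a set of grades. The sketch below shows that generic computation only. The criteria, grades, and numbers are invented for illustration and are not taken from the paper.

```python
def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy comprehensive evaluation.

    weights[i]       : importance of criterion i (assumed to sum to 1)
    membership[i][j] : degree to which criterion i is rated at grade j
    Returns one aggregated degree per grade.
    """
    grades = len(membership[0])
    return [sum(w * row[j] for w, row in zip(weights, membership))
            for j in range(grades)]


if __name__ == "__main__":
    # Hypothetical criteria: scalability, noise tolerance. Grades: good, poor.
    weights = [0.6, 0.4]
    membership = [[0.5, 0.5],
                  [1.0, 0.0]]
    print(fuzzy_evaluate(weights, membership))
```

Running the same evaluation for each candidate algorithm and comparing the aggregated degree for the best grade is one simple way to rank candidates for a non-expert user.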
ISBN: (print) 9781450333771
Radio frequencies refer to the electromagnetic energy used to transmit identification information from tags to their reader. Radio Frequency Identification (RFID) transmits data without line of sight. RFID tags are small wireless devices that help identify items automatically and carry a unique serial number for each item. However, counterfeiting in supply chain management, such as cloned and fraudulent RFID tags, harms organizations and society when attackers seek illegal benefits. Organizations lose a lot of money and user trust when counterfeiting occurs. Furthermore, the nature of RFID data raises several issues: RFID tags carry only simple information, the data arrive in floods, reader data can be inaccurate, and space and location are difficult to track. We propose to use clustering algorithms to detect counterfeits in supply chain management. We apply various clustering algorithms to analyze every attribute in the dataset and determine its structural pattern. Based on our evaluation, we found that Farthest First is the best algorithm for both 1000 (small data) and 10000 (larger data). However, the number of false negatives is still quite high, which is dangerous if an RFID scanner misreads cloned or fraudulent tags as genuine tags. Hence, we applied cost algorithms to reduce the false negative values.
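Assuming "Farthest First" refers to the standard farthest-first traversal heuristic for choosing cluster centers, a minimal sketch looks like this. The function name, the choice of the first point as the starting center, and the sample data are assumptions for illustration. The abstract does not describe the authors' configuration.

```python
import math


def farthest_first(points, k, first=0):
    """Pick k centers: start from one point, then repeatedly add the point
    that is farthest from all centers chosen so far. Assumes k <= len(points)."""
    centers = [points[first]]
    # Distance from every point to its nearest chosen center.
    min_d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=min_d.__getitem__)
        centers.append(points[idx])
        min_d = [min(d, math.dist(p, points[idx]))
                 for d, p in zip(min_d, points)]
    return centers


if __name__ == "__main__":
    readings = [(0, 0), (1, 0), (10, 0), (5, 0)]   # hypothetical 2-D tag features
    print(farthest_first(readings, k=3))
```

Each remaining point is then assigned to its nearest center; because the traversal needs only one pass per center, it is cheap on both the small and the larger dataset sizes mentioned above.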
ISBN: (print) 9781424412099
Two clustering algorithms that handle data with tolerance are proposed. One is based on hard c-means, while the other uses learning vector quantization. First, the concept of tolerance, which includes errors, ranges, and the loss of attributes of data, is described. Optimization problems that take the tolerance into account are then formulated. Since the Kuhn-Tucker conditions give a unique and explicit optimal solution, an alternate minimization algorithm and a learning algorithm are constructed. Moreover, the effectiveness of the proposed algorithms is verified through numerical examples.
ISBN: (print) 9781848829824
Modern biological research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors make it a difficult and intractable problem. One approach used in some research is to cluster the data and then infer a structural model of the clusters. With this kind of approach it is very important to choose the clustering algorithm carefully. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we also consider how to measure the quality of the identified clusters. Our analysis leads us to develop three novel cluster quality measures that are based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms: FLAME and KMART. Although FLAME was specifically developed for clustering gene expression profile data, we find that KMART is probably a better algorithm to use if the goal is to infer a structural model of the clusters.
ISBN: (print) 9783642030390
It is well known that the clusters produced by a clustering algorithm depend on the chosen initial centers. In this paper we present a measure for the degree to which a given clustering algorithm depends on the choice of initial centers, for a given data set. This measure is calculated for four well-known offline clustering algorithms (k-means Forgy, k-means Hartigan, k-means Lloyd and fuzzy c-means), for five benchmark data sets. The measure is also calculated for ECM, an online algorithm that does not require the number of initial centers as input, but for which the resulting clusters can depend on the order in which the input arrives. Our main finding is that this initialization dependence measure can also be used to determine the optimal number of clusters.
ISBN: (print) 9783642159091
The detection of pulmonary nodules in radiological or CT images has been widely investigated in the field of medical image analysis due to the high degree of difficulty it presents. The traditional approach is to develop a multistage CAD system that reveals the presence or absence of nodules to the radiologist. One of the stages within this system is the detection of ROIs (regions of interest) that may be nodules, in order to reduce the scope of the problem. In this article we evaluate clustering algorithms that use different classification strategies for this purpose. To evaluate these algorithms we used high-resolution CT images from the LIDC (Lung Image Database Consortium) database.
ISBN: (print) 9781479959969
This paper provides a comparative study of several enhanced versions of the fuzzy c-means clustering algorithm in an application of histogram-based image color reduction. A common preprocessing step is performed before clustering, consisting of a preliminary color quantization, histogram extraction, and selection of the frequently occurring colors of the image. These selected colors are then clustered by the tested c-means algorithms. Clustering is followed by another common step, which creates the output image. Besides conventional hard (HCM) and fuzzy c-means (FCM) clustering, the so-called generalized improved partition FCM algorithm and several versions of the suppressed FCM (s-FCM), in its conventional and generalized forms, are included in this study. Accuracy is measured as the average color difference between pixels of the input and output image, while efficiency is mostly characterized by the total runtime of the performed color reduction. Numerical evaluation found all enhanced FCM algorithms more accurate than FCM, and four out of seven enhanced algorithms faster than FCM. All tested algorithms can create reduced-color images of acceptable quality.
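The accuracy measure mentioned in this abstract, the average color difference between corresponding pixels of the input and output image, can be sketched as follows. The abstract does not say which color difference is used, so Euclidean distance in RGB is an assumption here, as is the function name.

```python
import math


def average_color_difference(original, reduced):
    """Mean per-pixel color difference between two equally sized images.

    Both images are flat lists of (r, g, b) tuples. Euclidean distance in
    RGB is assumed as the per-pixel difference.
    """
    if len(original) != len(reduced):
        raise ValueError("images must have the same number of pixels")
    total = sum(math.dist(a, b) for a, b in zip(original, reduced))
    return total / len(original)


if __name__ == "__main__":
    source = [(0, 0, 0), (3, 4, 0)]
    output = [(0, 0, 0), (0, 0, 0)]
    print(average_color_difference(source, output))
```

A lower value means the reduced-color image stays closer to the input, which is the sense in which the enhanced algorithms are reported as more accurate.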