This paper presents a new heuristic for the data clustering problem. It comprises two parts. The first part is a greedy algorithm that selects the data points that can act as the centroids of well-separated clusters. The second part is a single-solution-based heuristic that performs clustering with the objective of optimizing a cluster validity index. Single-solution-based heuristics are more memory-efficient than population-based heuristics. The proposed heuristic is inspired by evolutionary algorithms (EAs) and consists of five main components: 1) genes; 2) fitness of genes; 3) selection; 4) mutation operation; and 5) diversification. The attributes of the cluster centroids are treated as genes. The fitness of a gene is a function of two factors: 1) the difference between its value and the same attribute of the mean of the data points assigned to its cluster, and 2) the frequency with which it has been mutated in previous iterations. Genes with low fitness values are updated through the mutation operation, which makes a small change (positive or negative) to the value of the gene. A mutant is accepted if it is better than its parent with respect to the objective function. However, diversification in the search process is maintained by allowing, with a small probability, a mutant to replace its parent even when it is not better. The objective functions used in the proposed heuristic are the Calinski-Harabasz index and the Dunn index. The proposed algorithm has been evaluated on real-life numeric data sets from the UCI repository, in which the number of data points ranges from 150 to 11,000 and the number of attributes from 4 to 60. The results indicate that the proposed algorithm performs better than two standard EAs, 1) a simulated annealing algorithm and 2) a differential evolution algorithm, and than a genetic algorithm-based clustering method.
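The single-solution search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact gene-fitness formula, mutation step size, or acceptance probability, so `alpha`, `step`, `p_worse`, and the sign convention for the mutation-count term are hypothetical. The paper's greedy centroid-selection step is replaced here by plain farthest-point selection, which may differ from the original. All function names are illustrative. The Calinski-Harabasz and Dunn indices follow their standard definitions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Attribute-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(data, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in data]


def calinski_harabasz(data, labels, k):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)); higher is better."""
    n = len(data)
    clusters = [[p for p, l in zip(data, labels) if l == c] for c in range(k)]
    if k < 2 or n <= k or any(not c for c in clusters):
        return 0.0  # assumption: a degenerate partition scores worst
    overall = mean(data)
    between = sum(len(c) * sq_dist(mean(c), overall) for c in clusters)
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


def dunn(data, labels, k):
    """Smallest inter-cluster distance / largest cluster diameter; higher is better."""
    clusters = [[p for p, l in zip(data, labels) if l == c] for c in range(k)]
    if k < 2 or any(not c for c in clusters):
        return 0.0  # assumption: a degenerate partition scores worst
    sep = min(math.dist(p, q) for a in range(k) for b in range(a + 1, k)
              for p in clusters[a] for q in clusters[b])
    diam = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return math.inf if diam == 0 else sep / diam


def greedy_centroids(data, k):
    """Farthest-point selection (hypothetical stand-in for the greedy step)."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def gene_mutation_search(data, k, objective=calinski_harabasz, iters=300,
                         step=0.5, p_worse=0.05, alpha=0.1, seed=0):
    """Single-solution search that mutates one centroid attribute (gene) at a time."""
    rng = random.Random(seed)
    dim = len(data[0])
    cents = greedy_centroids(data, k)
    mutated = [[0] * dim for _ in range(k)]  # past mutation count per gene
    labels = assign(data, cents)
    score = objective(data, labels, k)
    best_cents, best_score = [row[:] for row in cents], score
    for _ in range(iters):
        means = [mean([p for p, l in zip(data, labels) if l == c])
                 if c in labels else cents[c] for c in range(k)]
        # Hypothetical gene fitness: a large gap to the cluster-mean attribute
        # lowers it; frequent past mutation raises it, so other genes get a turn.
        fit = {(c, j): -abs(cents[c][j] - means[c][j]) + alpha * mutated[c][j]
               for c in range(k) for j in range(dim)}
        c, j = min(fit, key=fit.get)  # selection: the lowest-fitness gene
        mutated[c][j] += 1
        child = [row[:] for row in cents]
        child[c][j] += rng.choice((-1, 1)) * step * rng.random()  # small +/- change
        child_labels = assign(data, child)
        child_score = objective(data, child_labels, k)
        # Accept a better mutant; accept a worse one only with probability p_worse.
        if child_score > score or rng.random() < p_worse:
            cents, labels, score = child, child_labels, child_score
            if score > best_score:
                best_cents, best_score = [row[:] for row in cents], score
    return best_cents, best_score
```

Passing `objective=dunn` runs the same loop with the Dunn index instead of the Calinski-Harabasz index. Because only one candidate solution is kept (plus a copy of the best so far), memory use does not grow with a population size. This is the trade-off the abstract attributes to single-solution-based heuristics.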
This article provides a short introduction to the field of evolutionary multiobjective optimization. The first part of the article discusses, from a historical perspective, the most representative multiobjective evolutionary algorithms that have been developed. The second part reviews some representative applications in materials science and engineering. The final part briefly describes some potential directions for future research in this area.
Clustering is considered a challenging data mining problem because of its unsupervised nature. The literature is inundated with algorithms and concepts for determining the most suitable clustering structure in data. These techniques rely on a mathematical model of a cluster and attempt to obtain a result that represents this model as closely as possible. However, because clustering is NP-hard, such strategies have drawbacks such as converging to local optima or suffering from the curse of dimensionality. In such scenarios, metaheuristics may be more suitable. These techniques use biologically inspired mechanisms such as swarm intelligence and evolution to traverse the search space. Owing to their inherently parallel nature, they are more robust against converging to local optima. The objective (cost) function used by such a metaheuristic is responsible for guiding the agents of the swarm towards the best solution. Hence, it should be designed to achieve a trade-off between multiple objectives and constraints while still producing relevant clusterings. In this paper, a cost function (PSO-2) is proposed that produces compact, well-separated clusters by using the concepts of intra-cluster and inter-cluster distance. Experiments have been performed on artificial benchmark data sets. In these experiments, the performance of the particle swarm optimizer using the proposed cost function is evaluated against other evolutionary and non-evolutionary algorithms. The clustering structures produced by the methods have been evaluated using distance-based and internal cluster validation metrics. The results demonstrate that the performance of PSO-2 is comparable to that of other techniques. © 2018 The Authors. Published by Elsevier B.V.
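The abstract does not give the formula of the PSO-2 cost function. The sketch below assumes one plausible way to combine the two distances it names. It takes the mean distance from each point to its nearest centroid (intra-cluster) and divides it by the smallest distance between two centroids (inter-cluster). The result is minimized by a standard global-best particle swarm optimizer using commonly cited parameter values (w ≈ 0.729, c1 = c2 ≈ 1.494). Each particle encodes all k centroids as one flat vector. The names `cost` and `pso_cluster`, the cost formula, and the parameter values are all hypothetical. This is not the paper's cost function.

```python
import math
import random


def cost(flat, data, k):
    """Hypothetical cost (lower is better): mean intra-cluster distance
    divided by the minimum inter-centroid distance."""
    dim = len(data[0])
    cents = [flat[i * dim:(i + 1) * dim] for i in range(k)]
    intra = sum(min(math.dist(p, c) for c in cents) for p in data) / len(data)
    inter = min(math.dist(cents[a], cents[b])
                for a in range(k) for b in range(a + 1, k))
    return math.inf if inter == 0 else intra / inter


def pso_cluster(data, k, n_particles=20, iters=100, w=0.729, c1=1.494,
                c2=1.494, seed=0):
    """Global-best PSO over flattened centroid vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Per-coordinate bounds: the data's bounding box, repeated for each centroid.
    lo = [min(p[j] for p in data) for j in range(dim)] * k
    hi = [max(p[j] for p in data) for j in range(dim)] * k
    size = dim * k
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(size)]
           for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, data, k) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate to the data's bounding box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            c = cost(pos[i], data, k)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return [gbest[j * dim:(j + 1) * dim] for j in range(k)], gbest_cost
```

A ratio of this kind rewards compactness (small numerator) and separation (large denominator) at the same time. This illustrates why the abstract stresses designing the cost function as a trade-off. A sum with tunable weights would be an equally plausible alternative.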