In this paper we compare FSS-EBNA, a randomized, population-based evolutionary algorithm, with two genetic and two sequential search approaches on the well-known feature subset selection (FSS) problem. In FSS-EBNA, the FSS problem is stated as a search problem and solved with the estimation of Bayesian network algorithm (EBNA) as the search engine, an algorithm within the estimation of distribution algorithm (EDA) approach. The EDA paradigm grew out of the genetic algorithm (GA) community, with the aim of explicitly discovering the relationships among the features of the problem rather than disrupting them with genetic recombination operators. The EDA paradigm avoids recombination operators; instead, it guarantees the evolution of the population of solutions, and the discovery of these relationships, by factorizing the probability distribution of the best individuals in each generation of the search. In EBNA, this factorization is carried out by a Bayesian network induced by a cheap local search mechanism. FSS-EBNA can be seen as a hybrid Soft Computing system, a synergistic combination of probabilistic and evolutionary computing to solve the FSS task. On a set of real Data Mining domains, FSS-EBNA achieves promising results in comparison with well-known genetic and sequential search algorithms. (C) 2001 Elsevier Science Inc. All rights reserved.
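The sample, select, and re-estimate loop that the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: where EBNA induces a Bayesian network over the selected individuals, this sketch assumes a much simpler model of independent per-feature probabilities (a univariate EDA), so it cannot capture relationships among features. The function names, parameter defaults, and toy scoring function are all hypothetical.

```python
import random

def eda_feature_search(score, n_features, pop_size=30, n_select=15,
                       generations=20, seed=0):
    """Univariate EDA over binary feature masks (a simplification of EBNA).

    score: hypothetical callable mapping a tuple of 0/1 flags to a number,
    higher is better. In a wrapper setting it would be an accuracy estimate.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features          # initial model: each feature in or out
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a new population from the current probability model.
        population = [tuple(1 if rng.random() < p else 0 for p in probs)
                      for _ in range(pop_size)]
        ranked = sorted(population, key=score, reverse=True)
        top_score = score(ranked[0])
        if top_score > best_score:
            best_mask, best_score = ranked[0], top_score
        # Keep the best individuals and re-estimate the model from them.
        # No crossover or mutation is applied at any point.
        selected = ranked[:n_select]
        probs = [(sum(ind[i] for ind in selected) + 1) / (n_select + 2)
                 for i in range(n_features)]   # Laplace-smoothed frequencies
    return best_mask, best_score

# Toy objective (an assumption for illustration): agreement with a hidden mask.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)

def toy_score(mask):
    return sum(1 for a, b in zip(mask, TARGET) if a == b)

best_mask, best_score = eda_feature_search(toy_score, n_features=len(TARGET))
print(best_mask, best_score)
```

Replacing the per-feature probabilities with a learned Bayesian network, and sampling from that network, is the step that would turn this univariate sketch into something closer to the EBNA engine described above.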
A new method for Feature Subset Selection in machine learning, FSS-EBNA (Feature Subset Selection by Estimation of Bayesian Network Algorithm), is presented. FSS-EBNA is an evolutionary, population-based, randomized search algorithm, and it can be executed when domain knowledge is not available. A wrapper approach, over the Naive-Bayes and ID3 learning algorithms, is used to evaluate the goodness of each visited solution. FSS-EBNA, based on the EDA (Estimation of Distribution Algorithm) paradigm, avoids the use of crossover and mutation operators to evolve the populations, in contrast to Genetic Algorithms. In the absence of these operators, the evolution is guaranteed by the factorization of the probability distribution of the best solutions found in a generation of the search. This factorization is carried out by means of Bayesian networks. Promising results are achieved in a variety of tasks where domain knowledge is not available. The paper explains the main ideas of Feature Subset Selection, Estimation of Distribution Algorithms and Bayesian networks, presenting related work on each concept. A study of the 'overfitting' problem in the Feature Subset Selection process is carried out, providing a basis for defining the stopping criteria of the new algorithm. (C) 2000 Elsevier Science B.V. All rights reserved.
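The wrapper idea mentioned above, scoring a candidate feature subset by the estimated accuracy of a learner trained only on those features, might look roughly like the following. This is a hedged sketch under stated assumptions: the categorical Naive Bayes with Laplace smoothing, the interleaved 5-fold split, the function names, and the toy data are all choices made here for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, features):
    """Fit a categorical Naive Bayes model using only the given feature indices."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature) -> value frequencies
    values_seen = defaultdict(set)        # feature -> distinct values in training
    for row, y in zip(rows, labels):
        for f in features:
            value_counts[(y, f)][row[f]] += 1
            values_seen[f].add(row[f])
    return class_counts, value_counts, values_seen

def predict_nb(model, row, features):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_logp = None, float("-inf")
    for c, n_c in class_counts.items():
        logp = math.log(n_c / total)
        for f in features:
            k = len(values_seen[f]) + 1   # one extra slot for unseen values
            logp += math.log((value_counts[(c, f)][row[f]] + 1) / (n_c + k))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

def wrapper_score(mask, rows, labels, folds=5):
    """Cross-validated accuracy of Naive Bayes on the features flagged in mask."""
    features = [i for i, bit in enumerate(mask) if bit]
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(rows)) if i % folds != k]
        test = [i for i in range(len(rows)) if i % folds == k]
        model = train_nb([rows[i] for i in train],
                         [labels[i] for i in train], features)
        correct += sum(predict_nb(model, rows[i], features) == labels[i]
                       for i in test)
    return correct / len(rows)

# Toy data (assumed): feature 0 copies the label, feature 1 is unrelated noise.
labels = [i % 2 for i in range(40)]
rows = [(labels[i], (i // 3) % 2) for i in range(40)]

print(wrapper_score((1, 0), rows, labels))
print(wrapper_score((0, 1), rows, labels))
```

A function like `wrapper_score` is what a search procedure would call to rank candidate subsets. Because the same data both guides the search and estimates accuracy, the estimate can be optimistic, which is the overfitting concern the abstract raises when discussing stopping criteria.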