Objectives: The "large k (genes), small N (samples)" phenomenon complicates microarray classification with logistic regression. The indeterminacy of the maximum likelihood solutions, multicollinearity of the predictor variables and data over-fitting cause unstable parameter estimates. Moreover, computational problems arise due to the large number of predictor (gene) variables. Regularized logistic regression excels as a solution. However, the difficulties here involve an objective function that is hard to optimize from a mathematical viewpoint and the careful tuning required of the regularization parameters. Methods: Those difficulties are tackled by introducing a new way of regularizing logistic regression. Estimation of distribution algorithms (EDAs), a kind of evolutionary algorithm, emerge as natural regularizers. Obtaining the regularized estimates of the logistic classifier amounts to maximizing the likelihood function via our EDA, without having to penalize it. Likelihood penalties add a number of difficulties to the resulting optimization problems, which vanish in our case. New estimates are simulated during the evolutionary process of EDAs in a way that guarantees their shrinkage while maintaining the probabilistic dependence relationships learnt. The EDA process is embedded in an adapted recursive feature elimination procedure, thereby providing the genes that are the best markers for the classification. Results: The consistency with the literature and the excellent classification performance achieved with our algorithm are illustrated on four microarray data sets: Breast, Colon, Leukemia and Prostate. Details on the last two data sets are available as supplementary material. Conclusions: We have introduced a novel EDA-based logistic regression regularizer. It implicitly shrinks the coefficients during the EDA evolution process while optimizing the usual likelihood function. The approach is combined with a gene subset selection pr
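As a rough illustration of shrinking coefficients inside the EDA instead of penalizing the likelihood, the Python sketch below maximizes the plain logistic log-likelihood with a simple continuous EDA. It is not the authors' algorithm: the independent Gaussian model per coefficient, the fixed multiplicative factor `shrink` and all function names are assumptions made for the example, and the learned dependencies between coefficients and the recursive feature elimination step are omitted.

```python
import math
import random

def log1pexp(z):
    # Numerically stable log(1 + e^z).
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def log_likelihood(beta, X, y):
    # Unpenalized logistic log-likelihood; beta[0] is the intercept.
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        total += yi * z - log1pexp(z)
    return total

def shrinking_eda(X, y, pop_size=60, n_select=20, generations=40, shrink=0.9, seed=0):
    # Hypothetical sketch: independent Gaussians per coefficient plus a fixed
    # shrink factor; the paper's EDA also keeps learned dependencies.
    rng = random.Random(seed)
    d = len(X[0]) + 1
    mu, sigma = [0.0] * d, [1.0] * d
    best_ll, best_beta = -math.inf, None
    for _ in range(generations):
        scored = []
        for _ in range(pop_size):
            # Sample a coefficient vector, then pull it toward zero.
            beta = [shrink * rng.gauss(m, s) for m, s in zip(mu, sigma)]
            scored.append((log_likelihood(beta, X, y), beta))
        scored.sort(key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_ll:
            best_ll, best_beta = scored[0]
        # Re-estimate the Gaussian marginals from the best candidates.
        elite = [b for _, b in scored[:n_select]]
        for j in range(d):
            col = [b[j] for b in elite]
            mu[j] = sum(col) / n_select
            sigma[j] = max(math.sqrt(sum((c - mu[j]) ** 2 for c in col) / n_select), 1e-3)
    return best_beta, best_ll

# Toy usage on one predictor.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
beta, ll = shrinking_eda(X, y)
```

The toy data are linearly separable, so the unpenalized maximum likelihood estimate does not exist (the likelihood keeps increasing as the slope grows); in the sketch every sampled coefficient is pulled toward zero by `shrink` rather than by a penalty term added to the objective.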
ISBN (print): 9781424447053
This paper deals with using density ensemble methods to enhance continuous estimation of distribution algorithms. In particular, two density ensemble methods are applied: a resampling method and a subspace method. In resampling continuous estimation of distribution algorithms, a population of densities is obtained by a resampling operator and a density estimation operator, and new candidate solutions are reproduced by sampling from all the obtained densities. In subspace continuous estimation of distribution algorithms, a population of densities is obtained by randomly selecting a subset of all variables and estimating the density of high-quality solutions in this subspace. These steps are iterated, yielding many densities of high-quality solutions in different subspaces. New candidate solutions are reproduced by perturbing old promising solutions in these subspaces.
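A minimal sketch of the resampling variant, for readers who want to see the two operators side by side. It assumes axis-aligned Gaussian densities, bootstrap resampling of a truncation-selected set and an equal-weight mixture for reproduction; these choices and the function names are illustrative, not taken from the paper, and the subspace variant is not shown.

```python
import random
import statistics

def fit_diag_gaussian(points):
    # Density estimation operator: axis-aligned Gaussian fitted to a sample.
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    stds = [max(statistics.pstdev(c), 1e-9) for c in cols]
    return means, stds

def resampling_eda_step(population, fitness, n_select, n_densities, rng):
    # Truncation selection of high-quality solutions (minimization).
    selected = sorted(population, key=fitness)[:n_select]
    # Resampling operator (bootstrap) + density estimation -> population of densities.
    densities = [fit_diag_gaussian([rng.choice(selected) for _ in selected])
                 for _ in range(n_densities)]
    # Reproduce candidates by sampling from all densities (equal-weight mixture).
    offspring = []
    for _ in range(len(population)):
        means, stds = rng.choice(densities)
        offspring.append([rng.gauss(m, s) for m, s in zip(means, stds)])
    return offspring

def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(100)]
for _ in range(50):
    pop = resampling_eda_step(pop, sphere, n_select=30, n_densities=5, rng=rng)
best = min(pop, key=sphere)
```

Each bootstrap replicate gives a slightly different density, so the mixture is wider than a single fitted Gaussian would be; that spread is the ensemble effect the abstract relies on.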
ISBN (print): 9781605584140
In this paper, we discuss a curious relationship between cooperative coevolutionary algorithms (CCEAs) and univariate estimation of distribution algorithms (EDAs). Specifically, the distribution model for univariate EDAs is equivalent to the infinite-population evolutionary game theory (EGT) model common in the analysis of CCEAs. This relationship may permit cross-pollination between these two disparate fields. As an example, we derive a new EDA based on a known CCEA from the literature and provide some preliminary experimental analysis of the algorithm.
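For reference, the univariate distribution model factorizes as a product of independent marginals, P(x) = prod_i p_i^x_i (1 - p_i)^(1 - x_i). The sketch below is a basic UMDA-style univariate EDA on OneMax; the truncation selection, the clamping bounds on the marginals and all parameter values are assumptions, and the CCEA-derived EDA from the paper is not reproduced.

```python
import random

def umda_onemax(n_bits=20, pop_size=50, n_select=25, generations=30, seed=0):
    rng = random.Random(seed)
    # Univariate model: one independent Bernoulli marginal per bit.
    p = [0.5] * n_bits
    for _ in range(generations):
        # Sample from the product of marginals.
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        # OneMax fitness = number of ones; keep the best half.
        pop.sort(key=sum, reverse=True)
        elite = pop[:n_select]
        # Re-estimate each marginal independently, clamped away from 0 and 1
        # (assumed bounds) so no bit is permanently fixed.
        lo, hi = 1 / n_bits, 1 - 1 / n_bits
        p = [min(max(sum(ind[i] for ind in elite) / n_select, lo), hi)
             for i in range(n_bits)]
    return p

p = umda_onemax()
```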
In this paper, we identify a number of topics relevant for the improvement and development of discrete estimation of distribution algorithms. Focusing on the role of probability distributions and factorizations in est...
ISBN (print): 9780769536347
Estimation of distribution algorithms (EDAs) are a novel class of evolutionary algorithms motivated by the idea of building a probabilistic graphical model of promising solutions to represent linkage information between the variables in a chromosome. By learning and sampling from the probabilistic graphical model, a new population is generated, and the optimization procedure is repeated until the stopping criteria are met. In this paper, the mechanism of estimation of distribution algorithms is analyzed. Existing EDAs are surveyed and categorized according to the probabilistic models they use.
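The learn, sample and repeat mechanism can be sketched as a generic loop that takes the model-learning and sampling steps as parameters. This is an illustrative skeleton, not code from the survey; truncation selection, the target-fitness stopping rule and the independent-marginal placeholder model are assumptions chosen to keep the example short.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

M = TypeVar("M")          # whatever probabilistic model the EDA uses
Solution = List[int]

def eda(model: M,
        learn: Callable[[List[Solution]], M],
        sample: Callable[[M], Solution],
        fitness: Callable[[Solution], float],
        pop_size: int, n_select: int, max_gens: int,
        target: Optional[float] = None) -> Tuple[Optional[Solution], int]:
    best: Optional[Solution] = None
    gens = 0
    for gens in range(1, max_gens + 1):
        # Sample a new population from the current model.
        population = [sample(model) for _ in range(pop_size)]
        # Rank by fitness and track the best solution so far.
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        # Stopping criterion: target fitness reached.
        if target is not None and fitness(best) >= target:
            break
        # Learn a new model from the promising (selected) solutions.
        model = learn(population[:n_select])
    return best, gens

# Placeholder instantiation: independent bit marginals (no linkage). A graphical
# model such as a Bayesian network would replace learn/sample in a real EDA.
N_BITS = 16
rng = random.Random(42)

def learn_marginals(selected: List[Solution]) -> List[float]:
    k = len(selected)
    return [min(max(sum(s[i] for s in selected) / k, 0.05), 0.95) for i in range(N_BITS)]

def sample_marginals(p: List[float]) -> Solution:
    return [1 if rng.random() < pi else 0 for pi in p]

best, gens = eda([0.5] * N_BITS, learn_marginals, sample_marginals,
                 fitness=sum, pop_size=60, n_select=30, max_gens=100, target=N_BITS)
```

Keeping `learn` and `sample` as parameters mirrors the survey's point that EDAs differ mainly in the probabilistic model they use; the loop itself stays the same.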
ISBN (print): 9781450326629
Finding a good model and efficiently estimating the distribution is still an open challenge in estimation of distribution algorithms (EDAs). The factorizations encoded by the models in most EDAs are constrained. However, for the optimization of many real-world problems, the key challenge is finding a model capable of representing complex interactions without much computational overhead. On the other hand, the factor graph, which is the most natural graphical model for representing additively decomposable functions, is rarely employed in EDAs. In this paper we introduce the Factor Graph based EDA (FGEDA), which learns a factor graph as the model and estimates the probability distribution represented by the learned factor graph using Markov blanket canonical factorization. The class of factorizations employed to approximate the distribution in FGEDA is broader than that of well-known EDAs. We use matrix factorization to learn the factor graph of the problem based on the pairwise mutual information between pairs of variables. Gibbs sampling and BB-wise crossover are used to generate new samples. Empirical evaluation as well as theoretical analysis of the approach show the efficiency and power of FGEDA in optimizing functions with complex interactions. Experiments show that FGEDA outperforms other well-known EDAs. GECCO track: Estimation of Distribution Algorithms.
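FGEDA learns its factor graph from the pairwise mutual information I(X_i; X_j) = sum_{a,b} p(a,b) log(p(a,b) / (p(a) p(b))). The sketch below only computes that matrix from a binary population and lists strongly linked pairs with a simple threshold; the threshold and function names are hypothetical, and FGEDA's matrix factorization, Markov blanket canonical factorization and Gibbs sampling steps are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(pop, i, j):
    # Empirical mutual information (in nats) between variables i and j.
    n = len(pop)
    joint = Counter((x[i], x[j]) for x in pop)
    pi = Counter(x[i] for x in pop)
    pj = Counter(x[j] for x in pop)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def pairwise_mi_matrix(pop):
    d = len(pop[0])
    m = [[0.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        m[i][j] = m[j][i] = mutual_information(pop, i, j)
    return m

def strongly_linked_pairs(m, threshold):
    # Hypothetical helper: candidate factors are pairs whose MI exceeds a threshold.
    d = len(m)
    return [(i, j) for i, j in combinations(range(d), 2) if m[i][j] > threshold]

# Bits 0 and 1 always agree; bit 2 is independent of both.
pop = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
m = pairwise_mi_matrix(pop)
```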
ISBN (print): 9781424429585
This work studies the problem of premature convergence due to the lack of diversity in estimation of distribution algorithms. This problem is quite important for this kind of algorithm since, even when using very complex probabilistic models, they cannot solve certain optimization problems, such as some deceptive, hierarchical or multimodal ones. Several works in the literature propose different techniques to deal with premature convergence. In most cases, they arise as an adaptation of the techniques used with genetic algorithms and use randomness to generate individuals. In our work, we study a new scheme that tries to preserve the population diversity. Instead of generating individuals randomly, it uses the information contained in the probability distribution learned from the population. In particular, a new probability distribution is obtained as a variation of the learned one so as to generate individuals that are less likely to appear during the evolutionary process. This proposal has been successfully validated experimentally on a set of different test functions.
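The abstract does not say how the new distribution is derived from the learned one. Purely as a hypothetical illustration, the sketch below uses a univariate Bernoulli model, moves each marginal toward its complement by a factor `alpha`, and draws a fraction of the population from that varied model, so that individuals which are unlikely under the learned model become more likely. The variation rule, `alpha` and `frac_varied` are assumptions, not the authors' scheme.

```python
import random

def product_prob(x, p):
    # Probability of bit string x under independent Bernoulli marginals p.
    out = 1.0
    for xi, pi in zip(x, p):
        out *= pi if xi else 1.0 - pi
    return out

def vary_model(p, alpha):
    # Hypothetical variation: move each marginal toward its complement 1 - p_i.
    # alpha = 0 keeps the learned model; alpha = 1 fully inverts it.
    return [(1.0 - alpha) * pi + alpha * (1.0 - pi) for pi in p]

def diverse_population(p, size, frac_varied, alpha, rng):
    # Most individuals come from the learned model; a fraction comes from the
    # varied model, so rarely produced individuals still appear.
    q = vary_model(p, alpha)
    n_varied = int(round(size * frac_varied))
    models = [q] * n_varied + [p] * (size - n_varied)
    return [[1 if rng.random() < m_i else 0 for m_i in m] for m in models]

learned = [0.9, 0.9, 0.1]
varied = vary_model(learned, alpha=0.3)   # approximately [0.66, 0.66, 0.34]
pop = diverse_population(learned, size=50, frac_varied=0.2, alpha=0.3,
                         rng=random.Random(0))
```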
ISBN (print): 9783319135632; 9783319135625
Continuous estimation of distribution algorithms (EDAs) commonly use a Gaussian distribution to control the search process. For high-dimensional optimization problems, several practical issues arise when estimating a large covariance matrix from the selected population. Recent work in continuous EDAs has aimed to address these issues. The Screening Estimation of Distribution Algorithm (sEDA) is one such algorithm which, uniquely, utilizes the objective function values obtained during the search. A sensitivity analysis technique is then used to reduce the rank of the covariance matrix, according to the estimated sensitivity of the fitness function to individual variables in the search space. In this paper we analyze sEDA and find that it does not scale well to very high-dimensional problems because it uses a large number of additional fitness function evaluations per generation. A modified version of the algorithm, named sEDA-lite, is proposed, which requires no additional fitness evaluations for sensitivity analysis. Experiments on a variety of artificial and representative real-world problems evaluate the performance of the algorithm compared with sEDA and EDA-MCC, a related, recently proposed algorithm.
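The screening idea can be sketched as ranking variables by an estimate of the fitness function's sensitivity computed only from fitness values already obtained during the search, then estimating covariance only among the top-ranked variables. The sensitivity proxy used here (absolute Pearson correlation between each variable and fitness) and the fixed `k` are assumptions for illustration; the abstract does not specify the sensitivity analysis used by sEDA or sEDA-lite.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def screen_variables(points, fitnesses, k):
    # Sensitivity proxy (assumption): |correlation| between each variable and the
    # fitness values already computed for the population, so no extra evaluations.
    d = len(points[0])
    scores = [abs(pearson([p[j] for p in points], fitnesses)) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]

def sub_covariance(points, idx):
    # Covariance estimated only among the screened variables (k x k, not d x d).
    n = len(points)
    mean = {j: sum(p[j] for p in points) / n for j in idx}
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
             for b in idx] for a in idx]

rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(200)]
fitnesses = [3 * p[0] - 2 * p[2] for p in points]   # depends on x0 and x2 only
top = screen_variables(points, fitnesses, k=2)
cov = sub_covariance(points, top)
```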
ISBN (print): 9781605583259
We study the update of the distribution in estimation of distribution algorithms and show that a simple modification leads to unbiased estimates of the optimum. The simple modification (based on a proper reweighting of estimates) leads to strongly improved behavior with respect to premature convergence. Copyright 2009 ACM.
Estimation of distribution algorithms (EDAs) have evolved as a technique for estimating the population distribution in evolutionary algorithms. They estimate the distribution of the candidate solutions and then sample the next generation from the estimated distribution. The Bayesian optimization algorithm is an estimation of distribution algorithm that uses a Bayesian network to estimate the distribution of candidate solutions and then generates the next generation by sampling from the constructed network. Experimental results show that Bayesian optimization algorithms are capable of identifying the correct linkage between the variables of optimization problems. Since the problem of finding the optimal Bayesian network is NP-hard, Bayesian optimization algorithms typically use greedy algorithms to build the Bayesian network. This paper proposes a new real-coded Bayesian optimization algorithm for solving continuous optimization problems that uses a team of learning automata to build the Bayesian network. This team of learning automata tries to learn the optimal Bayesian network structure during the execution of the algorithm. The use of learning automata leads to an algorithm with lower computation time for building the Bayesian network. The experimental results reported here show the advantage of the proposed algorithm on both unimodal and multimodal optimization problems.
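A learning automaton repeatedly picks an action and updates its action probabilities from environmental feedback. The sketch below shows a single automaton with the standard linear reward-inaction (L_RI) update choosing among candidate parents for one variable. How the paper organizes its team of automata, what the actions are and how rewards are derived from the Bayesian network are not given in the abstract, so the candidate-parent actions and the reward probabilities below are hypothetical.

```python
import random

class LearningAutomaton:
    # Linear reward-inaction (L_RI) automaton over a finite set of actions.
    def __init__(self, n_actions, step, rng):
        self.p = [1.0 / n_actions] * n_actions
        self.step = step
        self.rng = rng

    def choose(self):
        # Pick an action with probability equal to its action probability.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def reward(self, action):
        # On reward: shift probability mass toward the chosen action.
        # On penalty: do nothing (the "inaction" part of L_RI).
        a = self.step
        self.p = [pi + a * (1.0 - pi) if i == action else (1.0 - a) * pi
                  for i, pi in enumerate(self.p)]

rng = random.Random(3)
la = LearningAutomaton(n_actions=4, step=0.01, rng=rng)
reward_prob = [0.2, 0.2, 0.9, 0.2]   # hypothetical: candidate parent 2 helps most
for _ in range(3000):
    action = la.choose()
    if rng.random() < reward_prob[action]:
        la.reward(action)
```

The update keeps the action probabilities summing to one, and a small `step` trades convergence speed for a lower chance of locking onto a suboptimal action.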