Ensemble learning has proven effective in stabilizing and enhancing the performance of traditional variable selection methods such as the lasso and genetic algorithms. In this paper, a novel bagging ensemble method called BSSW is developed to perform variable ranking and selection in linear regression models. Its main idea is to run a stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order in which it is selected into the final model as well as the improvement in model fit resulting from its inclusion. Based on the importance measure averaged across the bootstrap trials, all candidate variables are ranked and then classified as important or unimportant. To broaden its scope of application, BSSW is also extended to generalized linear models. Experiments on simulated and real data indicate that BSSW outperforms several existing methods in most of the cases studied.
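The procedure summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the mixed importance measure or the stepwise stopping rule, so the sketch assumes a plain forward search driven by residual sum of squares, a hypothetical stopping threshold `min_gain`, and a hypothetical score equal to an entry-order weight multiplied by the relative drop in residual sum of squares. The function names `forward_stepwise` and `bagged_stepwise_importance` are likewise made up for illustration.

```python
import numpy as np


def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection for a linear model with an intercept.

    Assumed stand-in for the paper's stepwise search: at each step, add the
    variable that lowers the residual sum of squares (RSS) the most, and stop
    when the best drop is below min_gain (as a fraction of the total sum of
    squares). Returns a list of (variable_index, relative_rss_drop) in the
    order the variables entered the model.
    """
    n, p = X.shape
    tss = float(np.sum((y - y.mean()) ** 2))  # RSS of the intercept-only model
    rss = tss
    selected, entries = [], []
    for _ in range(p):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = float(np.sum((y - A @ coef) ** 2))
            if r < best_rss:
                best_j, best_rss = j, r
        if best_j is None or (rss - best_rss) / tss < min_gain:
            break
        entries.append((best_j, (rss - best_rss) / tss))
        selected.append(best_j)
        rss = best_rss
    return entries


def bagged_stepwise_importance(X, y, n_boot=50, seed=0):
    """Average a per-trial importance score over bootstrap samples.

    Hypothetical mixed score: (p - entry_position) / p * relative_rss_drop,
    so earlier entry and larger fit improvement both raise the score.
    Variables not selected in a trial contribute 0 for that trial.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    total = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
        for pos, (j, gain) in enumerate(forward_stepwise(X[idx], y[idx])):
            total[j] += (p - pos) / p * gain
    return total / n_boot


# Toy data: only the first two of eight variables carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)

importance = bagged_stepwise_importance(X, y)
ranking = np.argsort(-importance)  # variable indices, most important first
print(ranking)
print(importance.round(3))
```

The sketch stops at the ranking step. How the ranked variables are then split into important and unimportant ones is not specified in the abstract, so no cutoff rule is shown here.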
ISBN: (print) 9783319202488; 9783319202471
For variable selection in linear regression models, this paper develops a novel bagging ensemble method based on a ranked list of variables. Specifically, a mixed importance measure is assigned to each variable according to the order in which a stepwise search algorithm selects it into the final model as well as the improvement resulting from its inclusion. Because small perturbations in the training data may change the order in which variables enter the final model, this process is repeated multiple times, each time on a bootstrap sample. Finally, the importance measure of each variable is averaged across the bootstrap trials. Experiments on simulated data demonstrate that the new method compares favorably with several other variable selection techniques.
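The averaging step can be written compactly as follows. The notation is an assumption introduced here for illustration, not taken from the paper: B is the number of bootstrap trials, p the number of candidate variables, and I_j^{(b)} the mixed importance measure that trial b assigns to variable j, whose exact form the abstract does not specify.

```latex
\bar{I}_j = \frac{1}{B} \sum_{b=1}^{B} I_j^{(b)}, \qquad j = 1, \dots, p
```

Sorting the variables by their averaged measure in decreasing order then gives the ranked list.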