Details
ISBN (digital): 9783031539695
ISBN (print): 9783031539688; 9783031539695
A trained regression model can be used to create new synthetic training data by drawing from a distribution over independent variables and calling the model to produce a prediction for the dependent variable. We investigate how this idea can be used together with genetic programming (GP) to address two important issues in regression modelling: interpretability and limited data. In particular, we have two hypotheses. (1) Given a trained and non-interpretable regression model (e.g., a neural network (NN) or random forest (RF)), GP can be used to create an interpretable model while maintaining accuracy by training on synthetic data formed from the existing model's predictions. (2) In the context of limited data, an initial regression model (e.g., NN, RF, or GP) can be trained and then used to create abundant synthetic data for training a second regression model (again, NN, RF, or GP), and this second model can perform better than it would if trained on the original data alone. We carry out experiments on four well-known regression datasets, comparing results between an initial model and a model trained on the initial model's outputs; we find some results which are positive for each hypothesis and some which are negative. We also investigate the effect of the limited data size on the final results.
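The abstract describes a simple pipeline: fit an initial model on the original data, sample new inputs from a distribution over the independent variables, label them with the initial model's predictions, and train a second model on that synthetic set. The following is a minimal sketch of that idea, assuming scikit-learn's RandomForestRegressor as the initial (non-interpretable) model and a DecisionTreeRegressor standing in for the second, interpretable learner (the paper itself uses genetic programming in that role); the 1-D sine dataset is a hypothetical stand-in for the paper's benchmark datasets.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Original, limited training data (hypothetical 1-D regression task).
X_train = rng.uniform(-3.0, 3.0, size=(50, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=50)

# Step 1: fit the initial regression model on the original data.
initial_model = RandomForestRegressor(n_estimators=100, random_state=0)
initial_model.fit(X_train, y_train)

# Step 2: draw abundant synthetic inputs from a distribution over the
# independent variables and label them with the initial model's predictions.
X_synth = rng.uniform(-3.0, 3.0, size=(5000, 1))
y_synth = initial_model.predict(X_synth)

# Step 3: train the second model on the synthetic data; its accuracy can then
# be compared against a model trained on the original data alone.
second_model = DecisionTreeRegressor(max_depth=4, random_state=0)
second_model.fit(X_synth, y_synth)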