Author affiliations: Fed Univ Para, R Augusto Correa, Inst Technol, Belem, Para, Brazil; Fed Univ Western Para, Engn & Geosci Inst, Belem, Para, Brazil
Publication: Pattern Recognition Letters
Year, volume, issue: 2015, Vol. 68, Part 1
Pages: 126-131
Core indexing:
Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]
Keywords: Missing data; Data imputation; Multi-objective evolutionary algorithm; Genetic algorithm
Abstract: A large number of data analysis techniques have been developed in recent years; however, most of them do not deal satisfactorily with a ubiquitous problem in the area: missing data. To mitigate the bias imposed by this problem, several treatment methods have been proposed, notably data imputation methods, which can be viewed as an optimization problem where the goal is to reduce the bias caused by the absence of information. However, most imputation methods are restricted to one type of variable, whether categorical or continuous. To fill these gaps, this paper presents a multi-objective genetic algorithm for data imputation called MOGAImp, based on NSGA-II, which is suitable for mixed-attribute datasets and takes into account information from incomplete instances and the modeling task. The algorithm was evaluated on 30 datasets with induced missing values, using five classifiers from three classes (rule induction learning, lazy learning, and approximate models), and compared with three techniques presented in the literature. The results confirm that MOGAImp outperforms some well-established missing data treatment methods. Furthermore, the proposed method proved to be flexible, since it can be adapted to different application domains. (C) 2015 Elsevier B.V. All rights reserved.
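
The abstract frames imputation as a multi-objective optimization problem solved with an NSGA-II-based algorithm. As a rough illustration only, and not the authors' implementation, the sketch below shows the Pareto-dominance test and the extraction of the non-dominated front that NSGA-II-style selection builds on. The two objectives (hypothetical error scores for candidate imputations, both minimized), the sample values, and the function names `dominates` and `pareto_front` are assumptions made for this example; the record does not state which objectives MOGAImp uses.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical objective values for four candidate imputations:
# (classification error, reconstruction error), both to be minimized.
candidates = [(0.20, 0.50), (0.25, 0.30), (0.30, 0.60), (0.18, 0.70)]
front = pareto_front(candidates)
print(front)
```

In this toy example the third candidate is worse than the first on both objectives, so it drops out of the front; the remaining candidates represent different trade-offs, which is why a multi-objective method returns a set of solutions rather than a single one.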