Missing data is a common data-quality problem. In classification, missing values are usually ignored or filled in by simple substitution, which degrades classifier performance. This paper presents RBP-AdaBoost, a framework for handling missing feature values in classification. The framework has two parts: predicting the missing values, and classifying the data with the predicted values filled in. A back-propagation (BP) algorithm first predicts the missing values; Adaptive Boosting (AdaBoost), which aggregates many weak classifiers into one strong classifier, then classifies the completed data. Experiments on nine UCI datasets evaluate the effect on classification error rate of four general methods and of the BP prediction model. The results show that RBP-AdaBoost improves the classification rate by 6.4% to 23.69% compared with the other methods, indicating that the missing-data treatment model is effective.
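The boosting half of this idea, combining weak classifiers into a strong one, can be illustrated with a minimal sketch. It is a textbook AdaBoost over one-dimensional decision stumps, not the RBP-AdaBoost implementation from the paper. It assumes any missing values have already been imputed, and the toy data and all function names (`stump_predict`, `best_stump`, `adaboost_fit`, `adaboost_predict`) are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    # Weak classifier: returns `polarity` when x is below the threshold,
    # and the opposite label otherwise.
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    # Exhaustively pick the stump with the lowest weighted error.
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    # Strong classifier: sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, t, polarity)
                for alpha, t, polarity in ensemble)
    return 1 if score >= 0 else -1


# Made-up data (already complete): no single stump separates these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(xs, ys, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly.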
Modeling and control of pH neutralization processes is a difficult problem in process control. This paper proposes a multi-modeling method that uses an improved k-means clustering based on a new validity function. Classical k-means has some common problems, including the number of clusters being assumed as a priori knowledge and the initial cluster centers being selected randomly. The proposed algorithm computes the initial cluster centers, and a new validity function is added to determine the appropriate number of clusters; partial least squares (PLS) is then used to construct the regression equation for each local model. The results showed that multiple models built with the proposed algorithm performed well, verifying the feasibility and validity of the algorithm.
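The general multi-model idea, clustering the operating range and fitting one local regression per cluster, can be sketched as follows. This is only an illustration under strong simplifying assumptions: it uses plain k-means with a fixed number of clusters and hand-picked initial centers, where the paper uses an improved initialization and a validity function, and it fits an ordinary least-squares line per cluster as a stand-in for PLS. The data and all function names are hypothetical.

```python
def kmeans_1d(xs, centers, max_iter=100):
    # Plain k-means on scalars, starting from the given centers.
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
                  for x in xs]
        new_centers = []
        for j in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members
                               else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept
    # (stand-in for PLS; needs at least two distinct x values).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_local_models(xs, ys, init_centers):
    # Cluster the inputs, then fit one local model per cluster.
    centers, labels = kmeans_1d(xs, list(init_centers))
    models = []
    for j in range(len(centers)):
        cx = [x for x, lab in zip(xs, labels) if lab == j]
        cy = [y for y, lab in zip(ys, labels) if lab == j]
        models.append(fit_line(cx, cy))
    return centers, models


def predict(centers, models, x):
    # Use the local model whose cluster center is nearest to x.
    j = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
    slope, intercept = models[j]
    return slope * x + intercept


# Made-up data with two regimes: y = x, then y = 30 - 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 8.0, 6.0, 4.0, 2.0]
centers, models = fit_local_models(xs, ys, init_centers=[min(xs), max(xs)])
```

A single global line would fit neither regime well, whereas each local model recovers the slope of its own region; this is the motivation for multi-model approaches to strongly nonlinear processes.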