We present two novel techniques for the imputation of both categorical and numerical missing values. The techniques use decision trees and forests to identify horizontal segments of a data set in which the records have higher similarity and stronger attribute correlations than in the data set as a whole. Missing values are then imputed using this similarity and these correlations. To achieve a higher quality of imputation, some segments are merged using a novel approach. We use nine publicly available data sets to experimentally compare our techniques with a few existing ones in terms of four commonly used evaluation criteria. The experimental results, supported by statistical analyses such as confidence interval analysis, indicate a clear superiority of our techniques. (C) 2013 Elsevier B.V. All rights reserved.
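The segment-then-impute idea in this abstract can be illustrated with a very small sketch. It is not the authors' technique: it assumes a single numerical attribute with missing values, uses a depth-one regression split (a decision stump) in place of full trees and forests, fills each gap with the mean of its segment, and omits the segment merging and correlation analysis the abstract mentions. The function names and the `age`/`income` toy data are hypothetical.

```python
from statistics import mean


def best_split(rows, feature, target):
    """Find the threshold on `feature` that minimizes the summed squared
    error of `target` within the two resulting segments.
    Assumes `rows` has at least two distinct `feature` values."""
    values = sorted({r[feature] for r in rows})
    best_thr, best_sse = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r[target] for r in rows if r[feature] <= thr]
        right = [r[target] for r in rows if r[feature] > thr]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if sse < best_sse:
            best_thr, best_sse = thr, sse
    return best_thr


def impute(rows, feature, target):
    """Fill missing `target` values (None) with the mean of the segment
    (stump leaf) that each incomplete record falls into."""
    # Learn the split only from records where the target is known.
    complete = [r for r in rows if r[target] is not None]
    thr = best_split(complete, feature, target)
    left_mean = mean(r[target] for r in complete if r[feature] <= thr)
    right_mean = mean(r[target] for r in complete if r[feature] > thr)

    result = []
    for r in rows:
        filled = dict(r)  # copy so the input records are not mutated
        if filled[target] is None:
            filled[target] = left_mean if r[feature] <= thr else right_mean
        result.append(filled)
    return result


# Hypothetical toy data: two clearly separated groups of records.
rows = [
    {"age": 20, "income": 10.0},
    {"age": 22, "income": 12.0},
    {"age": 50, "income": 40.0},
    {"age": 55, "income": 44.0},
    {"age": 21, "income": None},
    {"age": 52, "income": None},
]
imputed = impute(rows, "age", "income")
```

Here the split separates the younger records from the older ones, so each missing income is filled from records similar to it rather than from the global mean, which is the intuition behind imputing within segments.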
This study used data mining techniques to analyze the course preferences and course completion rates of enrollees in extension education courses at a university in Taiwan. First, extension courses were classified into five broad groups. Records of enrollees in extension courses from 2000-2005 were then analyzed with three data mining algorithms: Decision Tree, Link Analysis, and Decision Forest. Decision Tree was used to find enrollee course preferences, Link Analysis to find the correlation between course category and enrollee profession, and Decision Forest to estimate the probability of enrollees completing their preferred courses. The results will be used as a reference for curriculum development in the extension program. (c) 2006 Elsevier Ltd. All rights reserved.
In this study, we present an incremental machine learning framework called Adaptive Decision Forest (ADF), which produces a decision forest to classify new records. Based on our two novel theorems, we introduce a new splitting strategy called iSAT, which allows ADF to classify new records even if they are associated with previously unseen classes. ADF is capable of identifying and handling concept drift; however, it does not forget previously gained knowledge. Moreover, ADF is capable of handling big data if the data can be divided into batches. We evaluate ADF on nine publicly available natural datasets and one synthetic dataset, and compare its performance against that of eight state-of-the-art techniques. We also examine the effectiveness of ADF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques. (c) 2021 Elsevier Ltd. All rights reserved.