Author:
Wu, XD
Monash Univ, Dept Software Dev, Melbourne, Vic 3145, Australia
Details
This article presents HCV (Version 2.0), a heuristic, attribute-based, noise-tolerant data mining program based on the newly developed extension matrix approach. The HCV induction algorithm adopted in the software divides the positive examples (PE) of a specific class in a given example set into intersecting groups, then applies a set of strategies to find, in each group, a heuristic conjunctive formula that covers all of the group's positive examples and none of the negative examples (NE). In this way it finds a description formula for PE against NE, in the form of variable-valued logic, in low-order polynomial time. In addition to the HCV induction algorithm, this article outlines some of the techniques for noise handling and discretization of numerical domains developed and implemented in the HCV (Version 2.0) software, and compares the performance of HCV (Version 2.0) with that of other data mining algorithms (ID3, C4.5, C4.5rules, and NewID) in noisy and continuous domains. The empirical comparison shows that the rules generated by HCV (Version 2.0) are more compact than the decision trees or rules produced by ID3-like algorithms, and that HCV's predictive accuracy is competitive with that of ID3-like algorithms.
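The covering idea described in the abstract can be illustrated with a small sketch. This is not the HCV algorithm itself: the abstract does not spell out HCV's grouping strategies or its extension matrix heuristics, so the code below substitutes a plain greedy choice. The function names (`covers`, `formula_for_seed`, `induce`), the selector representation (attribute index mapped to a set of excluded values), and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[str, ...]
# Hypothetical representation of a conjunctive formula:
# attribute index -> set of values that attribute must NOT take.
Formula = Dict[int, Set[str]]


def covers(formula: Formula, example: Example) -> bool:
    """True if the example satisfies every 'attribute != value' selector."""
    return all(example[a] not in banned for a, banned in formula.items())


def formula_for_seed(seed: Example, negatives: List[Example]) -> Formula:
    """Greedily build a conjunction that covers `seed` and excludes every negative.

    Assumption: a simple greedy pick stands in for HCV's own strategies.
    """
    formula: Formula = {}
    remaining = list(negatives)
    while remaining:
        # Count, for each (attribute, value) on which a negative differs
        # from the seed, how many remaining negatives it would exclude.
        counts: Dict[Tuple[int, str], int] = {}
        for neg in remaining:
            for a, v in enumerate(neg):
                if v != seed[a]:
                    counts[(a, v)] = counts.get((a, v), 0) + 1
        if not counts:
            raise ValueError("a negative example is identical to the seed")
        a, v = max(counts, key=counts.get)
        formula.setdefault(a, set()).add(v)
        remaining = [n for n in remaining if n[a] != v]
    return formula


def induce(positives: List[Example], negatives: List[Example]) -> List[Formula]:
    """Return conjunctive formulae whose disjunction covers all PE and no NE."""
    rules: List[Formula] = []
    uncovered = list(positives)
    while uncovered:
        rule = formula_for_seed(uncovered[0], negatives)
        rules.append(rule)
        # The seed always satisfies its own rule, so this list shrinks.
        uncovered = [p for p in uncovered if not covers(rule, p)]
    return rules


# Toy data (invented for illustration): attributes are (outlook, windy).
positives = [("sunny", "no"), ("overcast", "no"), ("overcast", "yes")]
negatives = [("rain", "yes"), ("sunny", "yes")]
rules = induce(positives, negatives)
print(rules)
```

Each returned formula is a conjunction of exclusion selectors, and the disjunction of the formulae describes the positive class against the negative examples. Unlike HCV as described above, this sketch includes no noise handling and no discretization of numerical attributes.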