MCS (Minimal Consistent Set) is one of the classical algorithms for the minimal consistent subset selection problem. However, when noisy samples are present, classification accuracy can suffer, and noise also affects the size of the minimal consistent set. Removing noise is therefore an important step before sample selection. This paper proposes an improved approach based on MCS for selecting representative samples. Unlike other algorithms, which remove noise in advance by Wilson Editing before selecting representative samples, this algorithm performs noise removal and sample selection simultaneously. With this method, most noise can be deleted and the most representative samples can be identified and retained. The experiments show that the proposed method removes many redundant and noisy samples and increases accuracy when it is used for classification tasks.
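For context, the Wilson Editing step that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the usual k-nearest-neighbour editing rule, not the method proposed in the paper; the function name `wilson_editing`, the choice `k=3`, and the Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter


def wilson_editing(samples, labels, k=3):
    """Return indices of samples whose label agrees with the majority
    label of their k nearest neighbours (Euclidean distance).

    Illustrative sketch only: names and defaults are assumptions.
    """
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Rank every other sample by distance to x and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(x, samples[j]),
        )[:k]
        majority = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        # A sample that disagrees with its neighbourhood is treated as noise.
        if majority == y:
            kept.append(i)
    return kept


# Two well-separated 1-D clusters plus one mislabelled point at index 3.
points = [(0.0,), (1.0,), (2.0,), (1.5,), (10.0,), (11.0,), (12.0,)]
classes = [0, 0, 0, 1, 1, 1, 1]
kept_indices = wilson_editing(points, classes, k=3)
```

In this toy data the point at index 3 sits inside the class-0 cluster but carries label 1, so the editing rule drops it and keeps the rest.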
Decision tree induction is a useful approach for extracting classification knowledge from a set of feature-based instances. The most popular heuristic used in decision tree generation is minimum entropy. This heuristic has a serious disadvantage: poor generalization capability [3]. The support vector machine (SVM) is a machine learning classification technique based on statistical learning theory, and it generalizes well. Given the relationship between the classification margin of an SVM and its generalization capability, the large margin of the SVM can be used as the heuristic information for a decision tree in order to improve its generalization capability. This paper proposes a decision tree induction algorithm based on a large margin heuristic. Compared with the binary decision tree that uses minimum entropy as the heuristic information, the experiments show that generalization capability is improved by using the new heuristic.
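The minimum entropy heuristic used as the baseline can be illustrated with a short sketch. It assumes a single numeric feature and binary threshold splits, and it picks the threshold with the lowest weighted class entropy; the helper names `entropy`, `split_entropy`, and `best_threshold` are hypothetical and do not come from the paper. The large margin heuristic itself is not shown, since the abstract does not describe its details.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(values, labels, threshold):
    """Weighted entropy of the two branches produced by `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Empty branches contribute nothing to the weighted sum.
    return sum(len(part) / n * entropy(part) for part in (left, right) if part)


def best_threshold(values, labels):
    """Midpoint between adjacent distinct values with minimum split entropy.

    Assumes the feature takes at least two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_entropy(values, labels, t))


feature = [1.0, 2.0, 3.0, 4.0]
target = [0, 0, 1, 1]
chosen = best_threshold(feature, target)
```

Here the split at 2.5 separates the two classes perfectly, so its weighted entropy is zero and it is the one selected.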
Ontology mapping has been widely used in ontology application, but the similarity calculation becomes a thorny issue in the process of ontology mapping. In this paper, the different elements of ontology are considered...
This paper proposes a new method for solving the convex hull problem in n-dimensional spaces. At each step, a new point is added to the convex hull if a linear programming model judges the point to lie outside the current convex hull. For a linearly separable classification problem, each instance is regarded as a point in the instance space; if the convex hulls of different classes still do not overlap after a feature is deleted, then that feature can be deleted. Repeating this process yields an algorithm for feature selection. Experimental results show the effectiveness of the algorithm.
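The abstract does not state the linear programming model it uses. One standard way to pose the membership test, given here only as an assumed illustration, is as a feasibility problem: with current hull points $v_1, \dots, v_m \in \mathbb{R}^n$ and a candidate point $p$ (symbols chosen for this example), $p$ lies in the convex hull exactly when the following system has a solution.

```latex
\begin{aligned}
\text{find}\quad & \lambda_1, \dots, \lambda_m \\
\text{subject to}\quad & \sum_{i=1}^{m} \lambda_i v_i = p, \\
& \sum_{i=1}^{m} \lambda_i = 1, \\
& \lambda_i \ge 0, \qquad i = 1, \dots, m.
\end{aligned}
```

If the system is infeasible, $p$ is outside the current hull and would be added under the incremental scheme described above.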
The radial basis function network (RBFN) has been widely used in various fields such as function regression, pattern recognition, and error detection, etc. However, the structural parameters of RBFN including the numb...
Markov chains, with the Markov property as their essence, are widely used in fields such as information theory, automatic control, communication techniques, genetics, computer sciences, economic administration, educatio...
Short text classification, as a branch of text classification, resembles traditional text classification to a certain degree but must also solve some special problems, because of sh...
Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case-base grows at a fast rate and this causes inefficiency in the cas...
It has been shown that the fuzzy integral is an effective tool for the fusion of multiple classifiers. Of primary importance in the development of the system is the choice of the measure, which embodies the importance of subsets of classifiers. In this paper we propose a method for a dynamic fuzzy measure that changes with the pattern to be classified (data dependent). The method uses a neural network, which has good learning ability. Our experimental results show that this method improves classification accuracy.
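To make the role of the fuzzy measure concrete, the following sketch fuses classifier confidences with a Choquet integral. The abstract does not say which fuzzy integral is used, so the Choquet form is an assumption, as are the function name `choquet_integral` and the dictionary layout of the measure. The measure here is fixed by hand; in the data-dependent scheme described above it would instead be produced per pattern.

```python
def choquet_integral(scores, measure):
    """Choquet integral of per-classifier scores w.r.t. a fuzzy measure.

    scores:  dict mapping classifier name -> confidence for one class
    measure: dict mapping frozenset of classifier names -> measure value
             (must contain every coalition visited below)
    Illustrative sketch only: names and data layout are assumptions.
    """
    # Visit classifiers from lowest to highest confidence.
    order = sorted(scores, key=scores.get)
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        # Coalition of classifiers whose score is at least the current one.
        coalition = frozenset(order[i:])
        total += (scores[name] - previous) * measure[coalition]
        previous = scores[name]
    return total


confidences = {"a": 0.2, "b": 0.6}
fuzzy_measure = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.5,
}
fused = choquet_integral(confidences, fuzzy_measure)
```

With an additive measure the result reduces to a weighted average of the scores; a non-additive measure lets the fusion reward or penalise particular subsets of classifiers.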
Fuzzy Integral is widely accepted and applied in multi-classifier fusion to express the importance of individual classifiers and the interaction among classifiers. In this fusion model, there are two keys to determine...