Support vectors play a central role in SVM training because they determine the optimal hyperplane. A typical SVM classification problem contains many non-support vectors and only a few support vectors, so this paper proposes a method for removing samples that are unlikely to be support vectors. First, Support Vector Domain Description (SVDD) is used to find the smallest sphere that contains most of the data points, and the objects outside the sphere are removed. Second, edge points are removed based on the distance of each pattern to the centers of the other classes. Compared with the standard SVM, experimental results show that the proposed algorithm reduces both the number of training samples and the training time while maintaining high accuracy.
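The two reduction steps can be illustrated with a minimal sketch. It is not the paper's implementation: a centroid-centered sphere that keeps a fixed fraction of each class stands in for SVDD, and the sketch assumes that the "edge points" to remove are those farthest from the centers of the other classes. The function names and the `sphere_keep` and `near_keep` parameters are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def reduce_samples(classes, sphere_keep=0.9, near_keep=0.5):
    """classes maps a label to a list of points; returns a reduced copy.

    Assumption: a centroid-centered sphere replaces the SVDD sphere, and
    samples far from every other class center are treated as unlikely
    support vectors.
    """
    centers = {label: centroid(pts) for label, pts in classes.items()}
    reduced = {}
    for label, pts in classes.items():
        # Step 1: keep the fraction of points closest to the class's own center.
        by_own = sorted(pts, key=lambda p: math.dist(p, centers[label]))
        inside = by_own[:max(1, math.ceil(sphere_keep * len(by_own)))]
        # Step 2: rank by distance to the nearest other-class center and
        # keep only the closest fraction, which lies near the class boundary.
        others = [c for other, c in centers.items() if other != label]
        by_other = sorted(inside, key=lambda p: min(math.dist(p, c) for c in others))
        reduced[label] = by_other[:max(1, math.ceil(near_keep * len(by_other)))]
    return reduced
```

The reduced sample set would then be passed to an ordinary SVM trainer in place of the full training set.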
Email is a semi-structured document whose structure carries important attributes, and using spam-specific features in particular can improve email classification results. In this paper, we apply a decision tree data mining technique to uncover potential association rules among these email attributes, and then identify the category of an unknown email based on these rules. In experiments that applied our classifier to a large number of Chinese emails, the efficiency of our method was no lower than that of existing methods that check the whole email content text. At the same time, our method reduces the computation cost and the consumption of system resources.
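As a rough illustration of how a decision tree picks the structural attribute to split on, the sketch below computes ID3-style information gain. The abstract does not say which splitting criterion the authors use, so information gain is an assumption, and the attribute names in the usage note (`sender_known`, `has_html`) are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the emails on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain (the root split)."""
    return max(rows[0], key=lambda attr: information_gain(rows, labels, attr))
```

Called on a list of attribute dictionaries such as `{"sender_known": True, "has_html": False}` with matching `"spam"`/`"ham"` labels, `best_attribute` returns the attribute a tree would test first; repeating the procedure on each branch yields the classification rules.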
In printed mathematical expression recognition, structural analysis is a key step that determines the overall recognition results. Overbrace/underbrace structures are a special kind of structure in expressions and a major difficulty in structural analysis, because they are hard to distinguish from other structures. Building on the baseline analysis method for expression structures, this paper presents an improved approach to analyzing overbrace/underbrace structures that combines region location with variable thresholds. The approach introduces an overrun threshold for the overbrace/underbrace symbol, which is combined with other features to establish the analysis rules and is optimized by testing on a variety of overbrace/underbrace structures. The experimental results show that the approach is effective in detecting and analyzing overbrace/underbrace structures.
Active learning is a hot topic in the machine learning field. Its main task is to automatically select representative instances so as to efficiently reduce the sample complexity. This paper presents a brief survey of active learning covering selection methods, query strategies, applications, and other related work.
This paper presents a PSO-based method for learning the similarity measure of nominal features for case-based reasoning classifiers (CBR classifiers). The symbolic features considered here take completely unordered values. It has been indicated that, in specific classification tasks, the similarities between these nominal feature values cannot simply be considered as either 0 or 1. A GA-based approach has been developed for learning the similarity measure of such feature values. However, as the number of features and feature values grows, the convergence of the GA-based algorithm slows down noticeably, and the classification accuracy may also be affected. To address this problem, we propose a PSO-based algorithm for learning the similarity measure of nominal features, and further describe feature importance through the learned similarity measure. The experimental results show that the proposed PSO-based algorithm converges much faster than the GA-based algorithm and also improves accuracy. In addition, we explain that the feature importance defined through the learned similarities is essentially consistent with that in rough sets, and finally provide an illustrative example.
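The search procedure can be sketched as a plain global-best particle swarm over a vector of similarity values in [0, 1]. This is a generic PSO, not the authors' algorithm: the encoding (one coordinate per pair of nominal values), the parameter defaults, and the `fitness` callback are all assumptions. In the setting described above, `fitness` would be something like the classification error of the CBR classifier under the candidate similarities.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over [0, 1]^dim; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [fitness(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Similarities must stay inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

Each particle moves toward both its own best position and the swarm's best position, which is the usual explanation for PSO converging in fewer evaluations than a GA on problems of this kind.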
ISBN (print): 9781424463886; 9780769539874
Web data extraction aims to obtain the information users want from Web pages. To extract valuable data more accurately, this paper proposes a new method called data extraction based on index path in Web (DEIP). The approach establishes an index path for each text node using the XML DOM, defines the prefix of data-rich regions by keywords in the index path, generates extraction rules, and obtains a wrapper accordingly. The wrapper can automatically extract data in the same domain from a Website. The method relies on the continuity, structural similarity, and location relations of the useful information in Web pages rather than on HTML tags. Experiments indicate that the method performs well in both recall and precision of data extraction.
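A minimal sketch of building index paths for text nodes with the standard-library DOM follows. The abstract does not define the exact path format, so the `tag[k]` notation is an assumption, as is the simple rule in `values_after` that the data value is the text node following a keyword. The input must be well-formed XML or XHTML, since `xml.dom.minidom` does not parse arbitrary HTML.

```python
from xml.dom import minidom


def index_paths(xml_text):
    """Return (index_path, text) pairs for every non-empty text node."""
    doc = minidom.parseString(xml_text)
    results = []

    def walk(node, path):
        counts = {}  # per-tag sibling counters under this node
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                counts[child.tagName] = counts.get(child.tagName, 0) + 1
                walk(child, path + [f"{child.tagName}[{counts[child.tagName]}]"])
            elif child.nodeType == child.TEXT_NODE and child.data.strip():
                results.append(("/".join(path), child.data.strip()))

    root = doc.documentElement
    walk(root, [f"{root.tagName}[1]"])
    return results


def values_after(paths, keyword):
    """Text of the node that directly follows each node containing keyword."""
    return [paths[i + 1][1]
            for i, (_, text) in enumerate(paths[:-1])
            if keyword in text]
```

Records with the same layout produce index paths that differ only in one sibling index (for example `div[1]` versus `div[2]`), which is the kind of structural similarity a wrapper can exploit.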
The segmentation of touching symbols is one of the key factors affecting the performance of a printed mathematical expression recognition system. This paper presents an improved method for segmenting touching symbols in printed mathematical expressions that is suitable for different types of touching symbols. First, the outer contour of the symbol image is extracted with a contour tracing algorithm. Next, the concave corner points are detected with a corner detection algorithm and taken as the candidate segmentation points. Finally, segmentation paths are constructed to separate the touching symbols.
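The concave-corner step can be illustrated on a contour that has already been traced and simplified to a polygon. The sketch uses a cross-product sign test, which is one common way to find concave vertices and is not necessarily the corner detector used in the paper; the function names are hypothetical.

```python
def signed_area(poly):
    """Shoelace formula; the sign encodes the traversal direction."""
    n = len(poly)
    return 0.5 * sum(poly[i][0] * poly[(i + 1) % n][1]
                     - poly[(i + 1) % n][0] * poly[i][1]
                     for i in range(n))


def concave_vertices(poly):
    """Vertices where the contour turns against its overall orientation."""
    orient = 1 if signed_area(poly) > 0 else -1
    n = len(poly)
    found = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = poly[i - 1], poly[i], poly[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing edges.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross * orient < 0:
            found.append(poly[i])
    return found
```

For a contour shaped like two blobs joined at a narrow neck, the concave vertices fall on opposite sides of the neck, so a straight line between such a pair is a natural candidate segmentation path.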
ISBN (print): 9781424463886; 9780769539874
The Deep Web contains a large amount of accessible, high-quality information, and it is still growing rapidly with the development of the World Wide Web. It is therefore increasingly important to find the Web databases that are most relevant to user queries. In this paper, we propose a Web database selection method based on retrieval performance. The method determines the topic from website characteristics, classifies the websites, and finally decides which Web database to choose based on retrieval performance. It not only accurately selects the Web databases that satisfy user queries but also improves the speed of database queries and the quality of retrieval.
By incorporating the domination principle into inconsistent decision systems based on dominance relations, we define a distribution function for a decision system that directly reflects the degree of inconsistency of the system. A new type of distribution reduction and a maximum distribution reduction are introduced accordingly. The relationships between these two reductions are discussed, and their judgment theorems are given. In addition, this article discusses the relations among several reductions, including the existing compatible reduction and absolute reduction and the proposed distribution reduction and maximum distribution reduction.
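To make the idea of a distribution function concrete, the sketch below assumes the form commonly used in dominance-based rough set work: for an object x, take the set of objects that dominate x on the chosen attributes and report the share of each decision class within that set. The abstract does not give the paper's exact definition, so this form, the function names, and the toy table in the usage note are assumptions.

```python
from fractions import Fraction


def dominating_class(table, x, attrs):
    """Objects whose value is >= that of x on every attribute in attrs."""
    return [y for y in table
            if all(table[y][a] >= table[x][a] for a in attrs)]


def distribution(table, decision, x, attrs):
    """Share of each decision class inside the dominating class of x."""
    dom = dominating_class(table, x, attrs)
    return {d: Fraction(sum(1 for y in dom if decision[y] == d), len(dom))
            for d in sorted(set(decision.values()))}
```

With a toy table `{"x1": {"a": 1, "b": 2}, "x2": {"a": 2, "b": 2}, "x3": {"a": 3, "b": 1}}` and decisions `{"x1": 1, "x2": 2, "x3": 1}`, the object `x1` is dominated by both `x1` and `x2`, which belong to different decision classes, so its distribution is split evenly. Under this assumed definition, an attribute subset that leaves every object's distribution unchanged would be a candidate distribution reduction.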
Support vector machine (SVM) is a machine learning classification technique based on statistical learning theory. A quadratic optimization problem needs to be solved in the algorithm, and with the increase of the s...