Statistical text categorization can be improved in two main ways: feature selection and the evaluation of feature weights. In this paper, we propose an enhanced text categorization method that improves both aspects, based on a modified mutual information algorithm and a feature-weight evaluation algorithm. The proposed method is applied to the Reuters-21578 Top10 benchmark test set to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
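For orientation, mutual-information feature selection can be sketched as below. This shows the conventional pointwise score MI(t, c) = log2(P(t, c) / (P(t) P(c))) estimated from document counts, not the modified algorithm of the paper, whose details the abstract does not give. The function names and the toy documents are assumptions made for illustration.

```python
import math


def mutual_information(docs, labels, term, category):
    """Conventional pointwise MI between a term and a category.

    docs is a list of token sets, labels the matching list of categories.
    """
    n = len(docs)
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    n_term = sum(1 for d in docs if term in d)
    n_cat = sum(1 for y in labels if y == category)
    if both == 0 or n_term == 0 or n_cat == 0:
        return float("-inf")  # term and category never co-occur
    return math.log2(both * n / (n_term * n_cat))


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI over all categories."""
    vocab = set().union(*docs)
    cats = set(labels)
    scores = {
        t: max(mutual_information(docs, labels, t, c) for c in cats)
        for t in vocab
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, not taken from Reuters-21578.
docs = [{"oil", "price"}, {"oil", "barrel"}, {"wheat", "price"}, {"wheat", "crop"}]
labels = ["crude", "crude", "grain", "grain"]
selected = select_features(docs, labels, 4)
```

In this toy corpus the term "price" occurs equally often in both categories, so its score is zero and it is the one term dropped when four features are kept.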
A well-known drawback of the least squares support vector machine (LS-SVM) is that sparseness is lost. In this study, an effective pruning algorithm is developed to deal with this problem. To avoid solving the primal set of linear equations, the proposed algorithm adopts a bottom-up strategy. During training, chunking incremental and decremental learning procedures are used alternately, so that a small support vector set covering most of the information in the training set is formed adaptively. The final classifier is constructed from this support vector set. To validate the proposed algorithm, it was applied to five benchmark UCI datasets, and different chunking sizes were tested to show the relationships among the chunking size, the number of support vectors, the training time, and the testing accuracy. The experimental results show that the proposed algorithm can adaptively obtain sparse solutions with almost no loss of generalization performance when the chunking size is equal to 2, and that its training speed is much faster than that of the sequential minimal optimization (SMO) algorithm. The proposed algorithm can be applied to the least squares support vector regression machine as well as to the LS-SVM classifier.
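The loss of sparseness can be seen in a small sketch of plain LS-SVM training, which is what a pruning algorithm starts from. This is not the pruning algorithm of the paper. It assumes the regression-style LS-SVM formulation on ±1 targets, a linear kernel, and a hypothetical four-point dataset, and it solves the bordered linear system directly with a pure-Python Gaussian elimination.

```python
def solve(a, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_lssvm(xs, ys, gamma=1.0, kernel=linear_kernel):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [
            kernel(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0)
            for j in range(n)
        ])
    sol = solve(a, [0.0] + [float(y) for y in ys])
    return sol[0], sol[1:]  # bias b, coefficients alpha


def predict(x, xs, alphas, b, kernel=linear_kernel):
    f = sum(al * kernel(x, xi) for al, xi in zip(alphas, xs)) + b
    return 1 if f >= 0 else -1


# Hypothetical 1-D toy data; every training point receives a nonzero alpha.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [-1, -1, 1, 1]
b, alphas = train_lssvm(xs, ys, gamma=1.0)
```

Because each coefficient is proportional to the training error of its point, all four coefficients are nonzero here, so every training point acts as a support vector. A pruning scheme aims to keep only a small subset of them.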
An ontology-based method named AOBM is proposed in this paper. It fully takes into account the factors that will affect the communication, and using ontology can be represented in agent's knowledge base. Provided on...
In this paper, we describe a fast semi-automatic segmentation algorithm. A nodes aggregation method is proposed for improving the running time and a Graph-Cuts method is used to model the segmentation problem. The who...
Sequential pattern mining is one of the most important fields in data mining. In this paper, we propose a novel algorithm FSPAN (Fast Sequential Pattern mining algorithm) to do the sequence mining. FSPAN can mine all ...
Identification of Transcription Factor Binding Sites (TFBS) from the upstream region of genes remains a highly important and unsolved problem particularly in higher eukaryotic genomes. In this paper, we propose a nove...
In this paper, we present an efficient method for detecting collisions between highly deformable objects, which is a combination of newly developed stochastic method and Particle Swarm Optimization (PSO) algorithm. Fi...
In this paper, we investigate the deficiency of Goyal and Egenhofer's method for modeling cardinal directional relations between simple regions and provide the computational model based on the concept of mathemati...
ISBN: (print) 9781424401956
Structural similarity computation plays a crucial role in many applications, such as searching for similar documents, comparing chemical compounds, and finding genetic similarities. In this paper, we propose using structural information content (SIC) to measure structural information, considering both the nodes and the edges of trees. We use a binary encoding approach to assign weights to nodes in different layers and to determine whether one tree is a subtree of another. By defining a fast kernel and recursively computing SICs, we evaluate the structural information similarity of data trees to pattern trees. We present an algorithm for calculating SICs with a computational complexity of O(n), and use simple examples to illustrate the performance of the proposed method.
ISBN: (print) 0769525288
Identification of transcription factor binding sites from the upstream regions of genes is a highly important and unsolved problem. In this paper, we propose a novel framework for using evolutionary algorithms to address this challenging issue. Under this framework, we use two prevalent evolutionary algorithms, the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to find unknown sites in a collection of relatively long intergenic sequences that are suspected of being bound by the same factor. Binding-site motifs are represented as position weight matrices (PWMs), and we describe how to encode a PWM as a genome for GA and as a particle for PSO. We apply the two algorithms to five different transcription factor binding sites of the yeast Saccharomyces cerevisiae and to CRP binding sites. The results on Saccharomyces cerevisiae show that the algorithms can find the correct binding-site motifs, and the results on CRP show that they achieve higher accuracy than MEME and the Gibbs Sampler.
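As a rough illustration of the PWM representation, the sketch below builds a PWM from aligned sites, flattens it into a real-valued vector of the kind a GA genome or PSO particle could carry, and scans a sequence with a log-odds score. The abstract does not specify the actual encoding, so the flattening scheme, the pseudocount, the uniform background, and the example sites are all assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = [pseudocount + sum(1 for s in sites if s[pos] == b) for b in BASES]
        total = sum(counts)
        pwm.append([c / total for c in counts])
    return pwm


def encode(pwm):
    """Flatten a PWM into one vector (assumed genome/particle layout)."""
    return [p for column in pwm for p in column]


def decode(vector):
    """Rebuild a PWM from a vector, renormalising each group of four values."""
    pwm = []
    for i in range(0, len(vector), 4):
        col = [max(v, 1e-9) for v in vector[i:i + 4]]  # keep entries positive
        total = sum(col)
        pwm.append([v / total for v in col])
    return pwm


def log_odds(pwm, window, background=0.25):
    """Log-odds score of a window against a uniform background (assumed)."""
    return sum(
        math.log2(pwm[i][BASES.index(ch)] / background)
        for i, ch in enumerate(window)
    )


def best_site(pwm, sequence):
    """Start index of the highest-scoring window in the sequence."""
    w = len(pwm)
    return max(
        range(len(sequence) - w + 1),
        key=lambda i: log_odds(pwm, sequence[i:i + w]),
    )


# Hypothetical aligned sites and sequence, for illustration only.
sites = ["TGTGA", "TGTGA", "TGTCA", "TGAGA"]
pwm = build_pwm(sites)
particle = encode(pwm)
position = best_site(decode(particle), "CCCTGTGACCC")
```

Renormalising in `decode` means that a vector perturbed by mutation or by a velocity update still maps back to valid column probabilities, which is one simple way to keep candidate solutions well-formed.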