ISBN (digital): 9783540734994
ISBN (print): 9783540734987
In this paper we consider multiclass learning tasks based on Support Vector Machines (SVMs). The methods currently in use are One-Against-All and One-Against-One, but there is much need for improvement in the field of multiclass learning. We developed a novel combination algorithm called Comb-ECOC, which is based on posterior class probabilities. Following the Bayesian rule, it assigns each instance to the class with the highest posterior probability. One problem with the use of a multiclass method is the proper choice of parameters. Many users simply take the default parameters of the respective learning algorithms (e.g. the regularization parameter C and the kernel parameter gamma). We tested different parameter optimization methods on different learning algorithms and confirmed the better performance of One-Against-One versus One-Against-All, which can be explained by the maximum margin approach of SVMs.
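The decision rule and the One-Against-One scheme mentioned in this abstract can be illustrated with a minimal sketch. It is not the Comb-ECOC algorithm itself, whose details the abstract does not give. The function names, the toy posterior values, and the toy pairwise classifier are all assumptions made for illustration.

```python
from itertools import combinations


def bayes_decision(posteriors):
    """Return the class label with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


def one_against_one_vote(classes, pairwise_winner):
    """Majority vote over all class pairs.

    pairwise_winner(a, b) stands in for a trained binary classifier
    and must return either a or b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    return max(votes, key=votes.get)


# Hypothetical posterior estimates for one instance.
posteriors = {"A": 0.2, "B": 0.5, "C": 0.3}
predicted = bayes_decision(posteriors)

# Toy pairwise classifier (assumption): prefers the class with the larger posterior.
voted = one_against_one_vote(
    list(posteriors),
    lambda a, b: a if posteriors[a] >= posteriors[b] else b,
)
```

With K classes, One-Against-One trains K(K-1)/2 binary classifiers, whereas One-Against-All trains K. The sketch only shows how their outputs would be combined.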
ISBN (print): 9780769530697
Microarray techniques give biologists a first peek into the molecular states of living tissues. Previous studies have shown that it is feasible to build sample classifiers using gene expression profiles. To build an effective sample classifier, a dimension reduction process is necessary, since classic pattern recognition algorithms do not work well in high-dimensional spaces. In this paper we present a novel feature extraction algorithm based on the concept of virtual genes, which integrates microarray expression data sets with the domain knowledge embedded in Gene Ontology (GO) annotations. We define a semantic similarity that measures the functional association between two genes using the annotations on each GO term. We then identify groups of genes, called virtual genes, that potentially interact with each other for a biological function. The correlation in gene expression levels of virtual genes can be used to build a sample classifier. For a colon cancer data set, the integration of microarray expression data with GO annotations improves the accuracy of sample classification by more than 10%.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Programs for gene prediction in computational biology are examples of systems for which authentic test data are difficult to acquire, as such data require years of extensive research. This has led to test methods based on semi-artificially produced test data, often produced by ad hoc techniques complemented by statistical models such as Hidden Markov Models (HMMs). The quality of such a test method depends on how well the test data reflect the regularities in known data and how well they generalize these regularities. So far only very simplified and generalized artificial data sets have been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine learning to analyze existing, manually marked-up data, integrated with the generation of new, artificial data. More specifically, we suggest using the PRISM system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears to be a powerful modelling environment, which subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here, showing parts of a model under development for genetic sequences, and report initial experiments producing test data for the evaluation of existing gene finders, exemplified by GENSCAN, HMMGene and ***.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
This paper addresses the problem of relation information extraction and proposes a method for discovering relations among entities that are buried in different nested structures of XML documents. The method first identifies and collects the XML fragments that contain all types of entities given by the user. It then computes the similarity between fragments based on the semantics of their tags and on their structures, and clusters the fragments by similarity so that fragments containing the same relation are grouped together. Finally, it extracts relation instances and the patterns of their occurrences from each cluster. Experimental results show that the method can correctly identify and extract relation information among the given types of entities from all kinds of XML documents with meaningful tags.
ISBN (print): 9781424409723
Clustering is a key tool in data mining and pattern recognition. In traditional clustering algorithms, objects are usually expressed as vectors whose components describe features. However, the objects to be clustered in real tasks may be models rather than data points, for example neural networks, decision trees, or support vector machines. This paper studies clustering algorithms for model data. By defining an extended measure, clustering methods are studied for abstract data objects, and a framework for clustering models is presented. To validate the effectiveness of model clustering, we choose a hierarchical model clustering algorithm in the experiments. The models are BP (Back Propagation) neural networks trained with the BP algorithm. Both the same-fault measure and the double-fault measure are used as pairwise measures between models. Distances between clusters are the single link and the complete link, respectively. In this way, we select a subset of the neural network models from each cluster and improve the diversity of the selected models. The selected models are then combined into an ensemble. Moreover, we experimentally study the relations between the number of clusters, the size of the ensemble, and the performance of ensemble learning. Experimental results show that selecting a subset of models by model clustering improves the performance of ensemble learning.
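The pairwise measure and the single-link hierarchical clustering named in this abstract can be sketched as follows. The double-fault measure is taken in its usual sense, the fraction of samples that both models misclassify. Converting it to a distance as one minus the double-fault value is an assumption, as are the function names and the toy error indicators. The paper's own extended measure may differ.

```python
def double_fault(errors_a, errors_b):
    """Fraction of samples that both models misclassify.

    Inputs are equal-length lists of booleans, True meaning an error.
    """
    both = sum(1 for a, b in zip(errors_a, errors_b) if a and b)
    return both / len(errors_a)


def single_link_clusters(dist, k):
    """Agglomerative single-link clustering on a full distance matrix.

    Merges the two closest clusters until k clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical error indicators of four models on six validation samples.
errors = [
    [True, True, False, False, False, False],
    [True, True, True, False, False, False],
    [False, False, False, False, True, True],
    [False, False, False, True, True, True],
]
n = len(errors)
# Assumed distance: models that fail on the same samples are close.
dist = [
    [0.0 if i == j else 1.0 - double_fault(errors[i], errors[j]) for j in range(n)]
    for i in range(n)
]
clusters = single_link_clusters(dist, 2)
```

Picking one model per resulting cluster would then give an ensemble whose members tend to fail on different samples. Replacing `min` with `max` in the linkage step gives the complete-link variant.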
ISBN (print): 9781424409723
Mining XML association rules is confronted with additional challenges due to the inherent flexibility of XML in both structure and semantics. To mine XML association rules efficiently, we give a new definition of transaction and item in the XML context, and then build a transaction database based on an index table. Based on our definition and the index table used for XML searching, we can quickly check the inclusion relation between a transaction and an item. A highly adaptive mining technique is also described. Using it, we can mine rules without user guidance about the associations of interest and can discover unknown rules. We demonstrate the effectiveness of these techniques through experiments on real-life data.
ISBN (print): 9780769530697
The need for tools to aid the selection of the CI models that lie at the heart of many AI systems has never been greater, due to the mainstreaming of data mining and other AI applications. LEONARDO, our contribution to this process, is a recommender system that selects and ranks applicable CI models for a given problem based on the peculiarities of the domain, as determined by the user's preferences and the dataset characteristics. LEONARDO's recommendations are based on two knowledge bases (KBs). The first contains the descriptions of 65 CI models and provides the meta-knowledge for pruning the space of all CI models to only those applicable to the current task. The second KB contains the performance results of the applicable CI models on over 200 datasets. LEONARDO's ranking is obtained from the performance information of the k entries in this KB that are nearest in similarity to the new domain dataset.
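The ranking step described in this abstract, using the k most similar knowledge-base entries, can be sketched as below. This is an illustration under stated assumptions, not LEONARDO's actual procedure: the dataset meta-features, the Euclidean similarity, the averaging of scores, and the model names are all hypothetical.

```python
import math


def rank_models(new_features, knowledge_base, k):
    """Rank models by mean score on the k stored datasets nearest to new_features."""
    nearest = sorted(
        knowledge_base,
        key=lambda entry: math.dist(new_features, entry["features"]),
    )[:k]
    scores_by_model = {}
    for entry in nearest:
        for model, score in entry["scores"].items():
            scores_by_model.setdefault(model, []).append(score)
    means = {m: sum(s) / len(s) for m, s in scores_by_model.items()}
    return sorted(means, key=means.get, reverse=True)


# Hypothetical knowledge base: meta-features of past datasets and model accuracies.
kb = [
    {"features": (0.1, 0.2), "scores": {"mlp": 0.90, "svm": 0.80}},
    {"features": (0.2, 0.1), "scores": {"mlp": 0.85, "svm": 0.88}},
    {"features": (0.9, 0.9), "scores": {"mlp": 0.50, "svm": 0.95}},
]
ranking = rank_models((0.15, 0.15), kb, k=2)
```

With k=2 only the two nearby datasets contribute, so the dissimilar third entry does not influence the recommendation.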
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Many processes experience abrupt changes in their dynamics. This causes problems for prediction algorithms that assume the dynamics of the sequence to be predicted are constant, or at least change only slowly over time. In this paper the problem of predicting sequences with sudden changes in dynamics is considered. For a model of multivariate Gaussian data, we derive the expected generalization error of the standard linear Fisher classifier in the situation where, after an unexpected task change, the classification algorithm learns on a mixture of old and new data. We show both analytically and experimentally that the optimal length of the learning sequence depends on the complexity of the task, the input dimensionality, and the power and periodicity of the changes. The proposed solution is to consider a collection of agents, in this case non-linear single-layer perceptrons, trained by a memetic-like learning algorithm. The most successful agents vote for predictions. A grouped structure of the agent population helps to obtain favorable diversity in the population. The efficiency of the socially organized evolving multi-agent system is demonstrated on an artificial problem.
ISBN (print): 9781424409723
The recognition of hand-written Chinese characters using the Mahalanobis distance is extensively utilized in bank cheque processing applications. The Mahalanobis distance, defined by the innovation and its covariance, is compared among several target character classes, and its computation is time-consuming. This paper presents an efficient computation for this process. The method can be summarized as an incremental, non-decreasing computation of the Mahalanobis distance; if the incrementally computed value exceeds the threshold, the computation is stopped. The elements of the covariance and innovation are computed only if they are used, and this progressive evaluation is the major advantage of the method. The method is based on the square-root-free Cholesky factorization. Experiments show that the proposed method is effective in financial hand-written character recognition.
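The incremental computation with early stopping can be sketched as follows, assuming a symmetric positive definite covariance S and innovation v. With the square-root-free factorization S = L D L^T (L unit lower triangular, D diagonal) and z solving L z = v, the squared distance is the sum of z_i^2 / D_i. Every term is non-negative, so the partial sum never decreases. The function name and the row-by-row interleaving of factorization and substitution are assumptions for illustration, not the paper's implementation.

```python
def mahalanobis_sq_early_stop(v, S, threshold):
    """Accumulate v^T S^{-1} v term by term via a row-wise LDL^T factorization.

    Returns (partial_value, rows_processed). Stops as soon as the partial
    value exceeds threshold, so later rows of S and v are never touched.
    Assumes S is symmetric positive definite.
    """
    n = len(v)
    L = [[0.0] * n for _ in range(n)]  # strictly lower part of unit-triangular L
    D = [0.0] * n
    z = [0.0] * n
    total = 0.0
    for i in range(n):
        # Row i of L uses only rows 0..i of S.
        for j in range(i):
            s = S[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = s / D[j]
        D[i] = S[i][i] - sum(L[i][k] ** 2 * D[k] for k in range(i))
        # Forward substitution for L z = v (unit diagonal).
        z[i] = v[i] - sum(L[i][j] * z[j] for j in range(i))
        total += z[i] ** 2 / D[i]
        if total > threshold:
            return total, i + 1
    return total, n


S = [[4.0, 2.0], [2.0, 3.0]]
v = [2.0, 3.0]
full_value, full_rows = mahalanobis_sq_early_stop(v, S, threshold=10.0)
early_value, early_rows = mahalanobis_sq_early_stop(v, S, threshold=0.5)
```

When a candidate class is rejected after the first few rows, the remaining rows of the covariance are never read, which matches the abstract's point that elements are computed only if they are used.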
ISBN (print): 9781424409723
To meet the personalized needs of E-learning, an improved association rule mining algorithm is proposed in this paper. First, a data cube is established from the database. Then, the frequent item-sets that satisfy the minimum support are mined from the data cube. Next, association rules are generated from the frequent item-sets. Finally, redundant association rules are removed using the relative method in statistics. The algorithm has two advantages: the execution time for searching for frequent item-sets is short, and the precision of the rules is high. The algorithm was also used in a personality mining system based on an E-learning model (PMSEM). The results show that the algorithm is effective.
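The two middle steps of this pipeline, mining frequent item-sets under a minimum support and generating rules from them, can be sketched with a plain brute-force search. The data cube and the redundancy-removal step are not reproduced here. The function names, the minimum-confidence filter, and the toy E-learning transactions are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all item-sets with support >= min_support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / len(transactions)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found = True
        if not found:
            # No frequent set of this size means none of a larger size either.
            break
    return result


def generate_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples from frequent item-sets."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent item-set is itself frequent.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical learner sessions: which resources were used together.
transactions = [
    {"video", "quiz"},
    {"video", "quiz", "forum"},
    {"video", "forum"},
    {"quiz"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
found_rules = generate_rules(itemsets, min_confidence=0.9)
```

On this toy data the only rule that passes the confidence filter is that sessions using the forum also use the video.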