ISBN: 9783642033476 (print)
Imbalanced data learning (IDL) is one of the most active and important fields in machine learning research. This paper explores the efficiency of four different SVM ensemble methods integrated with under-sampling in IDL. Experimental results on 20 imbalanced UCI datasets show that two new ensemble algorithms proposed in this paper, CABagE (bagging-style) and MABstE (boosting-style), can produce SVM ensemble classifiers with better minority-class recognition than the existing ensemble methods. Further analysis of the experimental results indicates that MABstE has the best overall classification performance, and we believe that this should be attributed to its more robust example-weighting mechanism.
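The abstract does not describe how CABagE or MABstE work, so the following is only a minimal sketch of the general idea of bagging combined with under-sampling: each bag keeps every minority-class example and adds an equally sized random draw from the majority class. The function name `balanced_bags` and its parameters are hypothetical, and training an SVM on each bag is left out.

```python
import random


def balanced_bags(labels, minority_label, n_bags, seed=0):
    """Return n_bags lists of sample indices, each class-balanced.

    Every bag contains all minority-class indices plus an equally sized
    random sample (without replacement) of the remaining indices.
    This is a generic under-sampling scheme, not the paper's algorithm.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or len(majority) < len(minority):
        raise ValueError("minority class must be non-empty and no larger than the rest")
    bags = []
    for _ in range(n_bags):
        sampled = rng.sample(majority, len(minority))  # under-sample the majority
        bags.append(sorted(minority + sampled))
    return bags


# Example: 2 minority samples (label 1) among 10.
labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
bags = balanced_bags(labels, minority_label=1, n_bags=3)
```

Each bag would then be used to train one base classifier, and the ensemble prediction would combine their outputs, for example by voting.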
ISBN: 9783642033476 (print)
Keywords are a subset of the words or phrases in a document that can describe the meaning of the document. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still do not have keywords assigned. On the other hand, manual assignment of high-quality keywords is time-consuming and error-prone. Therefore, many algorithms and systems that help people perform automatic keyword extraction have been proposed. However, most methods of automatic keyword extraction cannot use the features of documents effectively. This paper proposes a method that integrates statistical machine learning models. The method extracts keywords from Chinese documents through voting among multiple keyword extraction models. Experimental results show that the proposed method, based on ensemble learning, outperforms other methods according to the F-measure. Moreover, the keyword extraction model based on ensemble learning with weighted voting outperforms the model without weighted voting.
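To make the difference between weighted and unweighted voting concrete, here is a small sketch. It assumes each extraction model returns a list of candidate keywords and has a scalar weight; the function name `weighted_vote`, the model names, and the weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(model_outputs, weights, top_k):
    """Rank candidate keywords by the summed weights of the models proposing them.

    model_outputs: dict mapping model name -> list of candidate keywords
    weights:       dict mapping model name -> non-negative vote weight
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(float)
    for name, candidates in model_outputs.items():
        for keyword in set(candidates):  # each model votes once per keyword
            scores[keyword] += weights[name]
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_k]]


outputs = {"model_a": ["x", "y"], "model_b": ["y", "z"], "model_c": ["z"]}
weighted = weighted_vote(outputs, {"model_a": 0.5, "model_b": 0.3, "model_c": 0.1}, top_k=2)
unweighted = weighted_vote(outputs, {"model_a": 1.0, "model_b": 1.0, "model_c": 1.0}, top_k=2)
```

With equal weights the vote reduces to counting how many models proposed each keyword; with unequal weights a keyword backed by a single strong model can outrank one backed by several weak models.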
ISBN: 9783642027123 (print)
The availability of video-format sign language corpora is limited. This leads to a desire for techniques which do not rely on large, fully labelled datasets. This paper covers various methods for learning sign either from small datasets or from those without ground-truth labels. To avoid non-trivial tracking issues, sign detection is investigated using volumetric spatio-temporal features. Following this, the advantages of recognising the component parts of sign rather than the signs themselves are demonstrated. Finally, the idea of using a weakly labelled dataset is considered and results are shown for work in this area.
ISBN: 9783642030697 (print)
We consider the problem of learning classifiers from samples which have additional features that are absent due to noise or corruption of measurement. The common approach for handling missing features in discriminative models is first to complete their unknown values and then to apply a standard classification algorithm to the completed data. In this paper, we propose an algorithm which aims to maximize the margin of each sample in its own relevant subspace. We show how incomplete data can be classified directly, without completing any missing features, in a large-margin learning framework. Moreover, following the theory of optimal kernel functions, we propose an optimal kernel function, a convex combination of a set of linear kernel functions, to measure the similarity between the additional features of each pair of samples. Based on the geometric interpretation of the margin, we formulate an objective function to maximize the margin of each sample in its own relevant subspace. In this formulation, we make use of the structural parameters trained from existing features and optimize the structural parameters trained from additional features only. A two-step iterative procedure for solving the objective function is proposed. By avoiding the pre-processing phase in which the data is completed, our algorithm could offer considerable computational savings. We demonstrate our results on a number of standard benchmarks from UCI, and the results show that our algorithm can achieve better or comparable classification accuracy compared to the existing algorithms.
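The notion of a margin measured in a sample's own relevant subspace can be illustrated with a short sketch. It assumes a linear classifier with weight vector `w` and bias `b`, and marks missing features with `None`; these conventions and the function name `subspace_margin` are assumptions made for illustration, and the paper's objective function and two-step optimization are not reproduced.

```python
import math


def subspace_margin(w, b, x, y):
    """Signed geometric margin of sample (x, y) using only observed features.

    Features equal to None are treated as absent: both the score and the
    norm of w are computed over the observed coordinates only.
    """
    observed = [(wi, xi) for wi, xi in zip(w, x) if xi is not None]
    norm = math.sqrt(sum(wi * wi for wi, _ in observed))
    if norm == 0.0:
        raise ValueError("weight vector is zero on the observed subspace")
    score = sum(wi * xi for wi, xi in observed) + b
    return y * score / norm


w, b = [3.0, 4.0], 0.0
full = subspace_margin(w, b, [1.0, 1.0], y=1)      # both features observed
partial = subspace_margin(w, b, [1.0, None], y=1)  # second feature absent
```

Because the norm is restricted to the observed coordinates, a sample with absent features is not penalised for weights it cannot use, which is the intuition behind classifying incomplete data without imputing values.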
ISBN: 3642033474 (print)
The proceedings contain 81 papers. The topics discussed include: from machine learning to child learning; sensitivity based generalization error for supervised learning problems with application in feature selection; cluster analysis based on the central tendency deviation principle; a parallel hierarchical agglomerative clustering technique for bilingual corpora based on reduced terms with automatic weight optimization; automatically identifying tag types; social knowledge-driven music hit prediction; closed non derivable data cubes based on non derivable minimal generators; indexing the function: an efficient algorithm for multi-dimensional search with expensive distance functions; anti-germ performance prediction for detergents based on Elman network on small data sets; mining the structure and evolution of the airport network of China over the past twenty years; and collaborative filtering recommendation algorithm using dynamic similar neighbor probability.
As an extremely powerful probability model, the Gaussian mixture model (GMM) has been widely used in the fields of pattern recognition, information processing and data mining. If the number of the Gaussians in the mixture is ...
ISBN: 9783642030697 (print)
Organ transplantation is a highly complex decision process that requires expert decisions. The major problem in a transplantation procedure is the possibility that the receiver's immune system will attack and destroy the transplanted tissue. It is therefore of capital importance to find a donor with the highest possible compatibility with the receiver, and thus reduce rejection. Finding a good donor is not a straightforward task because a complex network of relations exists between the immunological and the clinical variables that influence the receiver's acceptance of the transplanted organ. Currently the process of analyzing these variables involves a careful study by the clinical transplant team. The number and complexity of the relations between variables make the manual process very slow. In this paper we propose and compare two machine learning algorithms that might help the transplant team in improving and speeding up their decisions. We achieve that objective by analyzing past real cases and constructing models as sets of rules. Such models are accurate and understandable by experts.
ISBN: 9783642033476 (print)
It is of high biomedical interest to identify gene interactions and networks that are associated with developmental and physiological functions in the mouse embryo. There are now large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression that provide a powerful resource for discovering potential mechanisms of embryo organisation. Ontological annotation of gene expression consists of labeling images with terms from the anatomy ontology for mouse development. Current annotation is done manually by domain experts, which is both time-consuming and costly. In this paper, we present a new data mining framework to automatically annotate gene expression patterns in images with anatomical terms. This framework integrates the images stored in file systems with ontology terms stored in databases, and combines pattern recognition with image processing techniques to identify the anatomical components that exhibit gene expression patterns in images. The experimental results show that the framework works well.
In this paper we address the problem of using bet selections of a large number of mostly non-expert users to improve sports betting tips. A similarity based approach is used to describe individual users' strategie...
ISBN: 9781424450077 (print)
Nowadays, carpet quality is assessed in industry by human experts, because automated assessment is not capable of matching human expertise. Therefore, carpet companies demand a reliable and economical standardization of carpet wear levels. This paper presents a new strategy for analyzing and classifying the texture of worn carpet surfaces in 3D images produced by a 3D laser scanner. 2D images are obtained by resampling the 3D data on different grid sizes. The extracted features are based on Haralick descriptors of the co-occurrence matrix. These features are used as inputs to a classifier based on a support vector machine (SVM), which is trained for multi-class classification. The proposed technique achieves an average of over 92% correct labeling.
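As a rough illustration of the feature-extraction step, the sketch below computes a normalised grey-level co-occurrence matrix for a single pixel offset and derives two standard Haralick descriptors from it (contrast and angular second moment). The abstract does not say which descriptors, offsets, or grey-level quantisation were used, so the function names, the offset `(dx, dy)`, and the toy image are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised (non-symmetric) co-occurrence matrix for offset (dx, dy).

    image is a list of rows of integer grey levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs with level i at (r, c)
    and level j at (r + dy, c + dx).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum of p[i][j] * (i - j)^2."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Angular second moment: sum of p[i][j]^2."""
    return sum(v * v for row in p for v in row)


image = [[0, 0, 1],
         [0, 0, 1],
         [2, 2, 2]]
p = glcm(image, levels=3)
features = [contrast(p), energy(p)]  # feature vector for a classifier such as an SVM
```

In a full pipeline, feature vectors like this one, computed per image and possibly over several offsets, would be passed to a multi-class SVM.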