Manifold learning has become a hot topic in the research fields of machine learning and data mining. Current manifold learning algorithms assume that the observed data set has high density. But, how to evaluate th...
Much work has already been done on building named entity recognition systems. However, most of this work has concentrated on English and other European languages. Hence, building a named entity recognition (NER) s...
ISBN: 3540405046 (print)
The proceedings contain 37 papers. The special focus in this conference is on Decision Trees, Clustering and Its Application. The topics include: introspective learning to build case-based reasoning (CBR) knowledge containers; graph-based tools for data mining and machine learning; simplification methods for model trees with regression and splitting nodes; learning multi-label alternating decision trees from texts and data; a discretization method of continuous attributes with guaranteed resistance to noise; on the size of a classification tree; a comparative analysis of clustering algorithms applied to load profiling; similarity-based clustering of sequences using hidden Markov models; a fast parallel optimization for training support vector machine; a ROC-based reject rule for support vector machines; remembering similitude terms in CBR; authoring cases from free-text maintenance data; classification boundary approximation by using combination of training steps for real-time image segmentation; simple mimetic classifiers; novel mixtures based on the Dirichlet distribution; estimating a quality of decision function by empirical risk; efficient locally linear embeddings of imperfect manifolds; dissimilarity representation of images for relevance feedback in content-based image retrieval; a rule-based scheme for filtering examples from majority class in an imbalanced training set; coevolutionary feature learning for object recognition; generalization of pattern-growth methods for sequential pattern mining with gap constraints; discover motifs in multi-dimensional time-series using the principal component analysis and the MDL principle; and optimizing financial portfolios from the perspective of mining temporal structures of stock returns.
ISBN: 9783540884347 (print)
The multi-relational data mining (MrdM) approach looks for patterns that involve multiple tables from a relational database made of complex/structured objects whose normalized representation requires multiple tables. We have applied MrdM methods (relational association rule discovery and probabilistic relational models) with hidden Markov models (HMMs) and the Viterbi algorithm (VA) to mine tetratricopeptide repeat (TPR), pentatricopeptide repeat (PPR) and half-a-TPR (HAT) motifs in the genomes of the pathogenic protozoa Leishmania. TPR is a protein-protein interaction module, and TPR-containing proteins (TPRPs) act as scaffolds for the assembly of different multiprotein complexes. Our aim is to build a large panel of the TPR-like superfamily of Leishmania. Distributed relational state representations for complex stochastic processes were applied to the identification, clustering and classification of Leishmania genes, and we were able to detect 104 putative TPRPs, 36 PPRPs and 8 HATPs, which together comprise the TPR-like superfamily. We have also compared currently available resources (Pfam, SMART, SUPERFAMILY and TPRpred) with our approach (MrdM/HMM/VA).
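For readers unfamiliar with the Viterbi step mentioned in this abstract, the following is a minimal sketch of the textbook algorithm, which finds the most probable hidden-state path of an HMM. It is not the authors' pipeline: the two states ("background", "repeat"), the symbols "x", "y", "z" and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most likely state sequence."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model (all numbers are assumptions for illustration).
states = ("background", "repeat")
start_p = {"background": 0.6, "repeat": 0.4}
trans_p = {
    "background": {"background": 0.7, "repeat": 0.3},
    "repeat": {"background": 0.4, "repeat": 0.6},
}
emit_p = {
    "background": {"x": 0.5, "y": 0.4, "z": 0.1},
    "repeat": {"x": 0.1, "y": 0.3, "z": 0.6},
}

log_prob, path = viterbi(("x", "y", "z"), states, start_p, trans_p, emit_p)
print(path, math.exp(log_prob))
```

In a motif search, the states would instead correspond to positions inside and outside a repeat profile, and the emissions to amino acids; those details depend on the trained model and are not given in the abstract.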
The following topics are dealt with: AI and expert systems; artificial immune systems and bio-informatics; chaos theory; data mining; fuzzy set theory; genetic algorithm; information retrieval; intelligent control; intelligent decision making; intelligent information processing; intelligent recognition; intelligent robotics; machine learning; natural language & machine translation; neural networks; rough set theory; support vector machine; swarm intelligence; video and image processing.
ISBN: 9783540859833 (print)
The goal of statistical pattern feature extraction (SPFE) is 'low loss dimension reduction'. As a key link in pattern recognition, dimension reduction has become a research hot spot and a difficulty in fields such as pattern recognition, machine learning and data mining. Pattern feature extraction is one of the most challenging research fields and has attracted the attention of many scholars. This paper briefly introduces the basic principle of SPFE and discusses its latest progress in terms of classical statistical theories and their modifications, kernel-based methods, wavelet analysis and its modifications, algorithm integration and so on. Finally, we discuss the development trend of SPFE.
To preprocess data for data mining algorithms, an entropy-based attribute division algorithm is given by analyzing the physical meaning of information entropy. The algorithm measures the relatedness among the different attributes based on entropy, both qualitatively and quantitatively. The original attribute set is divided by the k-means clustering algorithm into several subsets that are conditionally independent. Experimental results show that this algorithm can be used for data preprocessing.
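The abstract does not state which entropy-based measure is used to relate attributes. As a rough sketch only, the example below assumes mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y), as the relatedness score between two discrete attributes; the function names and toy columns are hypothetical, and the k-means grouping step is left out.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); 0 means the attributes look independent."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical attribute columns: b copies a, while c is unrelated to a.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]

print(mutual_information(a, b))  # high score: a and b carry the same information
print(mutual_information(a, c))  # zero score: a tells us nothing about c
```

A matrix of such pairwise scores could then serve as the input to a clustering step that groups strongly related attributes together, which is one plausible reading of the division described above.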
Organization name recognition is the most difficult part of named entity recognition. To reduce the use of tagged corpora and exploit a large amount of untagged corpora, we first present the use of the semi-supervised machine learning algorithm co-training, combined with the conditional random fields model and support vector machines, for Chinese organization name recognition. Based on the principles of compatibility and uncorrelatedness, we construct different classifiers from different views of the conditional random fields model, and also construct different classifiers using the conditional random fields model and support vector machines as two views. We then present a heuristic algorithm for selecting untagged samples. The experimental results show that, at the same F-measure, the co-training algorithm uses only about 30% of the tagged data needed by a single statistical model; with the same tagged data, the co-training algorithm increases the F-measure by about 10% over a single statistical model.
In many machine learning settings, labeled samples are difficult to collect while unlabeled samples are abundant. In this paper we investigate the design of support vector machine classification algorithms that learn from positive and unlabeled samples only. We first find the minimum bounding sphere that encloses all the positive samples, and then use this sphere to pick out negative samples from the unlabeled samples. Finally, we train the support vector machine on a training set consisting of the given positive samples and the negative samples picked out from the unlabeled samples. Experiments indicate that a support vector machine learning from positive and unlabeled samples achieves the desired high test precision and prediction accuracy.
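The negative-picking step described above can be sketched as follows. This is a simplification, not the paper's method: instead of the true minimum bounding sphere, it uses a sphere centred on the centroid of the positives with radius equal to the largest centroid-to-positive distance, which encloses all positives but is generally not minimal. The function names and sample points are hypothetical, and the final SVM training step is omitted.

```python
import math


def bounding_sphere(points):
    """Centroid-centred enclosing sphere (an approximation, not the minimum one)."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def pick_negatives(positives, unlabeled):
    """Treat unlabeled samples that fall outside the sphere as likely negatives."""
    center, radius = bounding_sphere(positives)
    return [u for u in unlabeled if math.dist(center, u) > radius]


# Hypothetical 2-D samples.
positives = [(0, 0), (2, 0), (0, 2), (2, 2)]
unlabeled = [(1, 1), (5, 5), (1.5, 0.5), (-3, 1)]

negatives = pick_negatives(positives, unlabeled)
print(negatives)
```

The positives and the picked negatives would then form the labeled training set for an ordinary two-class support vector machine.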
Many learning algorithms for vector quantization (VQ) based on the steepest descent method have been proposed. However, even a learning algorithm regarded as superior does not always work well. This paper proposes a new learning algorithm with boosting. Boosting is a general method that attempts to boost the accuracy of any given learning algorithm. The proposed method consists of three sub-learners. The first sub-learner is constructed by running the conventional learning algorithm on data randomly selected from the given data space. The second sub-learner is constructed by running the conventional learning algorithm on data selected with higher probability from the data incorrectly learned by the first sub-learner. The third sub-learner is constructed with data on which either the first or the second sub-learner learned incorrectly. That is, the method attempts to construct different kinds of reference vectors by using different kinds of data sets constructed from the original data set. The output for any input data is given as a decision by averaging the outputs of the three sub-learners. To show the effectiveness of the proposed algorithm, numerical simulations are performed.