ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Programs for gene prediction in computational biology are examples of systems for which authentic test data are difficult to acquire, as such data require years of extensive research. This has led to test methods based on semi-artificially produced test data, often produced by ad hoc techniques complemented by statistical models such as Hidden Markov Models (HMMs). The quality of such a test method depends on how well the test data reflect the regularities in known data and how well they generalize these regularities. So far only very simplified and generalized artificial data sets have been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine learning to analyze existing, manually marked-up data, integrated with the generation of new, artificial data. More specifically, we suggest using the PRISM system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears to be a powerful modelling environment that subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here by showing parts of a model under development for genetic sequences, and we report initial experiments producing test data for the evaluation of existing gene finders, exemplified by GENSCAN, HMMGene and ***.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
A Classification Association Rule (CAR), a common type of mined knowledge in data mining, describes an implicative co-occurring relationship between a set of binary-valued data attributes (items) and a pre-defined class, expressed in the form of an "antecedent ⇒ consequent-class" rule. Classification Association Rule Mining (CARM) is a recent Classification Rule Mining (CRM) approach that builds an Association Rule Mining (ARM) based classifier using CARs. Regardless of the particular methodology used to build it, a classifier is usually presented as an ordered CAR list, based on an applied rule ordering strategy. Five existing rule ordering mechanisms can be identified: (1) Confidence-Support-size-of-Antecedent (CSA), (2) size-of-Antecedent-Confidence-Support (ACS), (3) Weighted Relative Accuracy (WRA), (4) Laplace Accuracy, and (5) χ² Testing. In this paper, we divide these mechanisms into two groups: (i) those that follow a pure "support-confidence" framework, and (ii) those that assign an additive score. We then propose a hybrid rule ordering approach that combines one approach taken from (i) with another taken from (ii). The experimental results show that the proposed rule ordering approach performs well with respect to classification accuracy.
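As an illustration of what a rule ordering strategy does, the following minimal Python sketch orders a CAR list under two of the mechanisms named above and computes a Laplace accuracy score. The `Rule` class and all function names are hypothetical. The sort directions (confidence and support descending, antecedent size ascending for CSA, antecedent size descending first for ACS) are assumptions based on the common convention, not details given in the abstract. The hybrid ordering proposed in the paper is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one classification association rule."""
    antecedent: FrozenSet[str]  # items on the left-hand side
    consequent: str             # predicted class
    support: float              # fraction of records matching antecedent and class
    confidence: float           # support divided by antecedent coverage


def order_csa(rules: List[Rule]) -> List[Rule]:
    # Assumed CSA convention: confidence descending, then support descending,
    # then antecedent size ascending (more general rules first on ties).
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def order_acs(rules: List[Rule]) -> List[Rule]:
    # Assumed ACS convention: antecedent size descending (more specific rules
    # first), then confidence descending, then support descending.
    return sorted(rules, key=lambda r: (-len(r.antecedent), -r.confidence, -r.support))


def laplace_accuracy(n_correct: int, n_covered: int, n_classes: int) -> float:
    # Laplace expected accuracy: (correctly covered + 1) / (covered + number of classes).
    return (n_correct + 1) / (n_covered + n_classes)
```

Once the list is ordered, a classifier of this kind typically assigns a new record the class of the first rule whose antecedent it satisfies, which is why the ordering affects accuracy.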
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Transduction is an inference mechanism "from particular to particular". Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is to classify (only) the unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is generated. Transductive learning is best suited to applications where the examples for which a prediction is needed are already known when the classifier is trained. Several approaches have been proposed in the literature for building transductive classifiers from data stored in a single table of a relational database. Nonetheless, no attention has been paid to applying the transduction principle in a (multi-)relational setting, where data are stored in multiple tables of a relational database. In this paper we propose a new transductive classifier, named TRANSC, which is based on a probabilistic approach to making transductive inferences from relational data. This new method works in a transductive setting and employs principled probabilistic classification in multi-relational data mining to face the challenges posed by some spatial data mining problems. Probabilistic inference allows us to compute the class probability and to return, in addition to the result of the transductive classification, the confidence in that classification. The predictive accuracy of TRANSC has been compared to that of its inductive counterpart in an empirical study involving a benchmark relational dataset and two spatial datasets. The results obtained are generally in favor of TRANSC, although the improvements are small.
ISBN (print): 9781424410651
Dimensionality reduction is a significant problem in pattern recognition and thus arouses broad interest in the machine learning community. In contrast to traditional linear dimensionality reduction methods, some nonlinear methods based on manifold learning have recently been proposed. These methods can efficiently discover the low-dimensional nonlinear manifold in a high-dimensional data space and preserve the manifold structure of the data points in the low-dimensional embedding space. Despite their attractive properties, these nonlinear methods are sensitive to outliers in the data sets. Moreover, the existing locally nonlinear dimensionality reduction methods generally neglect global structural information. In this paper, we address these problems in the context of Locally Linear Embedding (LLE). By capturing local and global geometry information simultaneously, we propose an alternative approach that makes the local embedding more robust, called Alternative Robust Local Embedding (ARLE). It not only automatically suppresses the unfavorable influence of outliers on the embedding process, but also outperforms LLE in 2-D data visualization owing to the introduction of the global geometry. Experimental results and comparisons on both synthetic and real data sets show the effectiveness of ARLE.
ISBN (print): 9781424410651
The support vector machine (SVM) is a novel machine learning method based on statistical learning theory. A material erosion rate model based on a principal component least squares SVM (PCLS-SVM) is proposed. PCA calculates principal components in the high-dimensional feature space and reduces the dimensionality of the samples. Cross-validation is used to select the parameters of the PCLS-SVM model. PCLS-SVM is applied to the prediction of material erosion rate. Results indicate that this method features high learning speed and good generalization ability.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
One of the most important algorithms for mining data streams is VFDT. It uses the Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. Gama et al. have extended VFDT in two directions. Their system, VFDTc, can deal with continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implement a system, fVFDT, on top of VFDT and VFDTc. We make the following four contributions: 1) We present a threaded binary search tree (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree whose processing time for inserting values is O(n log n), while VFDT's processing time is O(n²). When a new example arrives, VFDTc needs to update O(log n) attribute tree nodes, whereas fVFDT needs to update only one node. 2) We improve the method for finding the best split-test point of a given continuous attribute; compared with the method used in VFDTc, processing time improves from O(n log n) to O(n). 3) Compared with VFDTc, the number of candidate split-test points in fVFDT decreases from O(n) to O(log n). 4) We improve the soft discretization method for use in data stream mining; it overcomes the problem of noisy data and improves classification accuracy.
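The role of the Hoeffding inequality in VFDT-style learners can be sketched as follows. This is a minimal Python illustration of the standard Hoeffding bound and the resulting split decision, not the fVFDT implementation. The function names and the example parameter values are hypothetical, and the tie-breaking rule that VFDT also applies is omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split only when the observed advantage of the best attribute over the
    # runner-up exceeds the bound, so the choice is unlikely to change with
    # more examples.
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more examples before splitting.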
ISBN (print): 9781424410651
To distinguish chatter gestation, chatter recognition based on a hybrid SOM/DHMM is proposed for the dynamic patterns of chatter gestation in the cutting process. First, FFT features are extracted from the vibration signal of the cutting process. The FFT vectors are then presorted and coded into a code book of integers by a SOM (Self-Organizing Feature Map), and these code books are passed to DHMMs (Discrete Hidden Markov Models) for machine learning and classification. Finally, the results of chatter gestation recognition and chatter prediction experiments are presented and show that the proposed method is effective.
ISBN (print): 9783540770459
The work presented here focuses on combining multiple classifiers to form a single classifier for pattern classification, machine learning for expert systems, and data mining tasks. The basis of the combination is that efficient concept learning is possible in many cases when the concepts learned by different approaches are combined into a more efficient concept. Experimental results for the algorithm, EMRL, on a representative collection of domains show that it performs significantly better than several state-of-the-art individual classifiers on 11 of 25 data sets, whereas the state-of-the-art individual classifiers perform significantly better than EMRL in only 5 cases.
ISBN (print): 9781424410651
A method of pattern recognition of tool wear based on Discrete Hidden Markov Models (DHMM) is proposed to monitor tool wear and to predict tool failure. First, FFT features are extracted from the vibration signal and the cutting force in the cutting process. The FFT vectors are then presorted and coded into a code book of integers by a SOM, and these code books are passed to the DHMM for machine learning to build 3-HMMs for the different tool wear stages. The pattern is then recognised by maximum probability. Finally, the results of tool wear recognition and failure prediction experiments are presented and show that the proposed method is effective.
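The final recognition step, choosing the wear stage whose model gives the observed code sequence the highest probability, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the SOM code-book indices are taken as already computed, each model is a hypothetical `(initial, transition, emission)` triple of plain lists, and training of the model parameters (for example by Baum-Welch) is not shown. The function names are illustrative, not the authors'.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model layout: (initial[s], transition[s][s'], emission[s][symbol])
Hmm = Tuple[List[float], List[List[float]], List[List[float]]]


def log_likelihood(model: Hmm, symbols: Sequence[int]) -> float:
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM."""
    if not symbols:
        raise ValueError("symbol sequence must not be empty")
    initial, transition, emission = model
    n_states = len(initial)
    alpha = [initial[s] * emission[s][symbols[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate the previous state distribution, then weight by emission.
            alpha = [
                sum(alpha[p] * transition[p][s] for p in range(n_states))
                * emission[s][symbols[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify(models: Dict[str, Hmm], symbols: Sequence[int]) -> str:
    # Maximum-probability decision: pick the model with the highest likelihood.
    return max(models, key=lambda name: log_likelihood(models[name], symbols))
```

In use, `models` would hold one trained DHMM per tool wear stage, and `classify` would be called on the code sequence produced from each new window of sensor data.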
ISBN (print): 9781424410651
The S-transform is a time-frequency analysis technique that combines properties of the short-time Fourier and wavelet transforms. It provides frequency-dependent resolution while maintaining a direct relationship with the Fourier spectrum. However, the frequency resolution of the S-transform at high frequencies is unsatisfactory. In this paper, we present a data-adaptive S-transform that optimizes the window width according to a measure of 'concentration'. The proposed method is tested on a set of synthetic signals. The results show that the proposed algorithm achieves higher resolution and energy concentration than the original S-transform and the short-time Fourier transform.
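For reference, the standard (non-adaptive) S-transform of a signal x(t) is usually written as below. The symbols τ (time shift), f (frequency) and σ(f) (standard deviation of the Gaussian window) are notation assumed here rather than taken from the abstract, and the abstract does not state how the adaptive method parameterizes the window width.

```latex
S(\tau, f) = \int_{-\infty}^{\infty} x(t)\,
  \frac{1}{\sigma(f)\sqrt{2\pi}}
  \exp\!\left(-\frac{(\tau - t)^{2}}{2\,\sigma(f)^{2}}\right)
  e^{-i 2\pi f t}\, dt,
\qquad \sigma(f) = \frac{1}{|f|}
```

Because σ(f) shrinks as |f| grows, the window becomes narrow in time at high frequencies, which is the source of the poor high-frequency resolution mentioned above.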