ISBN: 9783642030697 (print)
We investigate concept learning from incomplete examples, referred to here as ambiguous examples. We start from the learning from interpretations setting introduced by L. De Raedt and then follow the informal ideas presented by H. Hirsh to extend the version space paradigm to incomplete data: a hypothesis has to be compatible with all pieces of information provided regarding the examples. We propose and experiment with an algorithm that, given a set of ambiguous examples, learns a concept as an existential monotone DNF. We show that 1) Boolean concepts can be learned even at high levels of incompleteness, as long as enough information is provided, and 2) monotone DNF, non-monotone DNF (i.e. including negative literals), and attribute-value hypotheses can be learned in this way, using appropriate background knowledge. We also show that a clever implementation, based on a multi-table representation, is necessary to apply the method at high levels of incompleteness.
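The compatibility requirement in the abstract above can be illustrated with a small sketch. It is not taken from the paper: it assumes that an ambiguous example is given as the set of its possible complete versions, that a positive ambiguous example is compatible with a hypothesis when at least one possible version is covered, and that a negative one is compatible when at least one possible version is not covered. The function names (`covers`, `compatible`, `consistent`) are hypothetical.

```python
# Illustrative sketch only; the representation and the compatibility rule
# are assumptions, not the algorithm described in the paper.

def covers(hypothesis, interpretation):
    """A monotone DNF (list of sets of positive atoms) covers a complete
    example when every atom of at least one term is true in it."""
    return any(term <= interpretation for term in hypothesis)

def compatible(hypothesis, ambiguous_example, positive):
    """An ambiguous example is a list of its possible complete versions.
    Assumed rule: a positive example needs one covered version,
    a negative example needs one version that is not covered."""
    if positive:
        return any(covers(hypothesis, i) for i in ambiguous_example)
    return any(not covers(hypothesis, i) for i in ambiguous_example)

def consistent(hypothesis, labelled_examples):
    """True when the hypothesis is compatible with every labelled example."""
    return all(compatible(hypothesis, ex, pos) for ex, pos in labelled_examples)

# Hypothesis (a AND b) OR c over atoms a, b, c, d.
h = [{"a", "b"}, {"c"}]
data = [
    ([{"a"}, {"a", "b"}], True),   # positive: b may or may not be true
    ([{"c"}, {"a"}], False),       # negative: either c or a holds
]
print(consistent(h, data))
```

Enumerating every possible version grows quickly with the number of unknown atoms, which is consistent with the abstract's remark that a careful implementation is needed at high incompleteness levels.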
ISBN: 9789898111814 (print)
Ontologies and concept taxonomies are essential parts of the Semantic Web infrastructure. Since manual construction of taxonomies requires considerable effort, automated methods for taxonomy construction should be considered. In this paper, an approach for the automatic derivation of concept taxonomies from web search results is presented. The method is based on generating derivative features from web search data and applying machine learning techniques. A Support Vector Machine (SVM) classifier is trained with known concept hyponym-hypernym pairs, and the resulting classification model is used to predict new hyponymy (is-a) relations. Prediction results are used to generate concept taxonomies in OWL. The results of applying the approach to construct a colour taxonomy are presented.
ISBN: 9783642030697 (print)
Existing data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Most relationships in spatial datasets are regional; therefore there is a great need to extract regional knowledge from spatial datasets. This paper proposes a novel framework to discover interesting regions characterized by "strong regional correlation relationships" between attributes, and methods to analyze differences and similarities between regions. The framework employs a two-phase approach: it first discovers regions by employing clustering algorithms that maximize a PCA-based fitness function and then applies post-processing techniques to explain underlying regional structures and correlation patterns. Additionally, a new similarity measure that assesses the structural similarity of regions based on correlation sets is introduced. We evaluate our framework in a case study which centers on finding correlations between arsenic pollution and other factors in water wells, and demonstrate that our framework effectively identifies regional correlation patterns.
ISBN: 9783642030697 (print)
The induction of knowledge from a data set relies on the execution of multiple data mining actions: applying filters to clean and select the data, training different algorithms (clustering, classification, regression, association), evaluating the results using different approaches (cross validation, statistical analysis), visualizing the results, etc. In a real data mining process, these actions are executed several times, sometimes in a loop, until an accurate result is obtained. However, performing these tasks requires a data mining engineer or expert who supervises the design and evaluates the whole process. The goal of this paper is to describe MOLE, an architecture to automate the data mining process. The architecture assumes that the data mining process can be seen from a classical planning perspective, and hence that classical planning tools can be used to design the process. MOLE is built and instantiated on the basis of i) standard languages to describe the data set and the data mining process, and ii) available tools to design, execute and evaluate the data mining processes.
This paper provides an introduction to text mining methodology. Many studies apply machine learning to improve the efficiency of management applications in various domains. This res...
ISBN: 9783642030697 (print)
Integration methods for ensemble learning can use two different approaches: combination or selection. The combination approach (also called fusion) consists of combining the predictions obtained by the different models in the ensemble to obtain the final ensemble prediction. The selection approach selects one (or more) models from the ensemble according to the prediction performance of these models on similar data from the validation set. Usually, the method used to select similar data is k-nearest neighbors with the Euclidean distance. In this paper we discuss other approaches to obtaining similar data for the regression problem. We show that using similarity measures based on the target values improves results. We also show that dynamically selecting several models for the prediction task increases prediction accuracy compared to selecting just one model.
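The baseline selection approach mentioned in the abstract above (k-nearest neighbors with the Euclidean distance on the validation set) can be sketched as follows. This is a minimal illustration, not the paper's method: the function names, the use of mean squared error as the local performance measure, and the averaging of the selected models are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(X_val, query, k):
    """Indices of the k validation points closest to the query."""
    order = sorted(range(len(X_val)), key=lambda i: euclidean(X_val[i], query))
    return order[:k]

def dynamic_select_predict(models, X_val, y_val, query, k=3, n_select=1):
    """Rank models by squared error on the query's validation neighbours
    (an assumed criterion) and average the n_select best predictions."""
    neighbours = k_nearest(X_val, query, k)

    def local_error(model):
        return sum((model(X_val[i]) - y_val[i]) ** 2 for i in neighbours) / len(neighbours)

    chosen = sorted(models, key=local_error)[:n_select]
    return sum(m(query) for m in chosen) / len(chosen)

# Two toy regressors: one fits the left region, one fits the right region.
linear = lambda x: 2.0 * x[0]
constant = lambda x: 10.0
X_val = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_val = [0.0, 2.0, 4.0, 10.0, 10.0, 10.0]

print(dynamic_select_predict([linear, constant], X_val, y_val, [1.5]))
print(dynamic_select_predict([linear, constant], X_val, y_val, [11.0]))
```

Setting `n_select` above 1 gives the "several models" variant; swapping `euclidean` for a different similarity function is where the alternatives discussed in the abstract would plug in.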
ISBN: 9783642025174 (print)
In this paper we are interested in musical data classification. For the representation of musical features, we propose to adopt a histogram structure in order to preserve a maximum amount of information. The melodic dimension of the data is described in terms of pitch values, pitch intervals, melodic direction and durations of notes as well as silences. Our purpose is to have a data representation well suited to a generic framework for classifying melodies by means of known supervised machine learning (ML) algorithms. Since such algorithms are not expected to handle histogram-based feature values, we propose to transform the representation space in the pattern recognition process. This transformation is realized by partitioning the domain of each attribute using a clustering technique. The model is evaluated experimentally by implementing three kinds of classifiers (musical genre, composition style and emotional content).
ISBN: 9781424452781 (print)
Reinforcement learning is an important method of machine learning. This paper uses graph theory to represent various knowledge points, whose relationships are expressed as a topological graph. Association rule recommendation technology is applied to handle the relationships between these knowledge points, and the corresponding recommendation workflow chart is given. In the paper, data tables are used to store the knowledge points, and the algorithm is used to demonstrate the feasibility and rationality of the association rule recommendation technique.
ISBN: 9783642030697 (print)
Although a large number of inductive learning algorithms have been developed for handling concept drifting data streams, especially those based on ensemble classification models, few of them can adapt to detecting the different types of concept drift in noisy streaming data while keeping demands on time and space overheads low. Motivated by this, a new classification algorithm for Concept drifting Detection based on an ensemble model of Random Decision Trees (called CDRDT) is proposed in this paper. Extensive studies with synthetic and real streaming data demonstrate that, in comparison to several classification algorithms for concept drifting data streams, CDRDT not only detects potential concept changes in noisy data streams effectively and efficiently, but also performs much better in terms of runtime and space, with an improvement in predictive accuracy. Thus, our proposed algorithm provides a significant reference for the lightweight classification of concept drifting data streams with noise.
ISBN: 9783642030697 (print)
While classification models are being developed for many problems in medicine, Bayesian network classifiers do not seem to have become as widely accepted within the medical community as logistic regression models. We compare first-order logistic regression and naive Bayesian classification in the domain of reproductive medicine and demonstrate that the two techniques can result in models of comparable performance. For Bayesian network classifiers to become more widely accepted within the medical community, we feel that they should be better aligned with their context of application. We describe how to incorporate well-known concepts of clinical relevance in the process of constructing and evaluating Bayesian network classifiers to achieve such an alignment.