The clausal discovery engine CLAUDIEN is presented. CLAUDIEN is an inductive logic programming engine that fits in the descriptive data mining paradigm. CLAUDIEN addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities are represented by clausal theories, and the data by Herbrand interpretations. Because CLAUDIEN uses clausal logic to represent hypotheses, the regularities induced typically involve multiple relations or predicates. CLAUDIEN also employs a novel declarative bias mechanism to define the set of clauses that may appear in a hypothesis.
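The core coverage test behind this setting can be sketched briefly: in learning from interpretations, a clause is acceptable only if it is true in every example interpretation (each example being a set of ground facts). The predicate names, domain, and candidate clause below are hypothetical, used only to illustrate the test, not taken from the paper.

```python
# Minimal sketch of the "learning from interpretations" coverage test used by
# descriptive ILP systems such as CLAUDIEN. Example data is hypothetical.
from itertools import product

def clause_true_in(interpretation, domain, body, head, variables):
    """For every grounding of `variables` over `domain`: if the body atoms are
    all in the interpretation, at least one head atom must be as well."""
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        ground = lambda atom: (atom[0],) + tuple(theta.get(a, a) for a in atom[1:])
        if all(ground(a) in interpretation for a in body):          # body satisfied
            if not any(ground(a) in interpretation for a in head):  # head falsified
                return False
    return True

# Hypothetical example interpretation: one observation as a set of ground facts.
example = {("parent", "ann", "bob"), ("parent", "bob", "carl"),
           ("human", "ann"), ("human", "bob"), ("human", "carl")}
domain = {"ann", "bob", "carl"}

# Candidate regularity: human(X) :- parent(X, Y).
print(clause_true_in(example, domain,
                     body=[("parent", "X", "Y")],
                     head=[("human", "X")],
                     variables=["X", "Y"]))  # True for this interpretation
```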
This paper discusses the role that background knowledge can play in building flexible multistrategy learning systems. We contend that a variety of learning strategies can be embodied in the background knowledge provided to a general purpose learning algorithm. To be effective, the general purpose algorithm must have a mechanism for learning new concept descriptions that can refer to knowledge provided by the user or learned during some other task. The method of knowledge representation is a central problem in designing such a system since it should be possible to specify background knowledge in such a way that the learner can apply its knowledge to new information.
Three different formalizations of concept-learning in logic (as well as some variants) are analyzed and related. It is shown that learning from interpretations reduces to learning from entailment, which in turn reduces to learning from satisfiability. The implications of this result for inductive logic programming and computational learning theory are then discussed, and guidelines for choosing a problem-setting are formulated. (C) 1997 Elsevier Science B.V.
When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies that are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the more effectively the learner can prune away uninteresting or losing hypotheses, and the faster learning becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by the user and in a data-driven way by structures that are induced by three simple learning algorithms.
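One natural way to couple an ILP learner directly to a relational DBMS, consistent with the idea above, is to translate a candidate clause body into an SQL query and let the database count coverage. The sketch below uses sqlite3 for self-containment; the schema, clause, and table names are assumptions made for illustration, not the system from the paper.

```python
# Sketch: coverage of a candidate clause counted by the database itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(id INTEGER, segment TEXT);
    CREATE TABLE purchase(customer_id INTEGER, product TEXT);
    INSERT INTO customer VALUES (1, 'premium'), (2, 'basic');
    INSERT INTO purchase VALUES (1, 'laptop'), (2, 'pen');
""")

# Candidate clause: premium(C) :- purchase(C, 'laptop').
# Coverage query: how many 'premium' customers satisfy the body?
covered_positives = conn.execute("""
    SELECT COUNT(DISTINCT c.id)
    FROM customer c JOIN purchase p ON p.customer_id = c.id
    WHERE c.segment = 'premium' AND p.product = 'laptop'
""").fetchone()[0]
print(covered_positives)  # 1
```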
Pre-pruning and post-pruning are two standard techniques for handling noise in decision tree learning. Pre-pruning deals with noise during learning, while post-pruning addresses this problem after an overfitting theory has been learned. We first review several adaptations of pre- and post-pruning techniques for separate-and-conquer rule learning algorithms and discuss some fundamental problems. The primary goal of this paper is to show how to solve these problems with two new algorithms that combine and integrate pre- and post-pruning.
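The contrast between the two pruning styles can be made concrete with a toy sketch: pre-pruning stops specialising a rule during growth, while post-pruning simplifies a fully grown rule against a separate pruning set. The data representation, thresholds, and helper names below are assumptions for illustration, not the algorithms proposed in the paper.

```python
# Toy sketch of pre- vs post-pruning in a separate-and-conquer rule learner.

def covers(rule, x):
    """A rule is a list of (attribute, value) tests, interpreted conjunctively."""
    return all(x.get(a) == v for a, v in rule)

def accuracy(rule, data):
    cov = [(x, y) for x, y in data if covers(rule, x)]
    return (sum(1 for _, y in cov if y) / len(cov)) if cov else 0.0

def grow_rule(train, candidates, min_gain=0.05):
    """Greedy specialisation with PRE-pruning: stop as soon as no condition
    improves training accuracy by at least `min_gain`."""
    rule = []
    while True:
        best, best_acc = None, accuracy(rule, train)
        for cond in candidates:
            acc = accuracy(rule + [cond], train)
            if acc >= best_acc + min_gain:
                best, best_acc = cond, acc
        if best is None:
            return rule
        rule.append(best)

def post_prune(rule, prune_set):
    """POST-pruning: drop trailing conditions as long as accuracy on a
    separate pruning set does not decrease."""
    while rule and accuracy(rule[:-1], prune_set) >= accuracy(rule, prune_set):
        rule = rule[:-1]
    return rule
```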
We devise a method to generate descriptive classification rules of shape contours by using inductive learning. The classification rules are represented in the form of logic programs. We first transform input objects from pixel representation into predicate representation. The transformation consists of preprocessing, feature extraction and symbolic transformation. We then use FOIL, which is an inductive logic programming system, to produce classification rules. Experiments on two sets of data were performed to justify our proposed method. Copyright (C) 1997 Pattern Recognition Society.
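A brief sketch of what such a symbolic transformation could look like: a contour (a list of points) is broken into segments and emitted as Prolog-style ground facts that an ILP system such as FOIL could consume. The particular features and predicate names are hypothetical, chosen only to illustrate the pixel-to-predicate step.

```python
# Sketch: turning a shape contour into ground facts for an ILP learner.
import math

def contour_to_facts(name, points, n_segments=4):
    facts = []
    step = max(1, len(points) // n_segments)
    for i in range(0, len(points) - step, step):
        (x1, y1), (x2, y2) = points[i], points[i + step]
        length = math.hypot(x2 - x1, y2 - y1)
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        seg = f"seg_{name}_{i // step}"
        facts.append(f"segment({name}, {seg}).")
        facts.append(f"length({seg}, {length:.1f}).")
        facts.append(f"orientation({seg}, {angle:.0f}).")
    return facts

# Hypothetical square-ish contour sampled from a binary image.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print("\n".join(contour_to_facts("shape1", square)))
```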
In this paper, we design an algorithm to construct simply recursive programs from a finite set of good examples. (C) 1997 Published by Elsevier Science B.V.
We present a new approach, called First Order Regression (FOR), to handling numerical information in inductive logic programming (ILP). FOR is a combination of ILP and numerical regression. First-order logic descriptions are induced to carve out those subspaces that are amenable to numerical regression among real-valued variables. The program FORS is an implementation of this idea, where numerical regression is focused on a distinguished continuous argument of the target predicate. We show that this can be viewed as a generalisation of the usual ILP problem. Applications of FORS on several real-world data sets are described: the prediction of mutagenicity of chemicals, the modelling of liquid dynamics in a surge tank, predicting the roughness in steel grinding, finite element mesh design, and operator's skill reconstruction in electric discharge machining. A comparison of FORS' performance with previous results in these domains indicates that FORS is an effective tool for ILP applications that involve numerical data.
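The underlying idea can be sketched in a few lines: a logical condition carves out a subspace of the examples, and an ordinary numerical regression is fitted to the continuous target within that subspace. The condition, variable names, and data below are hypothetical illustrations, not the FORS system itself.

```python
# Sketch of the first-order regression idea: regression inside a logically
# defined subspace of the examples.
import numpy as np

# Each example: (background facts as a dict, continuous input x, target y).
examples = [
    ({"material": "steel"},     1.0, 2.1),
    ({"material": "steel"},     2.0, 4.0),
    ({"material": "steel"},     3.0, 6.2),
    ({"material": "aluminium"}, 1.0, 0.9),
    ({"material": "aluminium"}, 2.0, 1.1),
]

def fit_in_subspace(examples, condition):
    """Fit y ≈ a*x + b by least squares over the examples selected by `condition`."""
    selected = [(x, y) for facts, x, y in examples if condition(facts)]
    xs, ys = zip(*selected)
    a, b = np.polyfit(xs, ys, deg=1)
    return a, b

a, b = fit_in_subspace(examples, lambda f: f["material"] == "steel")
print(f"target ≈ {a:.2f} * x + {b:.2f}  (within the 'steel' subspace)")
```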
The automated construction of dynamic system models is an important application area for ILP. We describe a method that learns qualitative models from time-varying physiological signals. The goal is to understand the complexity of the learning task when faced with numerical data, what signal processing techniques are required, and how this affects learning. The qualitative representation is based on Kuipers' QSIM. The learning algorithm for model construction is based on Coiera's GENMODEL. We show that QSIM models are efficiently PAC learnable from positive examples only, and that GENMODEL is an ILP algorithm for efficiently constructing a QSIM model. We describe both GENMODEL which performs RLGG on qualitative states to learn a QSIM model, and the front-end processing and segmenting stages that transform a signal into a set of qualitative states. Next we describe results of experiments on data from six cardiac bypass patients. Useful models were obtained, representing both normal and abnormal physiological states. Model variation across time and across different levels of temporal abstraction and fault tolerance is explored. The assumption made by many previous workers that the abstraction of examples from data can be separated from the learning task is not supported by this study. Firstly, the effects of noise in the numerical data manifest themselves in the qualitative examples. Secondly, the models learned are directly dependent on the initial qualitative abstraction chosen.
Interest in research into knowledge discovery in databases (KDD) has been growing continuously because of the rapid increase in the amount of information embedded in real-world data. Several systems have been proposed for studying the KDD process. One main task in a KDD system is to learn important and user-interesting knowledge from a set of collected data. Most proposed systems use simple machine learning methods to learn the patterns; this may result in efficient performance, but the discovered knowledge is less useful. In this paper, we propose a method that integrates a more complex machine learning technique, inductive logic programming (ILP), into the KDD process to improve the quality of the discovery. Such integration shows how this learning technique can be easily applied to a KDD system and how it can improve the representation of the discovered knowledge. In our system, the user's queries indicate the importance and interestingness of the target knowledge. The system has been implemented on a SUN workstation using the Sybase database system. Detailed examples are also provided to illustrate the benefit of integrating the ILP technique with the KDD system.
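One way such an integration can work, under assumptions made only for this sketch, is to let the user's query mark the interesting tuples: matching rows become positive examples and the remaining rows and attributes become background facts for an ILP learner. sqlite3 stands in here for the Sybase system used in the paper; the schema and query are hypothetical.

```python
# Sketch: deriving ILP input (examples + background facts) from a user query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient(id INTEGER, age INTEGER, smoker INTEGER, diagnosis TEXT);
    INSERT INTO patient VALUES (1, 63, 1, 'cardiac'), (2, 35, 0, 'healthy'),
                               (3, 58, 1, 'cardiac');
""")

# The user's query indicates which tuples are interesting (the target concept).
user_query = "SELECT id FROM patient WHERE diagnosis = 'cardiac'"
positives = {row[0] for row in conn.execute(user_query)}

examples, background = [], []
for pid, age, smoker, _ in conn.execute("SELECT * FROM patient"):
    examples.append(f"{'pos' if pid in positives else 'neg'}(target({pid})).")
    background.append(f"age({pid}, {age}).")
    background.append(f"smoker({pid}, {smoker}).")

print("\n".join(examples + background))
```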