The similarity measures used in first-order IBL so far have been limited to the function-free case. In this paper we show that a lot of power can be gained by allowing lists and other terms in the input representation and designing similarity measures that work directly on these structures. We present an improved similarity measure for the first-order instance-based learner RIBL that employs the concept of edit distances to efficiently compute distances between lists and terms, discuss its computational and formal properties, and empirically demonstrate its additional power on a problem from the domain of biochemistry. The paper also includes a thorough reconstruction of RIBL's overall algorithm.
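As a rough illustration of the edit-distance idea mentioned in the abstract above, the sketch below computes a plain unit-cost Levenshtein distance between two lists. This is only an assumption about the general technique: the function name `edit_distance` and the unit costs are hypothetical, and RIBL's actual measure for lists and terms is not reproduced here.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost Levenshtein distance between two sequences.

    Hypothetical helper for illustration; not RIBL's similarity measure.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty list takes i deletions
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

The dynamic program runs in time proportional to the product of the two list lengths and keeps only two rows of the table in memory.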
This is a review paper, whose goal is to significantly improve our understanding of the crucial role of attribute interaction in data mining. The main contributions of this paper are as follows. Firstly, we show that the concept of attribute interaction has a crucial role across different kinds of problems in data mining, such as attribute construction, coping with small disjuncts, induction of first-order logic rules, detection of Simpson's paradox, and finding several types of interesting rules. Hence, a better understanding of attribute interaction can lead to a better understanding of the relationship between these kinds of problems, which are usually studied separately from each other. Secondly, we draw attention to the fact that most rule induction algorithms are based on a greedy search which does not cope well with the problem of attribute interaction, and point out some alternative kinds of rule discovery methods which tend to cope better with this problem. Thirdly, we discuss several algorithms and methods for discovering interesting knowledge that, implicitly or explicitly, are based on the concept of attribute interaction.
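The point about greedy search and attribute interaction can be made concrete with a small sketch. The XOR dataset and the helper names `entropy` and `information_gain` below are assumptions chosen for illustration and are not taken from the paper. Each attribute alone yields zero information gain, so a greedy, one-attribute-at-a-time search sees no reason to pick either, although the two attributes together determine the class.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attrs):
    """Entropy reduction from splitting on the attribute indices in attrs."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the class is the XOR of two binary attributes.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

gain_first = information_gain(rows, labels, [0])    # attribute 0 alone
gain_second = information_gain(rows, labels, [1])   # attribute 1 alone
gain_joint = information_gain(rows, labels, [0, 1])  # both attributes together
```

Here `gain_first` and `gain_second` are both zero while `gain_joint` is one bit, which is the kind of interaction a purely greedy evaluation misses.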
This paper presents a method for approximate matching of first-order rules with unseen data. The method is especially useful for multi-class problems or noisy domains, where unseen data are often not covered by the rules. Our method employs a backpropagation neural network for the approximation. To build the network, we propose a technique for generating features from the rules to be used as inputs to the network. Our method has been evaluated on four domains of first-order learning problems. The experimental results show improvements of our method over the use of the original rules. We also applied our method to approximate matching of propositional rules converted from an unpruned decision tree. In this case, our method can be thought of as soft-pruning of the decision tree. The results on multi-class learning domains in the UCI repository of machine learning databases show that our method performs better than standard C4.5's pruned and unpruned trees.
Data mining techniques are becoming increasingly important in chemistry as databases become too large to examine manually. Data mining methods from the field of inductive logic programming (ILP) have potential advantages for structural chemical data. In this paper we present Warmr, the first ILP data mining algorithm to be applied to chemoinformatic data. We illustrate the value of Warmr by applying it to a well-studied database of chemical compounds tested for carcinogenicity in rodents. Data mining was used to find all frequent substructures in the database, and knowledge of these frequent substructures is shown to add value to the database. One use of the frequent substructures was to convert them into probabilistic prediction rules relating compound description to carcinogenesis. These rules were found to be accurate on test data, and to give some insight into the relationship between structure and activity in carcinogenesis. The substructures were also used to prove that there existed no accurate rule, based purely on atom-bond substructure with fewer than seven conditions, that could predict carcinogenicity. This result puts a lower bound on the complexity of the relationship between chemical structure and carcinogenicity. Only by using a data mining algorithm, and by doing a complete search, is it possible to prove such a result. Finally, the frequent substructures were shown to add value by increasing the accuracy of statistical and machine learning programs that were trained to predict chemical carcinogenicity. We conclude that Warmr, and ILP data mining methods generally, are an important new tool for analysing chemical databases.
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can be potentially applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.
Authors:
Horváth, T; Turán, G
GMD AiS, Inst Autonomous Intelligent Syst, German Natl Res Ctr Informat Technol, D-53754 St Augustin, Germany
Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60607, USA
Hungarian Acad Sci, Res Grp Artificial Intelligence, Szeged, Hungary
The efficient learnability of restricted classes of logic programs is studied in the PAC framework of computational learning theory. We develop the product homomorphism method, which gives polynomial PAC learning algorithms for a nonrecursive Horn clause with function-free ground background knowledge, if the background knowledge satisfies some structural properties. The method is based on a characterization of the concept that corresponds to the relative least general generalization of a set of positive examples with respect to the background knowledge. The characterization is formulated in terms of products and homomorphisms. In the applications this characterization is turned into an explicit combinatorial description, which is then translated into the language of nonrecursive Horn clauses. We show that a nonrecursive Horn clause is polynomially PAC-learnable if there is a single binary background predicate and the ground atoms in the background knowledge form a forest. If the ground atoms in the background knowledge form a disjoint union of cycles then the situation is different, as the shortest consistent hypothesis may have exponential size. In this case polynomial PAC-learnability holds if a different representation language is used. We also consider the complexity of hypothesis finding for multiple clauses in some restricted cases. (C) 2001 Elsevier Science B.V. All rights reserved.
We present a case of primary hyperparathyroidism with severe hypercalcaemia, treated successfully with ultrasound (US) guided percutaneous interstitial laser photocoagulation (ILP) of a single parathyroid tumour. To our knowledge, this is the first reported case of ILP applied in primary hyperparathyroidism. US guided thermic tissue coagulation with ILP may be a nonsurgical alternative in patients with symptomatic hypercalcaemia due to a parathyroid tumour when surgery is contraindicated.
This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the inductive logic programming (ILP) system PROGOL is applied to the problem of identifying potential pharmacophores for ACE inhibition. The case study reported in this paper supports four general lessons for machine learning and knowledge discovery, as well as more specific lessons for pharmacophore discovery, for inductive logic programming, and for ACE inhibition. The general lessons for machine learning and knowledge discovery are as follows. 1. An initial rediscovery step is a useful tool when approaching a new application domain. 2. General machine learning heuristics may fail to match the details of an application domain, but it may be possible to successfully apply a heuristic-based algorithm in spite of the mismatch. 3. A complete search for all plausible hypotheses can provide useful information to a user, although experimentation may be required to choose between competing hypotheses. 4. A declarative knowledge representation facilitates the development and debugging of background knowledge in collaboration with a domain expert, as well as the communication of final results.
ISBN: (print) 3540423257
By analysing sequences of actions performed by a user, one can find frequent subsequences that can be suggested as macro (script) definitions. However, often these 'actions' have additional features. In this paper we combine an algorithm to detect frequent subsequences with an inductive logic programming system to automatically generate for each frequent subsequence the most specific 'template' for these additional features that is consistent with the observed frequent subsequences. The resulting system is implemented and used in an application where we automatically generate macros from logs of the use of a Unix command shell.
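A minimal sketch of the frequent-subsequence step described in the abstract above, under stated assumptions: it counts contiguous command sequences of a fixed length across sessions and keeps those that reach a minimum support. The function name `frequent_subsequences`, the contiguity restriction, and the sample sessions are hypothetical; the abstract does not specify the detection algorithm, and the ILP step that generalises the additional features into templates is not shown.

```python
from collections import Counter


def frequent_subsequences(sessions, length, min_support):
    """Return contiguous subsequences of the given length that occur in at
    least min_support sessions, mapped to their session counts.

    Illustrative only; not the algorithm used in the paper.
    """
    counts = Counter()
    for session in sessions:
        # Collect each subsequence at most once per session.
        seen = {
            tuple(session[i:i + length])
            for i in range(len(session) - length + 1)
        }
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_support}


# Hypothetical shell logs, one list of commands per session.
sessions = [
    ["cd", "ls", "vi"],
    ["cd", "ls", "make"],
    ["ls", "vi"],
]
macros = frequent_subsequences(sessions, length=2, min_support=2)
```

With these sample sessions, `macros` contains the pairs `("cd", "ls")` and `("ls", "vi")`, each supported by two sessions; such pairs would be the candidates offered as macro definitions.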
ISBN: (print) 0780371011
In this paper, we present a learning simulator consisting of an interface, an inference engine and an inductive logic programming (ILP) system. Possible uses of the simulator include checking the behavior of CAI systems, serving as a novice agent in CAI systems, and letting a teacher check study contents by observing the response of the simulator. The learning simulator learns interactively. First, a teacher (or a CAI system) gives background knowledge as basic rules and examples. Next, the teacher asks the simulator a question. Using the background knowledge, rules generated by ILP and examples stored in the memory, the simulator answers the question. Then, after the teacher tells the correct answer, the simulator stores the examples and updates rules for the next question. We implement the simulator using a Prolog interpreter and the ILP system FOIL. We show learning results obtained through computer simulations.