In inductive logic programming (ILP), purely bottom-up or purely top-down algorithms encounter several problems in practice. Since most of them are greedy, they stop once they reach a clause that is a local optimum of the "quality" measure used to evaluate results. Moreover, when clauses are learned one by one, the induced clauses become less and less interesting as the algorithm progresses to cover the few remaining examples. In this paper, we propose a simulated annealing framework to overcome these problems. Using a refinement operator, we define neighborhood relations on clauses and on hypotheses (i.e. sets of clauses). With these relations and appropriate quality measures, we show how to induce clauses (in a coverage approach), or to induce hypotheses directly, using simulated annealing algorithms. We discuss the necessary conditions on the refinement operators and the evaluation measures to increase the effectiveness of the algorithm. Implementations (including a parallelized version of the algorithm) are described, and experimental results on the convergence of the method and on its accuracy are presented. (c) 2007 Elsevier Inc. All rights reserved.
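As a rough illustration of the idea in the abstract above, the following is a minimal sketch of simulated annealing over a clause neighborhood. It is not the paper's algorithm: the propositional toy data (`LITERALS`, `POSITIVES`, `NEGATIVES`), the add-or-remove-one-literal neighborhood, the quality measure (positives covered minus negatives covered), and the geometric cooling schedule are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy data: each example is the set of literals true for it.
LITERALS = ["a", "b", "c", "d"]
POSITIVES = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "d"}]
NEGATIVES = [{"a"}, {"b", "c"}, {"c", "d"}]


def covers(body, example):
    """A clause body covers an example if all its literals hold in it."""
    return body <= example


def quality(body):
    """Assumed quality measure: positives covered minus negatives covered."""
    pos = sum(covers(body, e) for e in POSITIVES)
    neg = sum(covers(body, e) for e in NEGATIVES)
    return pos - neg


def neighbors(body):
    """Assumed neighborhood: add one literal (specialize) or drop one (generalize)."""
    added = [body | {lit} for lit in LITERALS if lit not in body]
    removed = [body - {lit} for lit in sorted(body)]
    return added + removed


def simulated_annealing(start, neighbors, quality,
                        t0=1.0, cooling=0.95, steps=200, rng=None):
    """Generic simulated annealing that maximizes `quality`."""
    rng = rng or random.Random(0)
    current, current_q = start, quality(start)
    best, best_q = current, current_q
    temperature = t0
    for _ in range(steps):
        candidates = neighbors(current)
        if not candidates:
            break
        candidate = rng.choice(candidates)
        candidate_q = quality(candidate)
        delta = candidate_q - current_q
        # Accept improvements always; accept a worse neighbor with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_q = candidate, candidate_q
            if current_q > best_q:
                best, best_q = current, current_q
        temperature *= cooling  # geometric cooling (an assumption)
    return best, best_q


best, best_q = simulated_annealing(frozenset(), neighbors, quality)
print(sorted(best), best_q)
```

Unlike a greedy search, the acceptance rule occasionally moves to a worse neighbor, which is what lets the search leave a local optimum of the quality measure.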
This paper presents four novel approaches to enhance the efficiency and effectiveness of inductive logic programming (ILP) systems, along with their implementation in a new ILP system called TWEETY. The proposed approaches include (1) a new declaration mechanism, called connection declarations, for bottom clause construction, which is simpler but more expressive than the commonly used mode declarations; (2) a new covering technique, called super_covering, which reduces the examples in such a way that recursion can be learned independently of the ordering of the examples; (3) a new search heuristic, called the neg_coverage heuristic, which guides the search using only the number of negative examples covered by each hypothesis; and (4) a new search algorithm, called doubly_guided_search, which searches for the best clauses by alternating between two search heuristics, i.e. the traditional coverage heuristic and the new neg_coverage heuristic. The TWEETY system is shown to be more effective and efficient than the state-of-the-art ILP system ALEPH; the proposed techniques can be used to enhance the efficiency and effectiveness of ALEPH and of other systems based on the same ILP principles.
Statistical relational learning (SRL) addresses one of the central open questions of AI: the combination of relational or first-order logic with principled probabilistic and statistical approaches to inference and learning. This thesis approaches SRL from an inductive logic programming (ILP) perspective and starts by developing a general framework for SRL: probabilistic ILP. Based on this foundation, the thesis shows how to incorporate the logical concepts of objects and relations among these objects into Bayesian networks. As time and actions are not just other relations, it then develops approaches to probabilistic ILP over time and to making complex decisions in relational domains. Finally, it is shown that SRL approaches naturally yield kernels for structured data. The resulting approaches are illustrated using examples from genetics, bioinformatics, and planning domains.
Pancreatic cancer is a devastating disease, and predicting patient status is an important and urgent problem. The authors explore the applicability of the inductive logic programming (ILP) method to the disease and show that accumulated clinical laboratory data can be used to predict disease characteristics, which contributes to the selection of therapeutic modalities for pancreatic cancer. The availability of a large amount of clinical laboratory data provides clues to aid knowledge discovery about diseases. In predicting tumour differentiation and lymph node metastasis status in pancreatic cancer with the ILP model, three rules are developed that are consistent with descriptions in the literature. The identified rules are useful for detecting tumour differentiation and lymph node metastasis status in pancreatic cancer and therefore contribute significantly to decisions on therapeutic strategy. In addition, the proposed method is compared with other typical classification techniques, and the results further confirm the superiority and merit of the proposed method.
Three relevant areas of interest in symbolic machine learning are incremental supervised learning, multistrategy learning and predicate invention. In many real-world tasks, new observations may reveal the inadequacy of the learned model. In such cases, incremental approaches allow the model to be adjusted instead of learning a new one from scratch. Specifically, when a negative example is wrongly classified by a model, specialization refinement operators are needed. A powerful way to specialize a theory in inductive logic programming is to add negated preconditions to concept definitions. This paper describes an empowered specialization operator that allows the negation of conjunctions of preconditions to be introduced using predicate invention. An implementation of the operator is proposed, and experiments purposely devised to stress it show that the proposed approach is correct and viable even under quite complex conditions.
In recent research, human-understandable explanations of machine learning models have received a lot of attention. Explanations are often given in the form of model simplifications or visualizations. However, as shown in cognitive science as well as in early AI research, concept understanding can also be improved by aligning a given instance of a concept with a similar counterexample. Contrasting a given instance with a structurally similar example that does not belong to the concept highlights which characteristics are necessary for concept membership. Such near misses were proposed by Winston (Learning structural descriptions from examples, 1970) as efficient guidance for learning in relational domains. We introduce an explanation generation algorithm for relational concepts learned with inductive logic programming (GeNME). The algorithm identifies near miss examples from a given set of instances and ranks these examples by their degree of closeness to a specific positive instance. A modified rule which covers the near miss but not the original instance is given as an explanation. We illustrate GeNME with the well-known family domain consisting of kinship relations, the visual relational Winston arches domain, and a real-world domain dealing with file management. We also present a psychological experiment comparing human preferences for rule-based, example-based, and near miss explanations in the family and the arches domains.
In the domain of crystal engineering, various schemes have been proposed for the classification of hydrogen bonding (H-bonding) patterns observed in 3D crystal structures. In this study, the aim is to complement these schemes with rules that predict H-bonding in crystals from 2D structural information only. Modern computational power and the advances in inductive logic programming (ILP) can now provide computational chemistry with the opportunity for extracting structure-specific rules from large databases that can be incorporated into expert systems. ILP technology is here applied to H-bonding in crystals to develop a self-extracting expert system utilizing data in the Cambridge Structural Database of small molecule crystal structures. A clear increase in performance was observed when the ILP system DMAX was allowed to refer to the local structural environment of the possible H-bond donor/acceptor pairs. This ability distinguishes ILP from more traditional approaches that build rules on the basis of global molecular properties.
Inductive logic programming (ILP) is the study of machine learning systems that use clausal theories in first-order logic as a representation language. In this paper, we survey the theoretical foundations of ILP from the viewpoints of the Logic of Discovery and of Machine Learning, and try to unify these two views with the support of the modern theory of logic programming. First, we define several hypothesis construction methods in ILP and give their proof-theoretic foundations by treating them as procedures which complete incomplete proofs. Next, we discuss the design of individual learning algorithms using these hypothesis construction methods. We review known results on learning logic programs in computational learning theory, and show that these algorithms are instances of a generic learning strategy with proof completion methods.
When comparing inductive logic programming (ILP) and attribute-value learning techniques, there is a trade-off between expressive power and efficiency. Inductive logic programming techniques are typically more expressive but also less efficient. Therefore, the data sets handled by current inductive logic programming systems are small by the general standards of the data mining community. The main source of inefficiency lies in the assumption that several examples may be related to each other, so they cannot be handled independently. Within the learning from interpretations framework for inductive logic programming this assumption is unnecessary, which allows existing ILP algorithms to be scaled up. In this paper we explain this learning setting in the context of relational databases. We relate the setting to propositional data mining and to the classical ILP setting, and show that learning from interpretations corresponds to learning from multiple relations and thus extends the expressiveness of propositional learning, while maintaining its efficiency to a large extent (which is not the case in the classical ILP setting). As a case study, we present two alternative implementations of the ILP system TILDE (Top-down Induction of Logical DEcision trees): TILDEclassic, which loads all data into main memory, and TILDELDS, which loads the examples one by one. We experimentally compare the implementations, showing that TILDELDS can handle large data sets (on the order of 100,000 examples or 100 MB) and indeed scales up linearly in the number of examples.
Introducing fuzzy predicates in inductive logic programming may serve two different purposes: allowing for more adaptability when learning classical rules, or gaining expressivity by learning fuzzy rules. The latter concern is the topic of this paper. Indeed, introducing fuzzy predicates in the antecedent and in the consequent of rules may convey different non-classical meanings. The paper focuses on the learning of gradual and certainty rules, which have increased expressive power and no simple crisp counterpart. The benefit and the application domain of each kind of rule are discussed. Appropriate confidence degrees for each type of rule are introduced. These confidence degrees, which guide the learning process, play a major role in adapting the classical FOIL inductive logic programming algorithm to the induction of fuzzy rules. The method is illustrated on a benchmark example and a case-study database.
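For context on the abstract above, here is a minimal sketch of the information gain that the classical (crisp) FOIL algorithm uses to choose the next literal. It does not implement the fuzzy confidence degrees the paper introduces; the function name `foil_gain` and its parameter names are assumptions made for the example.

```python
import math


def foil_gain(p0, n0, p1, n1, t):
    """Classical FOIL gain for adding one literal to a clause.

    p0, n0: positive / negative bindings covered before adding the literal.
    p1, n1: positive / negative bindings covered after adding it.
    t:      positive bindings covered before that are still covered after.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # no positives left: the literal carries no gain
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)


# A literal that keeps 8 of 10 positives and only 2 of 10 negatives.
print(foil_gain(p0=10, n0=10, p1=8, n1=2, t=8))
```

The gain is positive when the literal raises the proportion of positive bindings, and it is weighted by how many positives remain covered.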