Most current applications of inductive learning in databases take place in the context of a single extensional relation. This paper puts inductive learning in the context of a set of relations defined either extensionally or intensionally in the framework of deductive databases. It presents LINUS, an inductive logic programming system that induces virtual relations from positive and negative example tuples and from relations already defined in a deductive database. LINUS is based on the idea of transforming the problem of learning relations into attribute-value form, and it incorporates several attribute-value learning systems. Because these attribute-value learners handle noisy data successfully, LINUS is able to learn relations from real-life noisy databases. The paper illustrates the use of LINUS for learning virtual relations and then presents a study of its performance on noisy data.
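The transformation to attribute-value form can be illustrated with a minimal sketch. It makes several assumptions: the background relations (PARENT, MALE), the target relation father/2 and the toy learner best_conjunction are all hypothetical stand-ins, and none of them is LINUS's actual feature language or one of the learners it wraps. Each example tuple becomes a row of boolean features, each feature being a background literal evaluated on the tuple's arguments. An ordinary attribute-value learner then runs on that table.

```python
from itertools import combinations

# Hypothetical background relations (toy data, not from the paper).
PARENT = {("ann", "bob"), ("bob", "cid")}
MALE = {"bob", "cid"}


def features(x, y):
    """Evaluate each background literal on the arguments of the tuple (x, y)."""
    return {
        "parent(X,Y)": (x, y) in PARENT,
        "parent(Y,X)": (y, x) in PARENT,
        "male(X)": x in MALE,
        "male(Y)": y in MALE,
        "X=Y": x == y,
    }


def propositionalize(pos, neg):
    """Attribute-value table: one row of boolean features plus a class label per tuple."""
    return [(features(*t), True) for t in pos] + [(features(*t), False) for t in neg]


def best_conjunction(rows, max_len=2):
    """Toy attribute-value learner (hypothetical stand-in): the conjunction of at
    most max_len features that classifies the most rows correctly (first wins ties)."""
    names = list(rows[0][0])
    best, best_score = None, -1
    for k in range(1, max_len + 1):
        for combo in combinations(names, k):
            score = sum(all(f[n] for n in combo) == label for f, label in rows)
            if score > best_score:
                best, best_score = combo, score
    return best


# Hypothetical target relation father(X, Y): one positive tuple, two negative tuples.
rows = propositionalize(pos=[("bob", "cid")], neg=[("ann", "bob"), ("cid", "bob")])
print(best_conjunction(rows))  # ('parent(X,Y)', 'male(X)')
```

Translated back into relational form, the selected features read as the clause father(X,Y) :- parent(X,Y), male(X).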
Time plays an important role in the vast majority of problems and, as such, it is a vital issue to consider when developing computer systems for solving problems. One of the most influential formalisms for representing time in the literature is Allen's Temporal Algebra, which is based on a set of 13 relations (basic and reversed) that may hold between two time intervals. Despite a few drawbacks and limitations, Allen's formalism is still a convenient representation because of its simplicity and implementability, and because it has been the basis of several extensions. This paper explores the automatic learning of Allen's temporal relations by the inductive logic programming system FOIL, considering two possible representations of a time interval: (i) as a primitive concept and (ii) as a concept defined in terms of the primitive concept of a time point. The experiments described in the paper aim (1) to explore the viability of both representations for automatic learning; (2) to compare the ease of learning and the interpretability of the results; (3) to evaluate the impact of the given examples on inducing a proper representation of the relations; and (4) to experiment with both representations under the closed-world assumption (CWA), which would ease continuous learning with FOIL. Experimental results are presented and discussed as evidence that the CWA can be a convenient strategy when learning Allen's temporal relations.
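Under representation (ii), each of the 13 relations is determined by comparing interval endpoints. The sketch below states the target concepts directly rather than learning them. It assumes that an interval is a (start, end) pair with start < end. The relation names follow common usage, and the function allen_relation is hypothetical, not taken from the paper. Enumerating pairs of intervals over a few time points, as in the usage lines, is one way to produce labelled examples for a learner such as FOIL.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Each interval is a (start, end) pair of time points with start < end,
    i.e. intervals are defined in terms of time points (representation ii).
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a1 < b1 else "overlapped-by"


# Every pair of intervals over time points 0..4 exhibits all 13 relations.
intervals = [(s, e) for s in range(5) for e in range(s + 1, 5)]
relations = {allen_relation(a, b) for a in intervals for b in intervals}
print(len(relations))  # 13
```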
Scientists form hypotheses and test them experimentally. If a hypothesis fails (is refuted), scientists try to explain the failure in order to eliminate other hypotheses. The more precise the failure analysis, the more hypotheses can be eliminated. Inspired by this, we introduce failure explanation techniques for inductive logic programming. Given a hypothesis represented as a logic program, we test it on examples. If the hypothesis fails, we explain the failure in terms of failing sub-programs. When a positive example fails, we identify failing sub-programs at the granularity of literals. We introduce a failure explanation algorithm based on analysing branches of SLD-trees, and we integrate a meta-interpreter-based implementation of this algorithm with the test stage of the Popper ILP system. We show that fine-grained failure analysis allows for learning fine-grained constraints on the hypothesis space. Our experimental results show that explaining failures can drastically reduce hypothesis space exploration and learning times.
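The literal-level explanation for a failing positive example can be sketched under strong simplifying assumptions. The sketch handles a single non-recursive clause and ground facts only, and it is written in Python rather than as a Prolog meta-interpreter, so it is not Popper's implementation. It explores the SLD-tree of the clause body depth-first and records the deepest literal reached on any branch. The shortest failing prefix of the body is then a failing sub-clause. Adding body literals cannot make a definite clause cover more examples, so every specialisation of that sub-clause also fails on the example. All names are hypothetical.

```python
def is_var(term):
    """Variables are strings starting with an upper-case letter, as in Prolog."""
    return isinstance(term, str) and term[:1].isupper()


def unify(args, fact, subst):
    """Match a literal's arguments against a ground fact; return the extended
    substitution, or None if they clash."""
    if len(args) != len(fact):
        return None
    s = dict(subst)
    for arg, value in zip(args, fact):
        if is_var(arg):
            if arg not in s:
                s[arg] = value
                continue
            arg = s[arg]
        if arg != value:
            return None
    return s


def deepest(body, facts, subst, depth=0):
    """Depth-first search of the SLD-tree of a clause body over ground facts;
    return the largest number of body literals proved on any branch."""
    if depth == len(body):
        return depth
    pred, args = body[depth]
    best = depth
    for fact in facts.get(pred, ()):
        s = unify(args, fact, subst)
        if s is not None:
            best = max(best, deepest(body, facts, s, depth + 1))
            if best == len(body):
                break  # the whole body succeeded on this branch
    return best


def failing_subclause(head_args, body, example, facts):
    """Shortest body prefix that already fails on the positive example
    ([] means the head itself does not match), or None if the example is covered."""
    subst = unify(head_args, example, {})
    if subst is None:
        return []
    k = deepest(body, facts, subst)
    return None if k == len(body) else body[: k + 1]


facts = {"parent": {("ann", "bob"), ("bob", "cid")}, "male": {("bob",)}}
# Hypothetical hypothesis: grandparent(A, C) :- parent(A, B), male(A), parent(B, C).
body = [("parent", ("A", "B")), ("male", ("A",)), ("parent", ("B", "C"))]
print(failing_subclause(("A", "C"), body, ("ann", "cid"), facts))
# [('parent', ('A', 'B')), ('male', ('A',))]
```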
Toxicity prediction is essential for drug design and the development of effective therapeutics. In this paper we present an in silico strategy for identifying the mode of action of toxic compounds, based on a novel logic-based kernel method. The technique uses support vector machines with kernels constructed from first-order rules induced by an inductive logic programming system. It constructs multi-class models using a divide-and-conquer reduction strategy that splits the classes into binary groups and solves each binary problem recursively, thereby generating an underlying decision-list structure. To evaluate the effectiveness of the approach on chemoinformatics problems such as predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to their mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. An experimental comparison of the proposed multi-class scheme with the standard multi-class inductive logic programming algorithm and a multi-class support vector machine yields statistically significant results and demonstrates the potential of the approach for identifying compounds with various toxic mechanisms.
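The decision-list structure that a recursive binary reduction produces can be sketched as follows, with several assumptions. The sketch reads "binary groups" as one class versus the remaining classes, and the class order is supplied by the caller; both are assumptions about how the splits are chosen. A toy nearest-centroid learner stands in for the SVM with a logic-based kernel, and the feature vectors are plain numbers rather than features derived from first-order rules. All names are hypothetical.

```python
from statistics import fmean


def train_centroid(X, y):
    """Toy binary learner standing in for the SVM with a logic-based kernel:
    predict True when x is closer to the positive centroid."""
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    cp = [fmean(col) for col in zip(*pos)]
    cn = [fmean(col) for col in zip(*neg)]

    def sqdist(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return lambda x: sqdist(x, cp) < sqdist(x, cn)


def train_decision_list(X, y, class_order):
    """Separate the first class from the rest, then recurse on the remaining
    classes; the chain of binary models is a decision list whose last class is the default."""
    if len(class_order) == 1:
        return [], class_order[0]
    cls = class_order[0]
    model = train_centroid(X, [t == cls for t in y])
    rest = [(x, t) for x, t in zip(X, y) if t != cls]
    rules, default = train_decision_list(
        [x for x, _ in rest], [t for _, t in rest], class_order[1:]
    )
    return [(cls, model)] + rules, default


def predict(decision_list, x):
    """Return the class of the first binary model that fires, else the default."""
    rules, default = decision_list
    for cls, model in rules:
        if model(x):
            return cls
    return default


X = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
y = ["a", "a", "b", "b", "c", "c"]
dl = train_decision_list(X, y, ["a", "b", "c"])
print([predict(dl, x) for x in ([0.2], [10.5], [19.0])])  # ['a', 'b', 'c']
```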
As a form of Machine Learning, the study of inductive logic programming (ILP) is motivated by a central belief: relational description languages are better (in terms of accuracy and understandability) than propositional ones for certain real-world applications. This claim is investigated here for a particular application in structural molecular biology: constructing readable descriptions of the major protein folds. To the authors' knowledge, Machine Learning has not previously been applied systematically to this task. In this application, the domain expert (third author) identified a natural divide between essentially propositional features and more structurally oriented relational ones. The following null hypotheses are tested: 1) for a given ILP system (Progol), providing relational background knowledge does not increase predictive accuracy; 2) a good propositional learning system (C5.0) without relational background knowledge will outperform Progol with relational background knowledge; 3) relational background knowledge does not produce improved explanatory insight. Null hypotheses 1) and 2) are both refuted by cross-validation results obtained over 20 of the most populated protein folds. Null hypothesis 3) is refuted by demonstrating various insightful rules that were discovered only among the relationally oriented learned rules.
Pre-pruning and post-pruning are two standard techniques for handling noise in decision tree learning. Pre-pruning deals with noise during learning, while post-pruning addresses the problem after an overfitting theory has been learned. We first review several adaptations of pre- and post-pruning techniques for separate-and-conquer rule learning algorithms and discuss some fundamental problems. The primary goal of this paper is to show how to solve these problems with two new algorithms that combine and integrate pre- and post-pruning.
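The post-pruning side of this trade-off can be sketched for a single rule, as a reduced-error style pass. The sketch keeps deleting the rule's final condition while accuracy on a held-out pruning set does not decrease. It shows only a generic post-pruning step, not the paper's two combined algorithms. Pre-pruning, which would instead stop adding conditions during learning, is not shown. The rule, its conditions and the pruning data are hypothetical.

```python
def covers(rule, x):
    """A rule is a list of (name, test) conditions; it covers x if all tests pass."""
    return all(test(x) for _, test in rule)


def accuracy(rule, data):
    """Fraction of labelled examples classified correctly when the rule
    predicts 'positive' exactly for the examples it covers."""
    return sum(covers(rule, x) == label for x, label in data) / len(data)


def post_prune(rule, prune_set):
    """Reduced-error style post-pruning: keep deleting the final condition
    while accuracy on the held-out pruning set does not decrease."""
    rule = list(rule)
    while len(rule) > 1 and accuracy(rule[:-1], prune_set) >= accuracy(rule, prune_set):
        rule.pop()
    return rule


# A hypothetical overly specific rule, as might be learned from noisy training data.
rule = [
    ("outlook=sunny", lambda x: x["outlook"] == "sunny"),
    ("humidity<70", lambda x: x["humidity"] < 70),
    ("windy=false", lambda x: not x["windy"]),
]
prune_set = [
    ({"outlook": "sunny", "humidity": 60, "windy": True}, True),
    ({"outlook": "sunny", "humidity": 65, "windy": False}, True),
    ({"outlook": "sunny", "humidity": 90, "windy": False}, False),
    ({"outlook": "rain", "humidity": 60, "windy": False}, False),
]
print([name for name, _ in post_prune(rule, prune_set)])  # ['outlook=sunny', 'humidity<70']
```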
We adopt the principal idea from Plotkin's Structural Operational Semantics (SOS), in which computation by a system is understood using: (a) a signature of configurations, Γ; (b) a binary relation (→) defined over Γ; and (c) a meta-interpreter for general transition systems, defined at the level of Γ and →. Using specific definitions for configurations and transition rules, the meta-interpreter generates an operational explanation of a system's behaviour in the form of the stepwise computations (transitions) involved. This setting is of special interest to inductive logic programming (ILP), given recent developments in meta-interpretive learning. We focus here on the specific application of automatically obtaining Petri net models of biological system behaviour. Using a simple logic program as a meta-interpreter with a meta-rule for guarded transitions, we show that, given definitions of biologically known transitions, proofs constructed by the meta-interpreter allow us, just as in SOS, to explain system behaviour as stepwise transitions in Petri nets. In the meta-interpretive learning setting, the proofs identify hypotheses that, together with the meta-interpreter and domain knowledge, logically entail the observed behaviour. Meta-interpretive learning enables us to go beyond the explanations available in SOS, which are purely deductive, since the meta-interpreter is allowed abductive steps in the proof. This enables us to "invent" transitions that have not been specified in the domain knowledge. We use this facility to deal with noisy data by first constructing a hypothesis that includes abduced transitions, and then using a Viterbi-style computation to find the most likely sequence of transitions for a system with a specified initial and final state. Extensive experiments with some well-known biological systems show that this approach can reliably identify the correct set of transitions even with fairly high levels of noise and with a moderate amount of missing values.
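A Viterbi-style search over Petri-net markings can be sketched as follows, with several assumptions. Each transition, known or abduced, carries a probability; the values and the fixed number of steps are illustrative inventions, not the paper's noise model. The search is written in Python rather than as the paper's logic-program meta-interpreter. At each step it keeps, for every reachable marking, only the highest-scoring firing sequence that reaches it. This is the max-product recurrence of the Viterbi algorithm. Transition names and places are hypothetical.

```python
import math


def most_likely_firing(places, transitions, start, goal, steps):
    """Viterbi-style search over Petri-net markings.

    transitions maps a name to (pre, post, prob): tokens consumed, tokens
    produced and an assumed probability for that transition. Returns
    (log-probability, transition sequence) of the best length-`steps`
    firing sequence from start to goal, or None if goal is unreachable.
    """
    index = {p: i for i, p in enumerate(places)}

    def fire(marking, pre, post):
        m = list(marking)
        for p, n in pre.items():
            if m[index[p]] < n:
                return None  # transition not enabled in this marking
            m[index[p]] -= n
        for p, n in post.items():
            m[index[p]] += n
        return tuple(m)

    best = {tuple(start): (0.0, [])}
    for _ in range(steps):
        nxt = {}
        for marking, (logp, seq) in best.items():
            for name, (pre, post, prob) in transitions.items():
                m2 = fire(marking, pre, post)
                if m2 is None:
                    continue
                cand = (logp + math.log(prob), seq + [name])
                if m2 not in nxt or cand[0] > nxt[m2][0]:
                    nxt[m2] = cand  # keep only the best path into each marking
        best = nxt
    return best.get(tuple(goal))


places = ["A", "B", "C"]
transitions = {
    "t1": ({"A": 1}, {"B": 1}, 0.9),   # known transition
    "t2": ({"B": 1}, {"C": 1}, 0.8),   # known transition
    "t3": ({"A": 1}, {"C": 1}, 0.1),   # hypothetical abduced shortcut
    "t4": ({"A": 1}, {"B": 1}, 0.05),  # hypothetical abduced alternative to t1
}
logp, seq = most_likely_firing(places, transitions, (1, 0, 0), (0, 0, 1), steps=2)
print(seq, round(math.exp(logp), 2))  # ['t1', 't2'] 0.72
```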
We introduce relational redescription mining, that is, the task of finding two structurally different patterns that describe nearly the same set of object pairs in a relational dataset. By extending redescription mining beyond propositional and real-valued attributes, it provides a powerful tool for matching different relational descriptions of the same concept. We propose an alternating scheme for solving this problem. Its core is a novel relational query miner that efficiently identifies discriminative connection patterns between pairs of objects. Compared to a baseline inductive logic programming (ILP) approach, our query miner is able to mine more complex queries, much faster. We performed extensive experiments on three real-world relational datasets and present examples of the redescriptions found, showing that the method can expressively capture relations present in these networks.
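The quantity a relational redescription is judged by can be sketched simply. Two structurally different patterns should describe nearly the same set of object pairs. The sketch makes that agreement concrete by computing the pair set of each pattern and comparing the two sets with the Jaccard coefficient. Jaccard is commonly used in redescription mining, but whether the paper uses exactly this measure is an assumption. Patterns here are fixed relation paths, whereas the paper's query miner searches for such patterns. The graph and the function names are hypothetical.

```python
from collections import defaultdict


def index_edges(edges):
    """Map relation -> source -> set of targets for (source, relation, target) triples."""
    adj = defaultdict(lambda: defaultdict(set))
    for s, r, t in edges:
        adj[r][s].add(t)
    return adj


def pair_set(adj, path):
    """All object pairs (x, y) such that y is reached from x by following
    the relations in `path`, in order."""
    pairs = set()
    for x in list(adj[path[0]]):
        frontier = {x}
        for r in path:
            frontier = {t for n in frontier for t in adj[r].get(n, ())}
        pairs.update((x, y) for y in frontier)
    return pairs


def jaccard(a, b):
    """Agreement between the pair sets of two patterns (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


edges = [
    ("ann", "parent", "bob"), ("bob", "parent", "cid"),
    ("dan", "parent", "eve"), ("eve", "parent", "fay"),
    ("ann", "grandparent", "cid"), ("dan", "grandparent", "fay"),
]
adj = index_edges(edges)
q1 = pair_set(adj, ["grandparent"])        # a direct relation
q2 = pair_set(adj, ["parent", "parent"])   # a structurally different two-hop pattern
print(jaccard(q1, q2))  # 1.0
```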
In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with well-established methods in inductive logic programming (ILP) and rule induction to provide efficient and scalable algorithms for the classification of vast data sets. By construction, these classifications are based on the synthesis of simple rules, thus providing direct explanations of the obtained classifications. Apart from evaluating our approach on the common large-scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherung, an insurance company offering diverse services in Germany.
Meta-level abduction is a method for abducing missing rules when explaining observations. By representing the rule structures of a problem in the form of causal networks, meta-level abduction infers missing links and unknown nodes from incomplete networks to complete paths for observations. We examine the applicability of meta-level abduction to networks containing both positive and negative causal effects. Such networks appear in many domains, including biology, where inhibitory effects are important in several biological pathways. Reasoning in networks with inhibition involves nonmonotonic inference, which can be realized by making default assumptions in abduction. We show that meta-level abduction can consistently produce both positive and negative causal relations as well as invented nodes. Case studies of meta-level abduction are presented for p53 signaling networks, in which causal relations are abduced to suppress a tumor with a new protein and to stop DNA synthesis when damage has occurred. The effects of our method are also analyzed through experiments that complete randomly generated networks with both positive and negative links.
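A deliberately simplified sketch can show the signed-network idea. A path's causal effect is the product of its edge signs (+1 for activation, -1 for inhibition). Abduction is reduced here to searching for a single missing signed link that makes the observed effect derivable. The sketch is monotonic. It ignores the default assumptions the paper uses for nonmonotonic inhibition, and it does not invent new nodes. The node names echo the DNA-damage case study, but the network is a hypothetical toy, not a p53 model, and the function names are hypothetical as well.

```python
def path_signs(edges, src, dst, visited=frozenset()):
    """Signs (+1 activation, -1 inhibition) of all simple paths src -> dst.
    edges maps (cause, effect) to +1 or -1; a path's sign is the product of its edge signs."""
    if src == dst:
        return {1}
    visited = visited | {src}
    signs = set()
    for (a, b), s in edges.items():
        if a == src and b not in visited:
            signs |= {s * t for t in path_signs(edges, b, dst, visited)}
    return signs


def abduce_single_link(edges, nodes, observation):
    """Every new signed link (a, b, sign) whose addition makes the observed
    effect (src, dst, sign) derivable; meant for observations not yet explained."""
    src, dst, sign = observation
    hypotheses = []
    for a in nodes:
        for b in nodes:
            if a == b or (a, b) in edges:
                continue
            for s in (1, -1):
                if sign in path_signs({**edges, (a, b): s}, src, dst):
                    hypotheses.append((a, b, s))
    return hypotheses


nodes = ["damage", "p53", "p21", "dna_synthesis"]
edges = {("damage", "p53"): 1, ("p53", "p21"): 1}
obs = ("damage", "dna_synthesis", -1)  # observed: damage stops DNA synthesis
print(path_signs(edges, "damage", "dna_synthesis"))  # set(): not yet explained
print(abduce_single_link(edges, nodes, obs))
# [('damage', 'dna_synthesis', -1), ('p53', 'dna_synthesis', -1), ('p21', 'dna_synthesis', -1)]
```

Every candidate link is an inhibitory edge into dna_synthesis, because the existing path from damage is entirely activating.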