ISBN: 9783642024801 (print)
In this paper we present the work in progress on LogCHEM, an ILP-based tool for discriminative interactive mining of chemical fragments. In particular, we describe the integration with molecule visualisation software that allows the chemist to graphically control the search for interesting patterns in chemical fragments. Furthermore, we show how structured information, such as rings and functional groups like carboxyl, amine, methyl, ester, etc., is integrated and exploited in LogCHEM.
ISBN: 9783642042379 (print)
Several learning systems based on Inverse Entailment (IE) have been proposed, some that compute single clause hypotheses, exemplified by Progol, and others that produce multiple clauses in response to a single seed example. A common denominator of these systems is a restricted hypothesis search space, within which each clause must individually explain some example E, or some member of an abductive explanation for E. This paper proposes a new IE approach, called Induction on Failure (IoF), that generalises existing Horn clause learning systems by allowing the computation of hypotheses within a larger search space, namely that of Connected Theories. A proof procedure for IoF is proposed that generalises existing IE systems and also resolves Yamamoto's example. A prototype implementation is also described. Finally, a semantics is presented called Connected Theory Generalisation, which is proved to extend Kernel Set Subsumption and to include hypotheses constructed within this new IoF approach.
Introduction: Information extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. They are limited to single interaction relations, and have to face a trade-off between recall and precision, by focusing either on specific interactions (for precision), or general and unspecified interactions of biological entities (for recall). Yet, biologists need to process more complex data from literature, in order to study biological pathways. An ontology is an adequate formal representation to model this sophisticated knowledge. However, the tight integration of IE systems and ontologies is still a current research issue, a fortiori with complex ones that go beyond hierarchies. Method: We propose a rich modeling of genic interactions with an ontology, and show how it can be used within an IE system. The ontology is seen as a language specifying a normalized representation of text. First, IE is performed by extracting instances from natural language processing (NLP) modules. Then, deductive inferences on the ontology language are completed, and new instances are derived from previously extracted ones. Inference rules are learnt with an inductive logic programming (ILP) algorithm, using the ontology as the hypothesis language, and its instantiation on an annotated corpus as the example language. Learning is set in a multi-class setting to deal with the multiple ontological relations. Results: We validated our approach on an annotated corpus of gene transcription regulations in the Bacillus subtilis bacterium. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten semantic relations defined in the ontology. (C) 2009 Elsevier B.V. All rights reserved.
Logic Programs with Annotated Disjunctions (LPADs) provide a simple and elegant framework for representing probabilistic knowledge in logic programming. In this paper we consider the problem of learning ground LPADs starting from a set of interpretations annotated with their probability. We present the system ALLPAD for solving this problem. ALLPAD modifies the previous system LLPAD in order to tackle real-world learning problems more effectively. This is achieved by looking for an approximate solution rather than a perfect one. A number of experiments have been performed on real and artificial data for evaluating ALLPAD, showing the feasibility of the approach.
The robot described in this paper learns words that relate to objects and their attributes, and also learns concepts, which may be recursive, that involve relationships between several objects. Once the system is explicitly taught some words by a human teacher, it finds new objects that might help to refine its concepts. Once it has found a new object, it tries to generalise its concepts to include the new object and asks the teacher for feedback. The robot learns further properties of objects by interacting with them, by touching them or walking around them to gain a new perspective. The system learns semantic knowledge from spoken interactions, using speech recognition and generation, motion segmentation, feature extraction from images using Ripple Down Rules and generalisation using inductive logic programming. (C) 2008 Elsevier B.V. All rights reserved.
The field of inductive logic programming (ILP) has made steady progress, since the first ILP workshop in 1991, based on a balance of developments in theory, implementations and applications. More recently there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Prediction. The goal of the current paper is to consider these emerging trends and chart out the strategic directions and open problems for the broader area of structured machine learning for the next 10 years.
We discuss how to learn non-recursive directed probabilistic logical models from relational data. This problem has been tackled before by upgrading the structure-search algorithm initially proposed for Bayesian networks. In this paper we show how to upgrade another algorithm for learning Bayesian networks, namely ordering-search. For Bayesian networks, ordering-search was found to work better than structure-search. It is non-obvious that these results carry over to the relational case, however, since there ordering-search needs to be implemented quite differently. Hence, we perform an experimental comparison of these upgraded algorithms on four relational domains. We conclude that also in the relational case ordering-search is competitive with structure-search in terms of quality of the learned models, while ordering-search is significantly faster.
Sequential data represent an important source of potentially new medical knowledge. However, this type of data is rarely provided in a format suitable for immediate application of conventional mining algorithms. This paper summarizes and compares three different sequential mining approaches based, respectively, on windowing, episode rules, and inductive logic programming. Windowing is one of the essential methods of data preprocessing. Episode rules represent general sequential mining, while inductive logic programming extracts first-order features whose structure is determined by background knowledge. The three approaches are demonstrated and evaluated on the STULONG case study, a longitudinal preventive study of atherosclerosis in which the data consist of a series of long-term observations recording the development of risk factors and associated conditions. The intention is to identify frequent sequential/temporal patterns. Possible relations between the patterns and the onset of any of the observed cardiovascular diseases are also studied.
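As a rough illustration of the windowing step mentioned in this abstract, the following Python sketch cuts one patient's series of observations into fixed-width overlapping windows, each of which could then be handed to a conventional (non-sequential) mining algorithm. It is only a minimal sketch under assumptions: the function name `sliding_windows`, the `width` and `step` parameters, and the sample readings are all hypothetical and are not taken from the paper or from the STULONG data.

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(
    seq: Sequence[float], width: int, step: int = 1
) -> Iterator[Tuple[float, ...]]:
    """Yield consecutive windows of `width` items, moving `step` items each time."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    # Stop at the last start index that still leaves a full window.
    for start in range(0, len(seq) - width + 1, step):
        yield tuple(seq[start:start + width])


# Hypothetical yearly blood-pressure readings for a single patient.
readings = [120, 125, 131, 128, 140]

# Each window becomes one fixed-length record for a conventional learner.
windows = list(sliding_windows(readings, width=3))

# One simple derived attribute per window: change from first to last value.
trends = [w[-1] - w[0] for w in windows]
```

The window width trades off context against the number of records produced; a sequence shorter than `width` yields no windows at all in this sketch.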
The issue addressed in this paper concerns the discovery of frequent multi-dimensional patterns from relational sequences. The great variety of applications of sequential pattern mining, such as user profiling, medicine, local weather forecasting and bioinformatics, makes this problem one of the central topics in data mining. Nevertheless, sequential information may concern data on multiple dimensions and, hence, the mining of sequential patterns from multi-dimensional information is very important. In a multi-dimensional sequence each event depends on more than one dimension, such as in spatio-temporal sequences where an event may be spatially or temporally related to other events. In the literature, the multi-relational data mining approach has been successfully applied to knowledge discovery from complex data. However, no existing contribution handles the general case of multi-dimensional data in which, for example, spatial and temporal information may co-exist. This work takes into account the possibility of mining complex patterns, expressed in a first-order language, in which events may occur along different dimensions. Specifically, multi-dimensional patterns are defined as a set of atomic first-order formulae in which events are explicitly represented by a variable and the relations between events are represented by a set of dimensional predicates. A complete framework and an inductive logic programming algorithm to tackle this problem are presented, along with some experiments on artificial and real multi-dimensional sequences demonstrating its effectiveness.
This paper presents a method that uses gene ontologies (GOs), together with the paradigm of relational subgroup discovery, to find compactly described groups of genes differentially expressed in specific cancers. The groups are described by means of relational logic features, extracted from publicly available GO information, and are straightforwardly interpretable by medical experts. We applied the proposed method to three gene expression data sets with the following respective sets of sample classes: 1) acute lymphoblastic leukemia (ALL) versus acute myeloid leukemia (AML); 2) seven subtypes of ALL; and 3) 14 different types of cancers. A significant number of the discovered groups of genes had a description that highlighted the underlying biological process responsible for distinguishing one class from the other classes. The quality of the discovered descriptions was also verified by cross validation. We believe that the presented approach will significantly contribute to the application of relational machine learning to gene expression analysis, given the expected increase in both the quality and quantity of gene/protein annotations in the near future.