Logic Programs with Annotated Disjunctions (LPADs) provide a simple and elegant framework for representing probabilistic knowledge in logic programming. In this paper we consider the problem of learning ground LPADs starting from a set of interpretations annotated with their probabilities. We present the system ALLPAD for solving this problem. ALLPAD modifies the previous system LLPAD in order to tackle real-world learning problems more effectively. This is achieved by looking for an approximate solution rather than a perfect one. A number of experiments have been performed on real and artificial data to evaluate ALLPAD, showing the feasibility of the approach.
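To make concrete what it means for an LPAD to assign probabilities to interpretations (the kind of training data ALLPAD learns from), here is a minimal Python sketch of the distribution semantics for a ground LPAD. It assumes a ground program without negation, where each clause `h1:p1 ; ... ; hn:pn :- body` is encoded as a list of `(atom, probability)` pairs plus a list of body atoms; the helpers `least_model` and `interpretation_distribution` are hypothetical names, not part of ALLPAD or LLPAD. Each world picks one head (or an implicit null head) per clause independently, and the probability of an interpretation is the total probability of the worlds whose least model equals it.

```python
from itertools import product


def least_model(rules):
    """Least Herbrand model of a ground definite program given as (head, body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return frozenset(model)


def interpretation_distribution(program):
    """Map each interpretation (a frozenset of true atoms) to its probability.

    Assumption: `program` is a ground LPAD without negation, given as a list of
    (heads, body) pairs, where heads is a list of (atom, probability) pairs.
    If a clause's head probabilities sum to less than 1, the remainder goes to
    an implicit null head that derives nothing.
    """
    options = []
    for heads, body in program:
        choices = [(atom, p, body) for atom, p in heads]
        rest = 1.0 - sum(p for _, p in heads)
        if rest > 1e-12:
            choices.append((None, rest, body))  # null head
        options.append(choices)

    dist = {}
    # One world per selection: each clause independently keeps exactly one head.
    for selection in product(*options):
        prob = 1.0
        rules = []
        for atom, p, body in selection:
            prob *= p
            if atom is not None:
                rules.append((atom, body))
        model = least_model(rules)
        dist[model] = dist.get(model, 0.0) + prob
    return dist


# toss:0.8.   heads:0.6 ; tails:0.4 :- toss.
program = [
    ([("toss", 0.8)], []),
    ([("heads", 0.6), ("tails", 0.4)], ["toss"]),
]
for interp, p in interpretation_distribution(program).items():
    print(sorted(interp), round(p, 2))
```

For this two-clause program, the interpretation {toss, heads} gets probability 0.8 × 0.6 = 0.48, {toss, tails} gets 0.32, and the empty interpretation gets 0.2, since whenever `toss` is not selected the second clause cannot fire. Enumerating all selections is exponential in the number of clauses, so this is only meant for building intuition on tiny programs.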
The robot described in this paper learns words that relate to objects and their attributes, and also learns concepts, which may be recursive, that involve relationships between several objects. Once a human teacher has explicitly taught the system some words, it finds new objects that might help to refine its concepts. Once it has found a new object, it tries to generalise its concepts to include the new object and asks the teacher for feedback. The robot learns further properties of objects by interacting with them, touching them or walking around them to gain a new perspective. The system learns semantic knowledge from spoken interactions, using speech recognition and generation, motion segmentation, feature extraction from images using Ripple Down Rules, and generalisation using inductive logic programming. (C) 2008 Elsevier B.V. All rights reserved.
Since the first ILP workshop in 1991, the field of inductive logic programming (ILP) has made steady progress, based on a balance of developments in theory, implementations and applications. More recently, there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Prediction. The goal of the current paper is to consider these emerging trends and chart out the strategic directions and open problems for the broader area of structured machine learning for the next 10 years.
We discuss how to learn non-recursive directed probabilistic logical models from relational data. This problem has been tackled before by upgrading the structure-search algorithm initially proposed for Bayesian networks. In this paper we show how to upgrade another algorithm for learning Bayesian networks, namely ordering-search. For Bayesian networks, ordering-search was found to work better than structure-search. It is not obvious, however, that these results carry over to the relational case, since ordering-search needs to be implemented quite differently there. Hence, we perform an experimental comparison of these upgraded algorithms on four relational domains. We conclude that, in the relational case as well, ordering-search is competitive with structure-search in terms of the quality of the learned models, while being significantly faster.
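For intuition about the propositional algorithm being upgraded, here is a minimal Python sketch of ordering-search for ordinary Bayesian networks (not the relational version studied in the paper). Given a fixed variable ordering, the best network consistent with it decomposes: each node's parent set can be chosen independently among its predecessors. The search then hill-climbs over orderings by swapping adjacent variables. The BIC family score, the in-degree bound `max_parents`, and all function names are assumptions made for this illustration.

```python
import math
from itertools import combinations


def family_score(data, child, parents, arity):
    """BIC score of `child` given `parents` on discrete data (a list of dicts)."""
    counts = {}  # parent configuration -> {child value: count}
    for row in data:
        bucket = counts.setdefault(tuple(row[p] for p in parents), {})
        bucket[row[child]] = bucket.get(row[child], 0) + 1
    loglik = 0.0
    for bucket in counts.values():
        total = sum(bucket.values())
        loglik += sum(c * math.log(c / total) for c in bucket.values())
    n_params = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return loglik - 0.5 * n_params * math.log(len(data))


def best_network_for_order(data, order, arity, max_parents):
    """For a fixed ordering, pick each node's best parent set among its predecessors."""
    network, total = {}, 0.0
    for i, child in enumerate(order):
        candidates = [ps for k in range(max_parents + 1)
                      for ps in combinations(order[:i], k)]
        scores = [family_score(data, child, ps, arity) for ps in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)  # ties keep the first
        network[child] = candidates[best]
        total += scores[best]
    return total, network


def ordering_search(data, order, max_parents=2):
    """Greedy hill-climbing over orderings by swapping adjacent variables."""
    arity = {v: len({row[v] for row in data}) for v in order}
    order = list(order)
    best_score, best_net = best_network_for_order(data, order, arity, max_parents)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            candidate = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            score, net = best_network_for_order(data, candidate, arity, max_parents)
            if score > best_score + 1e-9:  # accept strict improvements only
                order, best_score, best_net = candidate, score, net
                improved = True
    return order, best_net, best_score


# Toy data: B always equals A, while C is independent of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 10
order, net, score = ordering_search(data, ["C", "B", "A"])
print(order, net)
```

Because parents are always drawn from a node's predecessors in the current ordering, every candidate network is acyclic by construction, so this sketch never needs an explicit cycle check.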
Sequential data represent an important source of potentially new medical knowledge. However, this type of data is rarely provided in a format suitable for the immediate application of conventional mining algorithms. This paper summarizes and compares three different sequential mining approaches based, respectively, on windowing, episode rules, and inductive logic programming. Windowing is one of the essential methods of data preprocessing. Episode rules represent general sequential mining, while inductive logic programming extracts first-order features whose structure is determined by background knowledge. The three approaches are demonstrated and evaluated on the STULONG case study, a longitudinal preventive study of atherosclerosis in which the data consist of a series of long-term observations recording the development of risk factors and associated conditions. The intention is to identify frequent sequential/temporal patterns. Possible relations between the patterns and the onset of any of the observed cardiovascular diseases are also studied.
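As a rough illustration of windowing as a preprocessing step, the Python sketch below turns one patient's series of repeated measurements into fixed-width, overlapping windows that a conventional attribute-value learner can consume. The function name `window_rows`, the window width and step, the two summary attributes (mean and net change), and the readings in the example are all hypothetical; none of them is taken from the STULONG study.

```python
def window_rows(values, width, step=1):
    """Slide a window of `width` consecutive observations over one series.

    Each row keeps the raw window plus two simple summaries, so the sequence
    becomes a set of fixed-length attribute-value examples.
    """
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive")
    rows = []
    for start in range(0, len(values) - width + 1, step):
        window = values[start:start + width]
        rows.append({
            "start": start,                    # index of the first observation
            "window": window,                  # raw values, oldest first
            "mean": sum(window) / width,       # average level in the window
            "change": window[-1] - window[0],  # net change across the window
        })
    return rows


# Hypothetical systolic blood pressure readings from successive examinations.
readings = [130, 135, 142, 150, 148]
for row in window_rows(readings, width=3):
    print(row)
```

With width 3 and step 1, five readings yield three overlapping windows; a series shorter than the window yields no rows at all, which is one practical cost of this representation for patients with few examinations.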
This paper addresses the discovery of frequent multi-dimensional patterns from relational sequences. The great variety of applications of sequential pattern mining, such as user profiling, medicine, local weather forecasting and bioinformatics, makes this problem one of the central topics in data mining. Nevertheless, sequential information may concern data on multiple dimensions, and hence the mining of sequential patterns from multi-dimensional information is very important. In a multi-dimensional sequence, each event depends on more than one dimension, as in spatio-temporal sequences, where an event may be spatially or temporally related to other events. In the literature, the multi-relational data mining approach has been successfully applied to knowledge discovery from complex data. However, no existing contribution handles the general case of multi-dimensional data in which, for example, spatial and temporal information may co-exist. This work addresses the mining of complex patterns, expressed in a first-order language, in which events may occur along different dimensions. Specifically, multi-dimensional patterns are defined as sets of atomic first-order formulae in which events are explicitly represented by a variable and the relations between events are represented by a set of dimensional predicates. A complete framework and an inductive logic programming algorithm to tackle this problem are presented, along with experiments on artificial and real multi-dimensional sequences that demonstrate its effectiveness.
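The pattern language described above can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which each sequence is a set of ground facts over event identifiers, with ordinary predicates for what happens at an event (`rain`, `flood`) and dimensional predicates for how events relate (`next_t` for temporal succession, `adj_s` for spatial adjacency). A pattern is a list of atoms over event variables, and its support is the fraction of sequences in which some assignment of events to variables makes every atom true. Matching is done by brute force over all assignments, which is fine for toy data but is not the paper's algorithm; all names and data are invented for illustration.

```python
from itertools import product


def occurs(pattern, sequence):
    """Does some assignment of the sequence's events to the pattern's variables
    make every atom of the pattern a fact of the sequence? (Brute force.)"""
    variables = sorted({v for _, args in pattern for v in args})
    events = sorted({e for _, args in sequence for e in args})
    for values in product(events, repeat=len(variables)):
        theta = dict(zip(variables, values))  # event variable -> event id
        if all((pred, tuple(theta[v] for v in args)) in sequence
               for pred, args in pattern):
            return True
    return False


def support(pattern, sequences):
    """Fraction of sequences in which the pattern occurs."""
    return sum(occurs(pattern, seq) for seq in sequences) / len(sequences)


# Hypothetical sequences; next_t and adj_s are the dimensional predicates.
seq1 = {("rain", ("e1",)), ("next_t", ("e1", "e2")), ("flood", ("e2",))}
seq2 = {("rain", ("e1",)), ("adj_s", ("e1", "e2")), ("flood", ("e2",))}
seq3 = {("rain", ("e1",)), ("next_t", ("e1", "e2")),
        ("adj_s", ("e2", "e3")), ("flood", ("e3",))}
sequences = [seq1, seq2, seq3]

temporal = [("rain", ("E1",)), ("next_t", ("E1", "E2")), ("flood", ("E2",))]
mixed = [("rain", ("E1",)), ("next_t", ("E1", "E2")),
         ("adj_s", ("E2", "E3")), ("flood", ("E3",))]
loose = [("rain", ("E1",)), ("flood", ("E2",))]
print(support(temporal, sequences), support(mixed, sequences), support(loose, sequences))
```

With these toy sequences, `temporal` occurs only in the first sequence and `mixed` only in the third, so each has support 1/3, while `loose`, which ignores both dimensions, has support 1.0. The `mixed` pattern is the interesting case: it chains a temporal and a spatial relation between events in a single pattern.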
This paper presents a method that uses gene ontologies (GOs), together with the paradigm of relational subgroup discovery, to find compactly described groups of genes differentially expressed in specific cancers. The groups are described by means of relational logic features, extracted from publicly available GO information, and are straightforwardly interpretable by medical experts. We applied the proposed method to three gene expression data sets with the following respective sets of sample classes: 1) acute lymphoblastic leukemia (ALL) versus acute myeloid leukemia (AML); 2) seven subtypes of ALL; and 3) 14 different types of cancer. A significant number of the discovered groups of genes had a description that highlighted the underlying biological process responsible for distinguishing one class from the other classes. The quality of the discovered descriptions was also verified by cross-validation. We believe that the presented approach will significantly contribute to the application of relational machine learning to gene expression analysis, given the expected increase in both the quality and quantity of gene/protein annotations in the near future.
ProbLog is a recently introduced probabilistic extension of Prolog (De Raedt et al. in Proceedings of the 20th international joint conference on artificial intelligence, pp. 2468-2473, 2007). A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program; these probabilities are mutually independent. The semantics of ProbLog is then defined by the success probability of a query in a randomly sampled program. This paper introduces the theory compression task for ProbLog, which consists of selecting the subset of clauses of a given ProbLog program that maximizes the likelihood w.r.t. a set of positive and negative examples. Experiments in the context of discovering links in real biological networks demonstrate the practical applicability of the approach.
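The semantics and the compression task can be made concrete with a brute-force Python sketch. It assumes the common ProbLog graph setting: the probabilistic clauses are independent undirected `edge` facts, and the query is a fixed `path(X, Y)` reachability definition. The success probability is the total probability of the sampled subprograms in which the query succeeds. Compression is done here by exhaustively trying every subset of `k` edges and keeping the one with the highest log-likelihood on positive and negative examples. Real ProbLog systems avoid enumerating all subprograms (for example by compiling proofs into BDDs), and the exhaustive subset search is only a stand-in for the paper's compression algorithm; all function names are hypothetical.

```python
import math
from itertools import combinations, product


def reachable(edges, src, dst):
    """Depth-first search: is dst reachable from src over undirected edges?"""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def success_probability(prob_edges, src, dst):
    """P(path(src, dst)): sum over all 2^n subprograms in which the query succeeds."""
    items = list(prob_edges.items())
    total = 0.0
    for keep in product([True, False], repeat=len(items)):
        weight, world = 1.0, []
        for (edge, p), kept in zip(items, keep):
            weight *= p if kept else 1.0 - p  # clauses are sampled independently
            if kept:
                world.append(edge)
        if reachable(world, src, dst):
            total += weight
    return total


def log_likelihood(prob_edges, positives, negatives, eps=1e-12):
    """Log-likelihood of positive (should succeed) and negative (should fail) queries."""
    ll = sum(math.log(max(success_probability(prob_edges, s, d), eps))
             for s, d in positives)
    ll += sum(math.log(max(1.0 - success_probability(prob_edges, s, d), eps))
              for s, d in negatives)
    return ll


def compress(prob_edges, k, positives, negatives):
    """Keep the k edges whose sub-theory maximizes the log-likelihood (exhaustive)."""
    best = max(
        combinations(prob_edges, k),
        key=lambda kept: log_likelihood({e: prob_edges[e] for e in kept},
                                        positives, negatives),
    )
    return set(best)


edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.5, ("c", "d"): 0.7}
print(success_probability(edges, "a", "c"))
print(compress(edges, 2, positives=[("a", "c")], negatives=[("a", "d")]))
```

Here `path(a, c)` succeeds with probability 0.5 + 0.5 × 0.9 × 0.8 = 0.86. With k = 2, the compression keeps `a-b` and `b-c`: that pair alone explains the positive example with probability 0.72, and dropping `c-d` ensures the negative example `path(a, d)` cannot succeed.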
Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of inductive logic programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.
It is well known that heuristic search in ILP is prone to plateau phenomena. Following the work of Giordana and Saitta, an explanation can be given: the ILP covering test is NP-complete and therefore exhibits a sharp phase transition in its coverage probability. As the heuristic value of a hypothesis depends on the number of covered examples, the "yes" and "no" regions represent plateaus that must be crossed during search without an informative heuristic value. Several subsequent works have extensively studied this finding by running several learning algorithms on a large set of artificially generated problems, and argued that the occurrence of this phase transition dooms every learning algorithm to fail to identify the target concept. We note, however, that only generate-and-test learning algorithms have been applied, and that this conclusion has to be qualified in the case of data-driven learning algorithms. Building mainly on the pioneering work of Winston on near-miss examples, we show that, on the same set of problems, a top-down data-driven strategy can cross any plateau if near-misses are supplied in the training set, whereas near-misses neither change the plateau profile nor guide a generate-and-test strategy. We conclude that the location of the target concept with respect to the phase transition is not, on its own, a reliable indication of the difficulty of the learning problem, contrary to what was previously thought.
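The covering test at the heart of this argument asks whether a first-order hypothesis matches an example. A minimal Python sketch of it, written as a backtracking θ-subsumption check, shows where the cost comes from: each hypothesis literal may be tried against many example literals, and in the worst case the search explores exponentially many variable bindings. The encoding (literals as `(predicate, args)` tuples, strings starting with an uppercase letter as variables) and the function names `is_var` and `covers` are assumptions for illustration only.

```python
def is_var(term):
    """Prolog-style convention assumed here: variables start with an uppercase letter."""
    return term[:1].isupper()


def covers(hypothesis, example, theta=None, i=0):
    """Backtracking theta-subsumption test.

    Returns True if some substitution maps every hypothesis literal onto a
    literal of the (ground) example. Literals are (predicate, args) tuples.
    """
    theta = {} if theta is None else theta
    if i == len(hypothesis):
        return True  # every literal has been matched consistently
    pred, args = hypothesis[i]
    for epred, eargs in example:
        if epred != pred or len(eargs) != len(args):
            continue
        binding = dict(theta)  # copy so a failed attempt leaves theta untouched
        consistent = True
        for term, value in zip(args, eargs):
            if is_var(term):
                if binding.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent and covers(hypothesis, example, binding, i + 1):
            return True
    return False  # no example literal fits literal i: backtrack


example = [("edge", ("a", "b")), ("edge", ("b", "c")), ("edge", ("c", "a"))]
chain = [("edge", ("X", "Y")), ("edge", ("Y", "Z"))]
two_cycle = [("edge", ("X", "Y")), ("edge", ("Y", "X"))]
print(covers(chain, example), covers(two_cycle, example))
```

On this three-edge cycle, the chain hypothesis is covered (for instance with X=a, Y=b, Z=c), while the two-cycle hypothesis is not, because no edge appears in both directions; establishing the negative answer requires trying and rejecting every binding for X and Y.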