In real-life domains, learning systems often have to deal with various kinds of imperfections in data, such as noise, incompleteness and inexactness. This problem seriously affects the knowledge discovery process, particularly for traditional Machine Learning approaches that exploit simple or constrained knowledge representations and rely on a single inference mechanism. Indeed, these limitations restrict their ability to discover fundamental knowledge in such situations. To broaden the investigation and applicability of machine learning schemes in these situations, it is necessary to move to more expressive representations, which require more complex inference mechanisms. However, the applicability of such complex inference mechanisms, such as abductive reasoning, strongly relies on deep background knowledge about the specific application domain. This work aims to automatically discover the meta-knowledge needed by the abductive inference strategy to complete incoming information and thereby handle cases of missing knowledge.
Protein-protein interactions (PPIs) are intrinsic to almost all cellular processes, and various computational methods offer new opportunities to study them. To predict PPIs, integrative methods use multiple data sources instead of a single source, whereas domain-based methods often use only protein domain features. Integrating protein domain features with genomic/proteomic features from multiple databases can predict PPIs more effectively. Moreover, it allows the discovery of reciprocal relationships between PPIs and the biological features of their interacting partners. We developed a novel integrative domain-based method for predicting PPIs using inductive logic programming (ILP). The two principal domain features used were domain fusions and domain-domain interactions (DDIs). Various relevant protein features were extracted from five popular genomic and proteomic databases. By integrating these features, we constructed biologically significant ILP background knowledge of more than 278,000 ground facts. Experimental results from multiple runs of 10-fold cross-validation demonstrated that our method predicts PPIs better than other computational methods in terms of typical performance measures. The proposed ILP framework can be applied to predict DDIs with high sensitivity and specificity. The induced ILP rules revealed many interesting, biologically reciprocal relationships among PPIs, protein domains, and PPI-related genomic/proteomic features. Supplementary material is available at (http://***/~s0560205/PPIandDDI/).
The paper introduces LOGAN-H, a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how this algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient application to a wide range of problems. As one important ingredient, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that must be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can handle varied problems and large amounts of data, and that it achieves good classification accuracy.
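To make the subsumption problem concrete, the following is a minimal, naive backtracking sketch of a θ-subsumption test: clause C subsumes clause D if some substitution θ maps every literal of C onto a literal of D. It is not one of LOGAN-H's fast procedures; the tuple encoding of literals, the convention that variables start with an uppercase letter, the assumption that D is ground, and the names `match` and `subsumes` are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, ...]  # (predicate, arg1, arg2, ...); assumed encoding


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def match(lit: Literal, target: Literal, theta: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try to extend theta so that lit under theta equals the ground literal target."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    new = dict(theta)
    for t, g in zip(lit[1:], target[1:]):
        if is_var(t):
            # Bind t to g, or fail if t is already bound to something else.
            if new.setdefault(t, g) != g:
                return None
        elif t != g:
            return None
    return new


def subsumes(c: List[Literal], d: List[Literal],
             theta: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Return a substitution theta with c*theta a subset of d, or None (naive backtracking)."""
    theta = {} if theta is None else theta
    if not c:
        return theta
    first, rest = c[0], c[1:]
    for target in d:
        extended = match(first, target, theta)
        if extended is not None:
            result = subsumes(rest, d, extended)
            if result is not None:
                return result
    return None
```

For instance, p(X, Y), q(Y) subsumes p(a, b), p(a, c), q(c) via θ = {X/a, Y/c}, which is found only after backtracking out of the first choice Y = b. In the worst case this search is exponential, which is why dedicated fast subsumption procedures matter.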
In this paper, we propose a novel class of wrappers (logic wrappers) inspired by the logic programming paradigm. The developed logic wrappers (L-wrappers) have declarative semantics, and therefore: (i) their specification is decoupled from their implementation, and (ii) they can be generated using inductive logic programming. We also define a convenient way of mapping L-wrappers to XSLT for efficient processing with available XSLT processing engines.
Fuzzy predicates have been incorporated into machine learning and data mining to extend the types of data relationships that can be represented, to facilitate the interpretation of rules in linguistic terms, and to avoid unnatural boundaries in partitioning attribute domains. The confidence of an association is classically measured by the co-occurrence of attributes in tuples in the database. The semantics of fuzzy rules, however, is not co-occurrence but rather graduality or certainty and is determined by the implication operator that defines the rule. In this paper we present a learning algorithm, based on inductive logic programming, that simultaneously learns the semantics and evaluates the validity of fuzzy rules. The learning algorithm selects the implication that maximizes rule confidence while trying to be as informative as possible. The use of inductive logic programming increases the expressive power of fuzzy rules while maintaining their linguistic interpretability. (c) 2006 Elsevier B.V. All rights reserved.
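As a rough illustration of how the choice of implication operator changes the confidence of a fuzzy rule A → B, the sketch below scores the rule over (membership in A, membership in B) pairs under three standard residual implications and picks the most restrictive one that reaches a confidence threshold. The confidence measure (mean implication degree over tuples with nonzero antecedent membership), the threshold-based selection, and all names are assumptions for illustration; this is not the paper's ILP algorithm.

```python
from typing import Callable, List, Optional, Tuple

Implication = Callable[[float, float], float]

# Residual implications of the minimum, product, and Lukasiewicz t-norms,
# ordered from most to least restrictive: goedel <= goguen <= lukasiewicz pointwise.
IMPLICATIONS: List[Tuple[str, Implication]] = [
    ("goedel", lambda a, b: 1.0 if a <= b else b),
    ("goguen", lambda a, b: 1.0 if a <= b else b / a),
    ("lukasiewicz", lambda a, b: min(1.0, 1.0 - a + b)),
]


def confidence(pairs: List[Tuple[float, float]], imp: Implication) -> float:
    """Mean implication degree over tuples with nonzero antecedent membership (assumed measure)."""
    supported = [(a, b) for a, b in pairs if a > 0]
    if not supported:
        return 0.0
    return sum(imp(a, b) for a, b in supported) / len(supported)


def select_implication(pairs: List[Tuple[float, float]],
                       threshold: float) -> Optional[Tuple[str, float]]:
    """Pick the most restrictive (most informative) implication whose confidence reaches the threshold."""
    for name, imp in IMPLICATIONS:
        c = confidence(pairs, imp)
        if c >= threshold:
            return name, c
    return None
```

Because the weaker Lukasiewicz implication always scores at least as high as the other two, simply maximizing confidence would always select it; trying the stricter operators first is one hedged way to trade confidence against informativeness.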
Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; invariants are often used in assert statements, documentation, and formal specifications. Examples include being constant (x = a), non-zero (x ≠ 0), being in a range (a ≤ x ≤ b), linear relationships (y = ax + b), ordering (x ≤ y), functions from a library (x = fn(y)), containment (x ∈ y), sortedness (x is sorted), and many more. Users can extend Daikon to check for additional invariants. Dynamic invariant detection runs a program, observes the values that the program computes, and then reports properties that were true over the observed executions. Dynamic invariant detection is a machine learning technique that can be applied to arbitrary data. Daikon can detect invariants in C, C++, Java, and Perl programs, and in record-structured data sources; it is easy to extend Daikon to other applications. Invariants can be useful in program understanding and a host of other applications. Daikon's output has been used for generating test cases, predicting incompatibilities in component integration, automating theorem proving, repairing inconsistent data structures, and checking the validity of data streams, among other tasks. Daikon is freely available in source and binary form, along with extensive documentation, at http://***/daikon/. (c) 2007 Elsevier B.V. All rights reserved.
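The core idea of dynamic invariant detection (propose candidate properties, then keep only those that held on every observed execution) can be sketched in a few lines. This toy checker covers just a handful of the invariant kinds listed above (constant, non-zero, range, ordering, and linear y = ax + b) over integer-valued samples. It is not Daikon's implementation or API; `detect_invariants`, `fit_line`, and the sample format are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, int]  # variable name -> value observed at one program point


def fit_line(samples: List[Sample], x: str, y: str) -> Optional[Tuple[Fraction, Fraction]]:
    """Return (a, b) such that y == a*x + b in every sample, or None."""
    first = samples[0]
    other = next((s for s in samples if s[x] != first[x]), None)
    if other is None:  # x never varies, so the slope is undetermined
        return None
    a = Fraction(other[y] - first[y], other[x] - first[x])
    b = first[y] - a * first[x]
    if a == 0:  # a zero slope would only restate that y is constant
        return None
    if all(s[y] == a * s[x] + b for s in samples):
        return a, b
    return None


def detect_invariants(samples: List[Sample]) -> List[str]:
    """Report candidate invariants that held in every observed sample."""
    names = sorted(samples[0])
    found: List[str] = []
    for v in names:
        values = [s[v] for s in samples]
        if len(set(values)) == 1:
            found.append(f"{v} == {values[0]}")
            continue  # a constant makes the weaker unary checks redundant
        if 0 not in values:
            found.append(f"{v} != 0")
        found.append(f"{min(values)} <= {v} <= {max(values)}")
    for x, y in combinations(names, 2):
        if all(s[x] <= s[y] for s in samples):
            found.append(f"{x} <= {y}")
        elif all(s[y] <= s[x] for s in samples):
            found.append(f"{y} <= {x}")
        line = fit_line(samples, x, y)
        if line is not None:
            a, b = line
            found.append(f"{y} == {a!s}*{x} + {b!s}")
    return found
```

Applied to samples where j = 2*i + 1 for i = 1..4, the checker reports i != 0, the ranges of i and j, i <= j, and j == 2*i + 1. More varied executions could falsify some of these, such as the ranges, which is why the reported properties are only likely invariants.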
In this paper we propose a new formalization of the inductive logic programming (ILP) problem for better handling of exceptions. The problem is now encoded in first-order possibilistic logic. This allows us to handle exceptions by means of prioritized rules, thus taking lessons from non-monotonic reasoning. Indeed, in classical first-order logic, the exceptions of the rules that constitute a hypothesis accumulate, and classifying an example into two different classes, even if one of them is the right one, is not correct. The possibilistic formalization provides a sound encoding of non-monotonic reasoning that copes with rules with exceptions and prevents an example from being classified into more than one class. The benefits of our approach with respect to the use of first-order decision lists are pointed out. The possibilistic logic view of the ILP problem leads to an optimization problem at the algorithmic level. We propose an algorithm based on simulated annealing that computes, in a single run, the set of rules together with their priority levels. The reported experiments show that the algorithm is competitive with standard ILP approaches on benchmark examples. (c) 2007 Elsevier B.V. All rights reserved.
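The effect of prioritized rules can be illustrated with a simplified sketch in which each rule carries a priority level and an example receives the class of the highest-priority rule that fires, so it is never assigned two classes. This "most certain applicable rule wins" reading is an assumed simplification of possibilistic inference; the `Rule` and `classify` names are hypothetical, ties are broken by list order (an arbitrary choice), and the simulated-annealing learner is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Example = Dict[str, object]


@dataclass
class Rule:
    condition: Callable[[Example], bool]  # body of the rule
    label: str                            # class predicted by the head
    priority: float                       # certainty level in (0, 1]


def classify(rules: List[Rule], example: Example) -> Optional[str]:
    """Return the class of the highest-priority rule that fires, or None if no rule fires."""
    firing = [r for r in rules if r.condition(example)]
    if not firing:
        return None
    # max() keeps the first rule among equal priorities, i.e. ties follow list order.
    return max(firing, key=lambda r: r.priority).label
```

For example, with "birds fly" at priority 0.6 and "penguins do not fly" at priority 0.9, a penguin (which is also a bird) is classified only as not flying, while an ordinary bird is still classified as flying.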
Control flow compilation is a hybrid between classical WAM compilation and meta-call, limited to the compilation of non-recursive clause bodies. This approach is used successfully to execute dynamically generated queries in an inductive logic programming (ILP) setting. Control flow compilation reduces compilation times by up to an order of magnitude, without slowing down execution. A lazy variant of control flow compilation is also presented. By compiling code on demand, it removes the overhead of compiling unreached code (a frequent phenomenon in practical ILP settings) and thus reduces the size of the compiled code. Both dynamic compilation approaches have been implemented and combined with query packs, an efficient ILP execution mechanism. It turns out that locality of data and code is important for performance. The experiments reported in the paper show that lazy control flow compilation is superior in both artificial and real-life settings.
ISBN (print): 9781586037642
SimStudent is a machine-learning agent that learns cognitive skills by demonstration. SimStudent was originally built as a building block for the Cognitive Tutor Authoring Tools, to help an author build a cognitive model without significant programming. In this paper, we evaluate a second use of SimStudent, viz., student modeling for Intelligent Tutoring Systems. The basic idea is to have SimStudent observe human students solving problems. It then creates a cognitive model that can replicate the students' performance. If the model is accurate, it should predict the human students' performance on novel problems. An evaluation study showed that, when trained on 15 problems, SimStudent accurately predicted the human students' correct behavior on the novel problems more than 80% of the time. However, the current implementation of SimStudent does not accurately predict when the human students make errors.
ISBN (print): 9783540749578
We discuss how to learn non-recursive directed probabilistic logical models from relational data. This problem has been tackled before by upgrading the structure-search algorithm originally proposed for Bayesian networks. In this paper we propose to upgrade another algorithm, namely ordering-search, since for Bayesian networks it was found to work better than structure-search. We experimentally compare the two upgraded algorithms on two relational domains. We conclude that there is no significant difference between the two algorithms in terms of the quality of the learnt models, while ordering-search is significantly faster.
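For intuition, the propositional version of ordering-search (for ordinary Bayesian networks rather than the relational models studied here) can be sketched as follows: for a fixed variable ordering, each variable independently picks its best parent set among its predecessors, and the search moves through orderings by swapping adjacent variables. The BIC score, binary variables, the `max_parents` bound, greedy first-improvement hill climbing, and all function names are assumptions of this sketch, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]  # one observation: variable -> 0/1


def local_bic(data: List[Row], child: str, parents: Tuple[str, ...]) -> float:
    """BIC of a binary child given a parent set; the score decomposes over families."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * math.log(n) * n_params


def best_parents(data: List[Row], child: str, candidates: List[str],
                 max_parents: int) -> Tuple[Tuple[str, ...], float]:
    best_set, best_score = (), local_bic(data, child, ())
    for k in range(1, max_parents + 1):
        for ps in combinations(candidates, k):
            s = local_bic(data, child, ps)
            if s > best_score:
                best_set, best_score = ps, s
    return best_set, best_score


def score_ordering(data: List[Row], order: List[str], max_parents: int):
    """Given an ordering, each variable picks its best parents among its predecessors."""
    structure: Dict[str, Tuple[str, ...]] = {}
    total = 0.0
    for i, v in enumerate(order):
        ps, s = best_parents(data, v, order[:i], max_parents)
        structure[v] = ps
        total += s
    return total, structure


def ordering_search(data: List[Row], start: Sequence[str], max_parents: int = 2):
    """Hill-climb over orderings by swapping adjacent variables."""
    order = list(start)
    best, structure = score_ordering(data, order, max_parents)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            s, st = score_ordering(data, cand, max_parents)
            if s > best + 1e-9:  # require a strict improvement
                order, best, structure, improved = cand, s, st, True
                break
    return order, structure
```

On eight rows in which B always copies A and C varies independently, the search keeps a single edge between A and B, whose direction depends on the ordering. Because any acyclic ordering yields an acyclic network, the search never needs an explicit cycle check, which is one reason ordering-search can be cheaper than searching over structures directly.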