To date, inductive logic programming (ILP) systems have largely assumed that all data needed for learning have been provided at the onset of model construction. Increasingly, in application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are being produced continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the "drift" problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of "language drift" problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting and the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model-constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world data set.
Motivated by an analogy with matrix factorization, we introduce the problem of factorizing relational data. In matrix factorization, one is given a matrix and has to factorize it as a product of other matrices. In relational data factorization, the task is to factorize a given relation as a conjunctive query over other relations, i.e., as a combination of natural join operations. Given a conjunctive query and the input relation, the problem is to compute the extensions of the relations used in the query. Thus, relational data factorization is a relational analog of matrix factorization; it is also a form of inverse querying, as one has to compute the relations in the query from the result of the query. The result of relational data factorization is neither necessarily unique nor required to be a lossless decomposition of the original relation. Therefore, constraints can be imposed on the desired factorization and a scoring function is used to determine its quality (often similarity to the original data). Relational data factorization is thus a constraint satisfaction and optimization problem. We show how answer set programming can be used for solving relational data factorization problems.
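To make the idea of scoring a candidate factorization concrete, here is a minimal sketch in plain Python. It is not the answer set programming encoding the abstract refers to: it only evaluates given candidate factors rather than searching for them. The query shape R(a, c) :- P(a, b), Q(b, c), the function names and the toy data are all assumptions made for illustration, and the score is simply the number of tuples on which the joined relation and the target disagree.

```python
def natural_join(p, q):
    """Join P(a, b) with Q(b, c) on the shared column b, projecting onto (a, c)."""
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}


def reconstruction_error(target, p, q):
    """Score a candidate factorization: tuples present in exactly one of
    the target relation and the join of the two factors (lower is better)."""
    return len(target ^ natural_join(p, q))


# Hypothetical target relation R(person, topic).
target = {("ann", "math"), ("ann", "logic"), ("bob", "math")}

# A lossless candidate: the join of P and Q reproduces the target exactly.
p = {("ann", "g1"), ("bob", "g2")}
q = {("g1", "math"), ("g1", "logic"), ("g2", "math")}
exact_error = reconstruction_error(target, p, q)

# A lossy candidate: smaller factors, but one target tuple is lost.
p_lossy = {("ann", "g1"), ("bob", "g1")}
q_lossy = {("g1", "math")}
lossy_error = reconstruction_error(target, p_lossy, q_lossy)
```

Both candidates are valid factorizations under this scoring; which one is preferred depends on the constraints imposed, which is why the abstract frames the task as constraint satisfaction and optimization.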
ISBN: 9781538674499 (print)
Answer Set Programming (ASP) is a powerful modeling formalism for combinatorial problems. However, writing ASP models can be hard. We propose a novel method, called Sketched Answer Set Programming (SkASP), aimed at facilitating this. In SkASP, the user writes partial ASP programs, in which uncertain parts are left open and marked with question marks. In addition, the user provides a number of positive and negative examples of the desired program behaviour. SkASP then synthesises a complete ASP program. This is realized by rewriting the SkASP program into another ASP program, which can then be solved by traditional ASP solvers. We evaluate our approach on 21 well-known puzzles and combinatorial problems inspired by Karp's 21 NP-complete problems and on publicly available ASP encodings.
ISBN: 9781510878693 (print)
Interlanguage Pragmatics (ILP) was established in the late 1970s. Since its emergence, many researchers have devoted themselves to theoretical and empirical studies of it and have made great achievements. The purpose of this article is to review the research domains and data collection methods of ILP and to discuss its future prospects.
In this paper we address an issue that has been brought to the attention of the database community with the advent of the Semantic Web, i.e., the issue of how ontologies (and the semantics conveyed by them) can help solve typical database problems through a better understanding of Knowledge Representation (KR) aspects related to databases. In particular, we investigate this issue from the ILP perspective by considering two database problems, (i) the definition of views and (ii) the definition of constraints, for a database whose schema is also represented by means of an ontology. Both can be reformulated as ILP problems and can benefit from the expressive and deductive power of the KR framework DL+LOG(V). We illustrate the application scenarios by means of examples.
In inductive learning of a broad concept, an algorithm should be able to distinguish concept examples from exceptions and noisy data. An approach through recursively finding patterns in exceptions turns out to correspond to the problem of learning default theories. Default logic is what humans employ in common-sense reasoning. Therefore, learned default theories are better understood by humans. In this paper, we present new algorithms to learn default theories in the form of non-monotonic logic programs. Experiments reported in this paper show that our algorithms are a significant improvement over traditional approaches based on inductive logic programming. Under consideration for acceptance in TPLP.
ISBN: 9781538616390 (print)
Inductive learning has been employed successfully in various domains; however, inductive logic programming (ILP) systems have focused on non-incremental learning tasks where independent sets of data are provided incoherently. In this paper, we propose a new genetic algorithm-based ILP system, called GAILP, for incremental learning. GAILP is a covering algorithm which extracts hypotheses/rules from a collection of examples in a reliable way. It employs a genetic algorithm technique to discover various aspects of the potential combinations. GAILP induces every possible rule for the given combination and selects the most generic ones among them. It also eliminates rules that might become obsolete because more generic rules exist. Unlike other ILP systems, GAILP batches all given examples and background knowledge, then groups the examples and prioritizes the induction process. This prioritization is needed to preserve dependency and to revise the theory. The paper introduces GAILP's fundamental mechanisms and demonstrates its algorithms with a running example.
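For readers unfamiliar with the term "covering algorithm", the following is a minimal sketch of generic sequential covering in Python. It is not GAILP: it uses exhaustive enumeration of short attribute-value rules instead of a genetic algorithm, it works on propositional examples rather than first-order clauses, and it does no grouping or prioritization. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def covers(rule, example):
    """A rule is a dict of attribute -> required value; it covers an
    example when every condition holds."""
    return all(example.get(attr) == value for attr, value in rule.items())


def candidate_rules(examples, max_conditions=2):
    """Enumerate rules with up to max_conditions conditions, each built
    from the attribute values of one example."""
    rules = []
    for example in examples:
        items = sorted(example.items())
        for size in range(1, max_conditions + 1):
            for combo in combinations(items, size):
                rule = dict(combo)
                if rule not in rules:
                    rules.append(rule)
    return rules


def sequential_covering(positives, negatives):
    """Repeatedly pick the consistent rule covering the most remaining
    positives, then remove the positives it covers."""
    remaining = list(positives)
    theory = []
    while remaining:
        consistent = [
            rule for rule in candidate_rules(remaining)
            if not any(covers(rule, neg) for neg in negatives)
        ]
        if not consistent:
            break  # no rule excludes all negatives; stop with a partial theory
        best = max(consistent, key=lambda r: sum(covers(r, p) for p in remaining))
        theory.append(best)
        remaining = [p for p in remaining if not covers(best, p)]
    return theory


positives = [
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "green"},
]
negatives = [{"shape": "square", "color": "red"}]
theory = sequential_covering(positives, negatives)
```

Preferring the rule that covers the most remaining positives loosely mirrors the abstract's preference for the most generic rules; the more specific candidates it subsumes are never added to the theory.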
ISBN: 9781489976871 (digital)
ISBN: 9781489976857 (print)
This authoritative, expanded and updated second edition of Encyclopedia of Machine Learning and Data Mining provides easy access to core information for those seeking entry into any aspect of the broad field of Machine Learning and Data Mining. A paramount work, its 800 entries, about 150 of them newly updated or added, are filled with valuable literature references, providing the reader with a portal to more detailed information on any given topic. Topics for the Encyclopedia of Machine Learning and Data Mining include Learning and Logic, Data Mining, Applications, Text Mining, Statistical Learning, Reinforcement Learning, Pattern Mining, Graph Mining, Relational Mining, Evolutionary Computation, Information Theory, Behavior Cloning, and many others. Topics were selected by a distinguished international advisory board. Each peer-reviewed, highly structured entry includes a definition, key words, an illustration, applications, a bibliography, and links to related literature. The entries are expository and tutorial, making this reference a practical resource for students, academics, or professionals who employ machine learning and data mining methods in their projects. Machine learning and data mining techniques have countless applications, including data science applications, and this reference is essential for anyone seeking quick access to vital information on the topic.
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic *** this work, we propose a methodology using inductive logic programming (ILP) to automatically extract knowledge about deleterious/neutral mutations from a multi-relational database, named *** used 8117 mutations in 805 proteins with known three-dimensional structure in our *** using ILP for learning, we obtained classification rules that can be interpreted by a human expert and that help to improve our understanding of the relationships between physico-chemical and evolutionary features and deleterious *** experimental results, compared with state-of-the-art methods, show that the proposed approach can be applied to predict the impact of single amino acid replacement on the function of a *** rules and the estimated effect of human non-synonymous polymorphisms on the function of a protein are available at http://***/sm2ph/***.