ISBN (print): 3540210067
A relation extraction system recognises pre-defined relation types between two identified entities in natural language documents. This is important for automatically locating instances missing from a knowledge base, where each instance is represented as a triple ('entity - relation - entity'). A relation entry specifies a set of rules stating the syntactic and semantic conditions under which the corresponding relation should be extracted. Manually creating such rules requires knowledge from information experts; moreover, it is a time-consuming and error-prone task when the input sentences have little consistency in structure and vocabulary. In this paper, we present an approach that applies a symbolic learning algorithm to sentences in order to automatically induce extraction rules, which can then classify new sentences. The proposed approach takes semantic attributes (e.g., semantically close words and named entities) into account when generalising common patterns among the sentences, which enables the system to cope better with syntactically different but semantically similar sentences. Not only does this increase the number of relations extracted, it also improves the accuracy of relation extraction by adding features that might not be discovered through syntactic analysis alone. Experimental results show that this approach is effective on sentences from Web documents, obtaining 17% higher precision and 34% higher recall.
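As a rough illustration of the kind of rule the abstract describes being induced, the sketch below hard-codes a single extraction rule whose trigger verb is generalised to a set of semantically close words and whose arguments are constrained by named-entity types. All names (SEM_CLASSES, RULE, extract) and the example sentence are invented for illustration; this is not the authors' system.

```python
# Semantically close trigger words grouped under one class, so that
# syntactically different but semantically similar sentences still match.
SEM_CLASSES = {
    "found": {"founded", "established", "created", "started"},
}

# Rule: PERSON <found-class verb> ORGANIZATION  =>  (PERSON, founderOf, ORG)
RULE = {
    "subj_type": "PERSON",
    "verb_class": "found",
    "obj_type": "ORGANIZATION",
    "relation": "founderOf",
}

def extract(entities, verb, rule=RULE):
    """entities: [(text, type), (text, type)] surrounding a verb token."""
    (subj, subj_t), (obj, obj_t) = entities
    if (subj_t == rule["subj_type"] and obj_t == rule["obj_type"]
            and verb.lower() in SEM_CLASSES[rule["verb_class"]]):
        return (subj, rule["relation"], obj)
    return None

# "Bill Gates established Microsoft" and "Bill Gates founded Microsoft"
# both yield ('Bill Gates', 'founderOf', 'Microsoft').
print(extract([("Bill Gates", "PERSON"), ("Microsoft", "ORGANIZATION")],
              "established"))
```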
Relatively simple transformations can considerably speed up the execution of queries for data mining. While some ILP systems use such transformations, relatively little is known about them or how they relate to each other. This paper describes a number of such transformations. Not all of them are novel, but there have been no studies comparing their efficacy. The main contributions of the paper are: (a) it clarifies the relationship between the transformations; (b) it contains an empirical study of what can be gained by applying them; and (c) it provides some guidance on the kinds of problems that are likely to benefit from them.
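The paper's transformations are not detailed in the abstract, so the sketch below only mimics one of the simplest ideas in this area: when a query is run merely to test whether a clause covers an example, it suffices to find one proof rather than enumerating all of them. The Python generator stands in for a Prolog query; all names and data are invented.

```python
def proofs(example, clause_body):
    """Yield every way the clause body can be satisfied on the example."""
    for binding in clause_body(example):
        yield binding

def covers_all_solutions(example, clause_body):
    # Enumerates every proof, even though coverage only needs a yes/no answer.
    return len(list(proofs(example, clause_body))) > 0

def covers_first_solution(example, clause_body):
    # Stops at the first proof -- the effect a cut-style transformation aims for.
    return next(proofs(example, clause_body), None) is not None

# A body with many redundant solutions: coverage testing gains nothing from
# enumerating them, which is exactly what such transformations avoid.
body = lambda ex: (x for x in range(1_000_000) if x % 2 == ex % 2)
print(covers_first_solution(3, body))   # True, after inspecting one candidate
```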
ISBN (digital): 9783540302278
ISBN (print): 3540232427
IndLog is a general-purpose Prolog-based inductive logic programming (ILP) system. It is theoretically based on Mode-Directed Inverse Entailment and has several distinguishing features that make it adequate for a wide range of applications. To search efficiently through large hypothesis spaces, IndLog uses original features such as lazy evaluation of examples and Language Level Search. IndLog is applicable in numerical domains through the lazy evaluation of literals technique, together with statistics-based Model Validation and Model Selection techniques. IndLog has an MPI/LAM interface that enables its use in parallel or distributed environments, which is essential for multi-relational data mining applications. Parallelism may be used in three flavours: splitting the data among the computation nodes; parallelising the search through the hypothesis space; and using the different computation nodes to perform theory-level search. IndLog has been applied successfully to major ILP literature datasets from the Life Sciences, Engineering, Reverse Engineering, Economics, and Time-Series modelling, to name a few.
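A minimal sketch of the first parallel flavour, splitting the data among computation nodes, assuming only Python's standard multiprocessing module in place of IndLog's MPI/LAM interface; the coverage test is a stand-in predicate, not IndLog code.

```python
from multiprocessing import Pool

def covered(example):
    """Stand-in for testing one example against the current hypothesis."""
    return example % 3 == 0

def count_coverage(chunk):
    # Each worker scores only its own split of the examples.
    return sum(1 for e in chunk if covered(e))

if __name__ == "__main__":
    examples = list(range(10_000))
    n_workers = 4
    chunks = [examples[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial = pool.map(count_coverage, chunks)
    # Coverage counts are aggregated exactly as a master node would do.
    print(sum(partial))
```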
As modern processors keep increasing the instruction window size and the issue width to exploit more instruction-level parallelism (ILP), the demand for a larger physical register file also increases. As a result, register file access time represents one of the critical delays and can easily become a bottleneck. In this paper, we first discuss the possibilities of reducing register pressure by shortening the lifetime of physical registers, and evaluate several possible register renaming approaches. We then propose an efficient dynamic register renaming algorithm named LAER (Late Allocation and Early Release), which can be implemented through a two-level register file organization. In the LAER renaming scheme, physical register allocation is delayed until an instruction is ready to be executed, and physical registers in the first level are released once they become non-active, with their values backed up in the second level. We show that the LAER algorithm can significantly reduce register pressure at minimal cost in space and logic complexity: the same amount of ILP can be exploited with a smaller physical register file, and thus a shorter register file access time and higher clock speed, or a physical register file of the same size can achieve much higher performance.
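A toy software model of the LAER idea, under the assumption that "late allocation" means grabbing a first-level register only at execute time and "early release" means spilling it to a larger second level once it is no longer active; class and method names are invented, and the sketch is not the authors' hardware design.

```python
class TwoLevelRegisterFile:
    def __init__(self, l1_size):
        self.free_l1 = list(range(l1_size))   # small, fast first level
        self.l1 = {}                           # physical register -> value
        self.l2 = {}                           # large backing second level

    def allocate_late(self):
        """Called at issue/execute time, not at rename time."""
        if not self.free_l1:
            raise RuntimeError("first level full; stall")
        reg = self.free_l1.pop()
        self.l1[reg] = None
        return reg

    def write(self, reg, value):
        self.l1[reg] = value

    def release_early(self, reg):
        """Once the value is no longer active, back it up and free the L1 slot."""
        self.l2[reg] = self.l1.pop(reg)
        self.free_l1.append(reg)

    def read(self, reg):
        # Consumers that arrive late still find the value in the second level.
        return self.l1.get(reg, self.l2.get(reg))

rf = TwoLevelRegisterFile(l1_size=2)
r = rf.allocate_late()
rf.write(r, 42)
rf.release_early(r)        # L1 entry becomes reusable, value survives in L2
print(rf.read(r))          # 42
```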
Inductive logic programming (ILP) is the study of machine learning systems that use clausal theories in first-order logic as a representation language. In this paper, we survey theoretical foundations of ILP from the viewpoints of the logic of discovery and of machine learning, and try to unify these two views with the support of the modern theory of logic programming. Firstly, we define several hypothesis construction methods in ILP and give their proof-theoretic foundations by treating them as procedures that complete incomplete proofs. Next, we discuss the design of individual learning algorithms using these hypothesis construction methods. We review known results on learning logic programs in computational learning theory, and show that these algorithms are instances of a generic learning strategy with proof completion methods.
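One standard proof-theoretic identity that hypothesis construction by proof completion builds on is inverse entailment; the notation below is generic and not necessarily the survey's own.

```latex
% Inverse entailment, by contraposition of the entailment relation:
\[
  B \wedge H \models E
  \quad\Longleftrightarrow\quad
  B \wedge \neg E \models \neg H ,
\]
% so a candidate hypothesis $H$ can be obtained by deriving consequences of
% $B \wedge \neg E$ and negating (generalising) them, i.e.\ by completing an
% otherwise failing proof of $E$ from $B$.
```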
ISBN (print): 0780362624
Inductive logic programming (ILP) is a form of machine learning that induces rules from data using the language and syntax of logic programming. A rule construction algorithm forms rules that summarize data sets; these rules can be used in a large spectrum of data mining activities. In ILP, rules are constructed with a target predicate as the consequent, or head, of the rule, and with high-ranking literals forming the antecedent, or body, of the rule. The predicate rankings are obtained by applying predicate ranking algorithms to a domain (background) knowledge base. In this work, we present three new predicate ranking algorithms for the inductive logic programming system INDED (pronounced "indeed"). The algorithms use a grouping technique employing basic set-theoretic operations to generate the rankings. We also present results of applying the ranking algorithms to several problem domains, some of which are universal, like the classical genealogy problem, and others less common. In particular, diagnosis is the main thread of many of our experiments. Although our experimentation relates to medical diagnosis in diabetes and Lyme disease, many of the same techniques and methodologies can be applied to other forms of diagnosis, including system failure, sensor detection, and trouble-shooting.
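The abstract does not spell out the ranking algorithms, so the sketch below is only a guessed illustration of a set-theoretic grouping: background predicates are scored by how many constants they share with the target predicate's facts. Predicate names and facts are invented.

```python
# Target predicate: grandparent/2, with two example facts.
target_facts = {("ann", "tom"), ("bob", "sue")}
background = {
    "parent": {("ann", "bob"), ("bob", "tom"), ("tom", "sue")},
    "likes":  {("ann", "pizza"), ("sue", "cats")},
}

# Constants appearing in the target examples.
target_consts = {c for fact in target_facts for c in fact}

def rank(preds, targets=target_consts):
    scores = {}
    for name, facts in preds.items():
        consts = {c for fact in facts for c in fact}
        scores[name] = len(consts & targets)      # plain set intersection
    return sorted(scores.items(), key=lambda kv: -kv[1])

# 'parent' outranks 'likes', so its literals are tried first in rule bodies.
print(rank(background))
```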
Author: Yamamoto, A. (Hokkaido Univ, Fac Technol & Meme Media Lab, Kita-ku, Sapporo, Hokkaido 0608628, Japan)
For given logical formulae B and E such that B ⊭ E, hypothesis finding means the generation of a formula H such that B ∧ H entails E. Hypothesis finding constitutes a basic technique for fields of inference such as inductive inference and knowledge discovery. In order to put various previously proposed hypothesis finding methods on one general ground, we use upward refinement and residue hypotheses. We show that their combination is a complete method for solving any hypothesis finding problem in clausal logic. We extend the relative subsumption relation, and show that some previously presented hypothesis finding methods can be regarded as finding hypotheses that subsume the examples relative to a given background theory. Noting that the weakening rule may make hypothesis finding difficult to solve, we propose restricting this rule either to the inverse of resolution or to the inverse of subsumption. We also note that this work is related to relevant logic.
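Reading the garbled condition of the original text as B ⊭ E, the problem the abstract states can be written compactly as:

```latex
% The hypothesis finding problem, in clean notation:
\[
  \text{given } B, E \text{ with } B \not\models E,
  \qquad \text{find } H \text{ such that } B \wedge H \models E .
\]
```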
In this paper, we propose an inductive logic programming learning method that aims at automatically extracting specific Noun-Verb (N-V) pairs from a corpus in order to build semantic lexicons based on the principles of Pustejovsky's Generative Lexicon (GL) (Pustejovsky, 1995). In one of the components of this lexical model, called the qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), while the agentive role indicates its creation mode (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations encoding relational information. The inductive logic programming method that we have developed enables us to automatically extract from a corpus those N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure of GL, and to distinguish them, on the basis of the surrounding categorial context, from N-V pairs that also occur in sentences of the corpus but are not relevant. This method has been validated both theoretically and empirically on a technical corpus. The extracted N-V pairs will further be used in information retrieval applications for index expansion.
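A minimal, made-up illustration of the target representation and of a naive candidate-pair extraction; the window heuristic below merely stands in for the categorial-context rules the ILP method learns, and all names and tags are invented.

```python
# Qualia roles for nouns, populated with N-V pairs of the kind being extracted.
qualia_lexicon = {
    "knife": {"telic": {"cut"}},       # purpose/function
    "house": {"agentive": {"build"}},  # creation mode
}

def candidate_pairs(tagged_sentence, window=3):
    """tagged_sentence: list of (token, pos) pairs, e.g. ('knife', 'N')."""
    pairs = []
    for i, (tok, pos) in enumerate(tagged_sentence):
        if pos != "N":
            continue
        # Pair the noun with any verb occurring in a short right-hand window.
        for tok2, pos2 in tagged_sentence[i + 1:i + 1 + window]:
            if pos2 == "V":
                pairs.append((tok, tok2))
    return pairs

sent = [("a", "DET"), ("knife", "N"), ("to", "PREP"), ("cut", "V"),
        ("bread", "N")]
print(candidate_pairs(sent))   # [('knife', 'cut')]  -> telic role candidate
```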
The study of protein structure has been driven largely by careful inspection of experimental data by human experts. However, the rapid determination of protein structures from structural-genomics projects will make it increasingly difficult to analyse (and determine the principles responsible for) the distribution of proteins in fold space by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the increasing emphasis on high-throughput experimental initiatives, machine learning and other automated methods of analysis will become increasingly important for many biological problems.
ISBN (print): 3540200851
The interest of introducing fuzzy predicates when learning rules is twofold. When dealing with numerical data, it allows us to avoid arbitrary discretization. Moreover, it enlarges the expressive power of what is learned by considering different types of fuzzy rules, which may describe gradual behaviours of related attributes or uncertainty pervading conclusions. This paper describes different types of first-order fuzzy rules and a method for learning each type. Finally, we discuss the merits of each type of rule on a benchmark example.
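A small sketch of what a fuzzy predicate buys over a crisp threshold, and of how a gradual rule propagates a membership degree to its conclusion; the membership parameters and the identity transfer function are arbitrary choices for illustration, not the paper's definitions.

```python
def tall(height_cm, low=160.0, high=190.0):
    """Fuzzy predicate: degree in [0, 1] instead of a crisp cut-off."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)

def gradual_rule(degree_antecedent):
    """The more the antecedent holds, the more the conclusion is asserted."""
    return degree_antecedent        # identity transfer; other kernels possible

for h in (150, 175, 195):
    d = tall(h)
    print(f"tall({h}) = {d:.2f}  ->  conclusion held to degree {gradual_rule(d):.2f}")
```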