In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of structured individuals. The approach of 1BC is to project the individuals along first-order features. These features are built from the individual using structural predicates referring to related objects (e.g., atoms within molecules), and properties applying to the individual or one or several of its related objects (e.g., a bond between two atoms). We describe an individual in terms of elementary features consisting of zero or more structural predicates and one property; these features are treated as conditionally independent in the spirit of the naive Bayes assumption. 1BC2 represents an alternative first-order upgrade to the naive Bayesian classifier by considering probability distributions over structured objects (e.g., a molecule as a set of atoms), and estimating those distributions from the probabilities of its elements (which are assumed to be independent). We present a unifying view on both systems in which 1BC works in language space, and 1BC2 works in individual space. We also present a new, efficient recursive algorithm improving upon the original propositionalisation approach of 1BC. Both systems have been implemented in the context of the first-order descriptive learner Tertius, and we investigate the differences between the two systems both in computational terms and on artificially generated data. Finally, we describe a range of experiments on ILP benchmark data sets demonstrating the viability of our approach.
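As an informal illustration of the naive Bayes computation these systems rely on (not the 1BC or 1BC2 implementation itself), the sketch below scores classes for an individual that has already been propositionalised into boolean first-order features, treating the features as conditionally independent given the class; all feature names and data are hypothetical.

```python
from collections import defaultdict
from math import log

# Toy naive Bayes over boolean first-order features (e.g. "has_aromatic_ring"),
# assuming the individuals have already been propositionalised. All names and
# data here are hypothetical.

def train(examples):
    """examples: list of (set_of_true_features, class_label)."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
    return class_counts, feature_counts

def log_score(features, label, class_counts, feature_counts, all_features):
    """log P(label) + sum over features of log P(feature | label), Laplace-smoothed."""
    total = sum(class_counts.values())
    score = log(class_counts[label] / total)
    for f in all_features:
        p_true = (feature_counts[label][f] + 1) / (class_counts[label] + 2)
        score += log(p_true if f in features else 1.0 - p_true)
    return score

examples = [({"has_aromatic_ring", "has_heavy_atom"}, "active"),
            ({"has_heavy_atom"}, "inactive"),
            ({"has_aromatic_ring"}, "active")]
class_counts, feature_counts = train(examples)
all_features = {"has_aromatic_ring", "has_heavy_atom"}
query = {"has_aromatic_ring"}
prediction = max(class_counts,
                 key=lambda c: log_score(query, c, class_counts, feature_counts, all_features))
print(prediction)   # "active" for this toy data
```

The per-class score is log P(class) plus the sum of log P(feature | class) over all features, with Laplace smoothing to avoid zero probabilities.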
This paper introduces a proof procedure that integrates abductive logic programming (ALP) and inductive logic programming (ILP) to automate the learning of first-order Horn clause theories from examples and background knowledge. The work builds upon a recent approach called Hybrid Abductive Inductive Learning (HAIL) by showing how language bias can be practically and usefully incorporated into the learning process. A proof procedure for HAIL is proposed that utilises a set of user-specified mode declarations to learn hypotheses that satisfy a given language bias. A semantics is presented that accurately characterises the intended hypothesis space and includes the hypotheses derivable by the proof procedure. An implementation is described that combines an extension of the Kakas-Mancarella ALP procedure within an ILP procedure that generalises the Progol system of Muggleton. The explicit integration of abduction and induction is shown to allow the derivation of multiple clause hypotheses in response to a single seed example and to enable the inference of missing type information in a way not previously possible.
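Mode declarations of the kind referred to here (in the Progol style that this line of work generalises) restrict which literals may appear in a hypothesis and how their variables are chained. The sketch below is a simplified, hypothetical check that every input ('+') argument of a body literal is bound by the head or by an earlier body literal; it illustrates the language bias only, not the HAIL proof procedure.

```python
# Simplified illustration of a Progol-style mode-declaration language bias:
# every '+' (input) argument of a body literal must be bound either by a '+'
# argument of the head or by a '-' (output) argument of an earlier body
# literal. This is a sketch of the bias only, not HAIL's proof procedure.

def mode_conformant(head, body, modeh, modeb):
    """head and body literals: (predicate, [variables]); modes: predicate -> ['+', '-', ...]."""
    pred, args = head
    bound = {v for v, m in zip(args, modeh[pred]) if m == '+'}
    for pred, args in body:
        modes = modeb[pred]
        if any(m == '+' and v not in bound for v, m in zip(args, modes)):
            return False                      # input variable used before being bound
        bound.update(v for v, m in zip(args, modes) if m == '-')
    return True

modeh = {"grandfather": ['+', '-']}
modeb = {"parent": ['+', '-']}
head = ("grandfather", ["X", "Y"])
print(mode_conformant(head, [("parent", ["X", "Z"]), ("parent", ["Z", "Y"])], modeh, modeb))  # True
print(mode_conformant(head, [("parent", ["Z", "Y"]), ("parent", ["X", "Z"])], modeh, modeb))  # False
```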
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higher-order logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of real-world data sets. By converting our kernel to a distance pseudo-metric for 1-nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene data set by more than 10%.
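The kernel-to-distance conversion mentioned in the last sentence is the standard one, d(x, y) = sqrt(k(x, x) - 2 k(x, y) + k(y, y)). The sketch below applies it to a deliberately simple set-intersection kernel (chosen for concreteness; it is not the type-signature kernel defined in the paper) and uses the resulting distance for 1-nearest-neighbour classification.

```python
from math import sqrt

# A simple kernel over sets (an intersection kernel, used purely for
# illustration -- not the higher-order-logic kernel from the paper) and the
# standard conversion of a kernel into a distance pseudo-metric for 1-NN.

def set_kernel(x, y):
    return len(x & y)

def kernel_distance(k, x, y):
    # d(x, y) = sqrt(k(x,x) - 2 k(x,y) + k(y,y)); clamp for numerical safety.
    return sqrt(max(k(x, x) - 2 * k(x, y) + k(y, y), 0.0))

def one_nn(query, labelled, k):
    return min(labelled, key=lambda item: kernel_distance(k, query, item[0]))[1]

train = [({"c", "h", "o"}, "alcohol"), ({"c", "h"}, "alkane")]
print(one_nn({"c", "h", "o", "n"}, train, set_kernel))   # "alcohol"
```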
Relatively simple transformations can speed up the execution of queries for data mining considerably. While some ILP systems use such transformations, relatively little is known about them or how they relate to each other. This paper describes a number of such transformations. Not all of them are novel, but there have been no studies comparing their efficacy. The main contributions of the paper are: (a) it clarifies the relationship between the transformations; (b) it contains an empirical study of what can be gained by applying the transformations; and (c) it provides some guidance on the kinds of problems that are likely to benefit from the transformations.
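As an example of the kind of transformation studied here, a conjunctive query can be split into variable-disjoint components so that each component can be proved, and can fail, independently of the others. The sketch below detects such components with a union-find over shared variables; the query, predicate names, and representation are hypothetical.

```python
from collections import defaultdict

# Hypothetical illustration of one kind of query transformation: split a
# conjunctive query into groups of literals that share no variables, so each
# group can be evaluated (and can fail) independently of the others.

def split_independent(literals):
    """literals: list of (predicate, [variables]); returns variable-disjoint groups."""
    parent = list(range(len(literals)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    uses = defaultdict(list)                # variable -> indices of literals using it
    for idx, (_, args) in enumerate(literals):
        for v in args:
            uses[v].append(idx)
    for indices in uses.values():
        for other in indices[1:]:
            union(indices[0], other)

    groups = defaultdict(list)
    for idx, lit in enumerate(literals):
        groups[find(idx)].append(lit)
    return list(groups.values())

query = [("atom", ["M", "A"]), ("charge", ["A", "C"]),
         ("logp", ["M", "L"]), ("lumo", ["N", "E"])]
for group in split_independent(query):
    print(group)
```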
ISBN (print): 3540210067
A relation extraction system recognises pre-defined relation types between two identified entities in natural language documents. It is important for the task of automatically locating missing instances in a knowledge base, where an instance is represented as a triple ('entity - relation - entity'). A relation entry specifies a set of rules associated with the syntactic and semantic conditions under which appropriate relations would be extracted. Manually creating such rules requires knowledge from information experts; moreover, it is a time-consuming and error-prone task when the input sentences have little consistency in terms of structure and vocabulary. In this paper, we present an approach that applies a symbolic learning algorithm to sentences in order to automatically induce extraction rules which can then successfully classify new sentences. The proposed approach takes semantic attributes (e.g., semantically close words and named entities) into account when generalising common patterns among the sentences, which enables the system to cope better with syntactically different but semantically similar sentences. Not only does this increase the number of relations extracted, but it also improves the accuracy of relation extraction by adding features that might not be discovered with syntactic analysis alone. Experimental results show that this approach is effective on sentences from Web documents, obtaining 17% higher precision and 34% higher recall.
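To make the idea of generalising patterns with semantic attributes concrete, the sketch below merges two token sequences annotated with named-entity tags: identical tokens are kept, mismatching tokens that share a tag are replaced by that semantic class, and anything else becomes a wildcard. This is a hypothetical simplification, not the learning algorithm described in the paper.

```python
# Hypothetical simplification of pattern generalisation with semantic
# attributes: two annotated sentences are merged into one extraction pattern
# by keeping identical tokens and backing off to the shared named-entity tag
# (or a wildcard) where the tokens differ.

def generalise(pattern_a, pattern_b):
    """Each pattern: list of (token, tag). Assumes equal length for simplicity."""
    merged = []
    for (tok_a, tag_a), (tok_b, tag_b) in zip(pattern_a, pattern_b):
        if tok_a == tok_b:
            merged.append(tok_a)
        elif tag_a == tag_b:
            merged.append(f"<{tag_a}>")     # back off to the semantic class
        else:
            merged.append("*")              # no common generalisation
    return merged

a = [("Gates", "PERSON"), ("founded", "O"), ("Microsoft", "ORG")]
b = [("Jobs", "PERSON"), ("founded", "O"), ("Apple", "ORG")]
print(generalise(a, b))   # ['<PERSON>', 'founded', '<ORG>']
```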
ISBN (digital): 9783540302278
ISBN (print): 3540232427
IndLog is a general-purpose Prolog-based inductive logic programming (ILP) system. It is theoretically based on Mode-Directed Inverse Entailment and has several distinguishing features that make it suitable for a wide range of applications. To search efficiently through large hypothesis spaces, IndLog uses original features such as lazy evaluation of examples and Language Level Search. IndLog is applicable in numerical domains using the lazy evaluation of literals technique and statistically based Model Validation and Model Selection techniques. IndLog has an MPI/LAM interface that enables its use in parallel or distributed environments, which is essential for Multi-relational Data Mining applications. Parallelism may be used in three flavours: splitting the data among the computation nodes; parallelising the search through the hypothesis space; and using the different computation nodes to perform theory-level search. IndLog has been applied successfully to major ILP literature datasets from the Life Sciences, Engineering, Reverse Engineering, Economics, and Time-Series modelling, to name a few.
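Of the three parallelism flavours, the first (splitting the data among computation nodes) is the simplest to illustrate: each worker counts the examples covered by a candidate hypothesis on its own shard, and the master sums the partial counts. The sketch below uses Python's multiprocessing rather than MPI/LAM, and the coverage test and all names are hypothetical stand-ins.

```python
from multiprocessing import Pool

# Toy illustration of the data-splitting flavour of parallelism: each worker
# evaluates a candidate hypothesis on its own shard of the examples and
# returns a coverage count; the master sums the partial counts.
# Uses multiprocessing instead of MPI/LAM; all names are hypothetical.

def covers(hypothesis, example):
    # Stand-in coverage test: the hypothesis is a set of required features.
    return hypothesis.issubset(example)

def count_shard(args):
    hypothesis, shard = args
    return sum(covers(hypothesis, ex) for ex in shard)

def parallel_coverage(hypothesis, examples, n_workers=4):
    shards = [examples[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(count_shard, [(hypothesis, s) for s in shards]))

if __name__ == "__main__":
    examples = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}]
    print(parallel_coverage({"a", "b"}, examples, n_workers=2))   # 2
```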
As modern processors keep increasing the instruction window size and the issue width to exploit more instruction-level parallelism (ILP), the demand for a larger physical register file also grows. As a result, register file access time represents one of the critical delays and can easily become a bottleneck. In this paper, we first discuss the possibilities of reducing register pressure by shortening the lifetime of physical registers, and evaluate several possible register renaming approaches. We then propose an efficient dynamic register renaming algorithm named LAER (Late Allocation and Early Release), which can be implemented through a two-level register file organization. In the LAER renaming scheme, physical register allocation is delayed until an instruction is ready to be executed, and physical registers in the first level are released once they become non-active, with their values backed up in the second level. We show that the LAER algorithm can significantly reduce register pressure at a minimal cost in space and logic complexity. This means the same amount of ILP can be exploited with a smaller physical register file, giving a shorter register file access time and a higher clock speed, or much higher performance can be achieved with the same physical register file size.
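As a rough software model of the late-allocation / early-release idea (not the hardware design proposed in the paper), the sketch below simulates a two-level register file: a level-1 physical register is taken from the free list only when a value is actually produced, and when an architectural register is redefined the superseded value is backed up to level 2 and its level-1 register is freed immediately.

```python
# Toy software model of late allocation and early release with a two-level
# register file. Level 1 holds active values; when an architectural register
# is redefined, the superseded value is backed up into level 2 and the
# level-1 register is returned to the free list right away. This is only a
# conceptual sketch, not the hardware scheme from the paper.

class TwoLevelRegFile:
    def __init__(self, level1_size):
        self.free1 = list(range(level1_size))   # level-1 free list
        self.level1 = {}                        # physical register -> value
        self.level2 = {}                        # backup storage for superseded values
        self.map = {}                           # architectural register -> physical register
        self.next_backup = 0

    def write(self, arch_reg, value):
        """Called when the value is produced (late allocation)."""
        if arch_reg in self.map:
            old_phys = self.map[arch_reg]
            # Early release: back up the superseded value and free the slot.
            self.level2[(arch_reg, self.next_backup)] = self.level1.pop(old_phys)
            self.next_backup += 1
            self.free1.append(old_phys)
        phys = self.free1.pop()
        self.level1[phys] = value
        self.map[arch_reg] = phys

    def read(self, arch_reg):
        return self.level1[self.map[arch_reg]]

rf = TwoLevelRegFile(level1_size=2)
rf.write("r1", 10)
rf.write("r1", 20)          # old value of r1 is backed up, its slot is reused
print(rf.read("r1"))        # 20
```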
Describes a tool for quantitatively discriminating between meningioma and astrocytoma tumors. One of the uses of magnetic resonance imaging (MRI) in clinical diagnosis is in-vivo discrimination between tumor and normal tissue and between tumor types in the brain. There is much interest in increasing the qualitative and quantitative information available from these images. This article presents a study that uses the inductive logic programming tool Progol on measurements of signal intensities in clinical scan images of 28 patients (18 with meningiomas and 10 with astrocytomas) in an attempt to discover knowledge that quantitatively discriminates between the two types of tumors.
Inductive logic programming (ILP) is the study of machine learning systems that use clausal theories in first-order logic as a representation language. In this paper, we survey theoretical foundations of ILP from the viewpoints of the logic of discovery and machine learning, and try to unify these two views with the support of the modern theory of logic programming. Firstly, we define several hypothesis construction methods in ILP and give their proof-theoretic foundations by treating them as procedures which complete incomplete proofs. Next, we discuss the design of individual learning algorithms using these hypothesis construction methods. We review known results on learning logic programs in computational learning theory, and show that these algorithms are instances of a generic learning strategy with proof completion methods.
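The proof-completion view can be stated compactly: given background knowledge B and examples E, the task is to find a hypothesis H such that B together with H entails E, and by contraposition this condition can be rearranged so that candidates for H are derivable deductively from B and the negated examples (the basis of inverse entailment). The formulation below is the standard one and is not specific to the results surveyed in this paper.

```latex
% Standard ILP setting and its inverse-entailment rearrangement.
% B: background theory, E: examples, H: hypothesis.
B \wedge H \models E
\quad\Longleftrightarrow\quad
B \wedge \neg E \models \neg H
```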
Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of a straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. We identify a number of parameters that influence the obtainable speedups, and validate and refine our analysis with experiments on a variety of data sets using two different implementations. Besides cross-validation, we also briefly explore the usefulness of these techniques for bagging. We conclude with some guidelines concerning when these optimizations should be considered.
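The source of the speedup can be illustrated with a count-sharing trick: for a candidate split, class counts are gathered once per fold in a single pass over the data, and the training-set counts for each fold are then obtained by subtracting that fold's counts from the totals instead of rescanning the data k times. The sketch below shows this bookkeeping for one binary split; it is a simplification of the general idea, not the algorithms analysed in the paper.

```python
from collections import Counter

# Sketch of count sharing across folds: per-fold class counts for a candidate
# split are gathered in one pass, and each fold's training-set counts are
# derived by subtracting the held-out fold from the totals.

def split_counts_per_fold(rows, k, test):
    """rows: list of (fold_id, class_label, feature_value); test: split predicate."""
    per_fold = [(Counter(), Counter()) for _ in range(k)]   # (left, right) per fold
    total = (Counter(), Counter())
    for fold, label, value in rows:
        side = 0 if test(value) else 1
        per_fold[fold][side][label] += 1
        total[side][label] += 1
    # Training-set counts for fold i = totals minus the counts of fold i.
    return [(total[0] - per_fold[i][0], total[1] - per_fold[i][1]) for i in range(k)]

rows = [(0, "pos", 1.2), (0, "neg", 3.4), (1, "pos", 0.8),
        (1, "neg", 2.9), (2, "pos", 1.9), (2, "neg", 3.1)]
for left, right in split_counts_per_fold(rows, k=3, test=lambda v: v < 2.0):
    print(dict(left), dict(right))
```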