Many domains in the field of inductive logic programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP particularly suited for large, highly-skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem into a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable testset results in a fraction of the training time.
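The "at least L of these K clauses" thresholding described in this abstract can be illustrated with a minimal sketch. It is not Gleaner's implementation: the clause representation (a plain boolean test on a dictionary), the toy data, and the function names are all assumptions made for illustration. Varying L trades recall against precision.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a clause is any boolean test on an example.
Clause = Callable[[Dict[str, float]], bool]

def at_least_l_of_k(clauses: Sequence[Clause], example: Dict[str, float], l: int) -> bool:
    """Predict positive when at least `l` of the K clauses cover the example."""
    return sum(1 for clause in clauses if clause(example)) >= l

def precision_recall(predictions: List[bool], labels: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from parallel lists of predictions and labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data (assumed): three clauses of increasing strictness over one feature.
clauses: List[Clause] = [
    lambda e: e["x"] > 1,
    lambda e: e["x"] > 2,
    lambda e: e["x"] > 3,
]
examples = [{"x": float(v)} for v in range(6)]
labels = [False, False, False, True, True, True]

def evaluate(l: int) -> Tuple[float, float]:
    predictions = [at_least_l_of_k(clauses, e, l) for e in examples]
    return precision_recall(predictions, labels)

if __name__ == "__main__":
    for l in (1, 2, 3):
        print(l, evaluate(l))
```

A low L accepts an example as soon as a few clauses fire, which favours recall; a high L demands broad agreement, which favours precision.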
ISBN: 354045375X (print)
We introduce a test, named pi-subsumption, which computes partial subsumptions between a hypothesis h and an example e, as well as a measure, the subsumption index, which quantifies the covering degree between h and e. The behavior of this measure is studied on the phase transition problem.
Recent statistical performance studies of search algorithms in difficult combinatorial problems have demonstrated the benefits of randomising and restarting the search procedure. Specifically, it has been found that if the search cost distribution of the non-restarted randomised search exhibits a slower-than-exponential decay (that is, a "heavy tail"), restarts can reduce the search cost expectation. We report on an empirical study of randomised restarted search in ILP. Our experiments conducted on a high-performance distributed computing platform provide an extensive statistical performance sample of five search algorithms operating on two principally different classes of ILP problems, one represented by an artificially generated graph problem and the other by three traditional classification benchmarks (mutagenicity, carcinogenicity, finite element mesh design). The sample allows us to (1) estimate the conditional expected value of the search cost (measured by the total number of clauses explored) given the minimum clause score required and a "cutoff" value (the number of clauses examined before the search is restarted), (2) estimate the conditional expected clause score given the cutoff value and the invested search cost, and (3) compare the performance of randomised restarted search strategies to a deterministic non-restarted search. Our findings indicate striking similarities across the five search algorithms and the four domains, in terms of the basic trends of both the statistics (1) and (2). Also, we observe that the cutoff value is critical for the performance of the search algorithm, and using its optimal value in a randomised restarted search may decrease the mean search cost (by several orders of magnitude) or increase the mean achieved score significantly with respect to that obtained with a deterministic non-restarted search.
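The role of the cutoff value in a restarted search can be sketched as follows. This is a toy simulation under stated assumptions, not one of the five algorithms studied: `heavy_tailed_try` is a hypothetical stand-in that draws the cost of a single randomised try from a Pareto distribution (an assumed heavy-tailed model), and `mean_cost` is an illustrative helper for estimating the expected total cost at a given cutoff.

```python
import random
from typing import Callable, Tuple

# A "try" runs one randomised search for at most `budget` clause evaluations
# and reports (evaluations_used, success).
Try = Callable[[random.Random, int], Tuple[int, bool]]

def restarted_search(run_try: Try, cutoff: int, rng: random.Random,
                     max_tries: int = 10_000) -> Tuple[int, bool]:
    """Restart the search every `cutoff` evaluations until a try succeeds.

    Returns (total evaluations across all tries, whether any try succeeded).
    """
    total_cost = 0
    for _ in range(max_tries):
        used, found = run_try(rng, cutoff)
        total_cost += used
        if found:
            return total_cost, True
    return total_cost, False

def heavy_tailed_try(rng: random.Random, budget: int) -> Tuple[int, bool]:
    """Toy try (assumption): the evaluations needed follow a Pareto law."""
    needed = int(rng.paretovariate(0.8))  # always >= 1, occasionally enormous
    if needed <= budget:
        return needed, True
    return budget, False  # budget exhausted: the caller restarts

def mean_cost(cutoff: int, runs: int = 2000, seed: int = 0) -> float:
    """Estimate the expected total search cost for a given cutoff."""
    rng = random.Random(seed)
    costs = [restarted_search(heavy_tailed_try, cutoff, rng)[0] for _ in range(runs)]
    return sum(costs) / runs

if __name__ == "__main__":
    for cutoff in (1, 10, 100, 10_000):
        print(cutoff, mean_cost(cutoff))
```

Comparing `mean_cost` across cutoffs gives a rough feel for why a cutoff that is too small wastes effort on restarts while one that is too large lets a few very long tries dominate the mean.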
ISBN: 9783540399179 (print)
We study several complexity parameters for first order formulas and their suitability for first order learning models. We show that the standard notion of size is not captured by sets of parameters that are used in the literature and thus they cannot give a complete characterization in terms of learnability with polynomial resources. We then identify an alternative notion of size and a simple set of parameters that are useful for first order Horn Expressions. These parameters are the number of clauses in the expression, the maximum number of distinct terms in a clause, and the maximum number of literals in a clause. Matching lower bounds derived using the Vapnik Chervonenkis dimension complete the picture showing that these parameters are indeed crucial.
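The three parameters named in this abstract can be computed for a small Horn expression as a rough illustration. The encoding is an assumption: a literal is a tuple `(predicate, arg, ...)`, a term is either a string or a tuple `(functor, arg, ...)`, and subterms are counted as terms. The paper's exact counting conventions may differ.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

Term = Union[str, tuple]   # "X", "a", or ("f", "Y") for f(Y)
Literal = Tuple            # ("p", term, term, ...)
Clause = List[Literal]     # head and body literals together

def subterms(term: Term) -> Iterator[Term]:
    """Yield a term and, recursively, every term nested inside it."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def clause_terms(clause: Clause) -> Set[Term]:
    """Collect the distinct terms (including subterms) occurring in a clause."""
    return {t for (_pred, *args) in clause for a in args for t in subterms(a)}

def horn_parameters(expression: List[Clause]) -> Dict[str, int]:
    """Number of clauses, max distinct terms per clause, max literals per clause."""
    return {
        "clauses": len(expression),
        "max_terms": max(len(clause_terms(c)) for c in expression),
        "max_literals": max(len(c) for c in expression),
    }

# p(X, f(Y)) :- q(X, Y), r(f(Y)).    and the fact    q(a, b).
expression: List[Clause] = [
    [("p", "X", ("f", "Y")), ("q", "X", "Y"), ("r", ("f", "Y"))],
    [("q", "a", "b")],
]

if __name__ == "__main__":
    print(horn_parameters(expression))
```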
Inductive logic programming (ILP) is a well-known machine learning technique for learning concepts from relational data. Nevertheless, ILP systems are not sufficiently robust to noisy or unseen data in real-world domains. Furthermore, in multi-class problems, an example that matches none of the learned rules cannot be classified. This paper presents a novel hybrid learning method that alleviates this restriction by enabling neural networks to handle first-order logic programs directly. The proposed method, called First-Order Logical Neural Network (FOLNN), employs the standard feedforward neural network and integrates inductive learning from examples and background knowledge. We also propose a method for determining the appropriate variable substitution in FOLNN learning by using Multiple-Instance Learning (MIL). In the experiments, the proposed method is evaluated on two first-order learning problems, Finite Element Mesh Design and Mutagenesis, and compared with the state-of-the-art PROGOL system. The experimental results show that the proposed method performs better than PROGOL.
Background: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. Results: The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
This paper presents a methodology to design a discrete-event system (DES) for the on-line supervision of a biotechnological process. The DES is synthesized by applying the wavelet transform and inductive logic programming to the measured signals, subject to validation by an expert biotechnologist. (C) 2002 Elsevier Science B.V. All rights reserved.
An important ingredient in agent-mediated electronic commerce is the presence of intelligent mediating agents that assist electronic commerce participants (e.g. individual users, other agents, organisations). These mediating agents are in principle autonomous agents that interact with their environments (e.g. other agents and web-servers) on behalf of participants who have delegated tasks to them. For mediating agents a (preference) model of participants is indispensable. In this paper, a generic mediating agent architecture is introduced. Furthermore, we discuss our view of user preference modelling and its need in agent-mediated electronic commerce. We survey the state of the art in the field of preference modelling and suggest that the preferences of electronic commerce participants can be modelled by learning from their behaviour. In particular, we employ an existing machine learning method called inductive logic programming (ILP). We argue that this method can be used by mediating agents to detect regularities in the behaviour of the involved participants and induce hypotheses about their preferences automatically. Finally, we discuss some advantages and disadvantages of using inductive logic programming as a method for learning user preferences and compare this method with other approaches. (c) 2005 Elsevier B.V. All rights reserved.
A continuing problem with inductive logic programming (ILP) has been the poor handling of numbers. Constraint inductive logic programming (CILP) aims to solve this problem with ILP. We propose a new approach to generating numerical constraints in CILP, and describe an implementation of the CILP system (namely, BPU-CILP). In our approach, methods from pattern recognition and multivariate data analysis, such as Fisher's linear discriminant, dynamic clustering and principal component analysis, are introduced into CILP. The BPU-CILP can generate various forms of polynomial constraints of multiple dimensions, without additional background knowledge. As a result, the constraint logic program covering all positive examples and consistent with all negative examples can be derived automatically.
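As a rough illustration of how Fisher's linear discriminant can turn numeric examples into a constraint, the sketch below derives a linear constraint of the form w1*x + w2*y >= t from two-dimensional positive and negative points. It is not the BPU-CILP procedure: the restriction to two dimensions, the midpoint threshold, the toy data, and all names are assumptions, and the abstract describes more general polynomial constraints.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points: List[Point], m: Point) -> Tuple[float, float, float]:
    """Return (Sxx, Sxy, Syy), the entries of the symmetric 2x2 scatter matrix."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_constraint(pos: List[Point], neg: List[Point]) -> Tuple[Point, float]:
    """Return (w, t) so that the candidate constraint is w[0]*x + w[1]*y >= t."""
    mp, mn = mean(pos), mean(neg)
    a, b, c = (sp + sn for sp, sn in zip(scatter(pos, mp), scatter(neg, mn)))
    det = a * c - b * b  # assumed non-zero (within-class scatter is invertible)
    dx, dy = mp[0] - mn[0], mp[1] - mn[1]
    # w = Sw^{-1} (mean_pos - mean_neg), using the closed-form 2x2 inverse.
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Threshold at the midpoint of the projected class means (an assumption).
    t = 0.5 * ((w[0] * mp[0] + w[1] * mp[1]) + (w[0] * mn[0] + w[1] * mn[1]))
    return w, t

def satisfies(w: Point, t: float, p: Point) -> bool:
    return w[0] * p[0] + w[1] * p[1] >= t

# Toy data (assumed).
positives: List[Point] = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.5)]
negatives: List[Point] = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]

if __name__ == "__main__":
    w, t = fisher_constraint(positives, negatives)
    print(f"{w[0]:.3f}*X + {w[1]:.3f}*Y >= {t:.3f}")
```

On linearly separable data such as this toy set, the resulting constraint holds for every positive point and fails for every negative one; on overlapping data it would only approximate that behaviour.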
Inductive learning in first-order logic (FOL) is a hard task due to both the prohibitive size of the search space and the computational cost of evaluating hypotheses. This paper describes an evolutionary algorithm for concept learning in (a fragment of) FOL. The algorithm, called ECL (for Evolutionary Concept Learner), evolves a population of Horn clauses by repeated selection, mutation and optimization of fitter clauses. ECL relies on four greedy mutation operators for searching the hypothesis space, and employs an optimization phase that follows each mutation. Experimental results show that ECL works well in practice.
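The selection, mutation and optimization cycle described in this abstract can be sketched in a heavily simplified form. This is not ECL itself: the sketch assumes propositional examples (sets of true facts), clause bodies as sets of required facts, a single add-or-drop mutation instead of ECL's four operators, and a fitness equal to positives covered minus negatives covered. All names are hypothetical.

```python
import random
from typing import FrozenSet, List, Set

Body = FrozenSet[str]

def fitness(body: Body, pos: List[Set[str]], neg: List[Set[str]]) -> int:
    """Positives covered minus negatives covered; a body covers e if body <= e."""
    return sum(body <= e for e in pos) - sum(body <= e for e in neg)

def greedy_mutation(body, literals, pos, neg, rng, n_options=3) -> Body:
    """Sample a few add-or-drop edits and keep the fittest resulting clause."""
    # body ^ {lit} drops lit if present (generalise) or adds it (specialise).
    options = [body ^ {rng.choice(literals)} for _ in range(n_options)]
    return max(options, key=lambda b: fitness(b, pos, neg))

def optimise(body, literals, pos, neg, rng, steps=5) -> Body:
    """Optimization phase: accept further mutations only if fitness improves."""
    for _ in range(steps):
        candidate = greedy_mutation(body, literals, pos, neg, rng)
        if fitness(candidate, pos, neg) > fitness(body, pos, neg):
            body = candidate
    return body

def evolve(pos, neg, literals, rng, pop_size=6, generations=20) -> Body:
    population: List[Body] = [frozenset() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: binary tournament on fitness.
        parent = max(rng.sample(population, 2), key=lambda b: fitness(b, pos, neg))
        child = optimise(greedy_mutation(parent, literals, pos, neg, rng),
                         literals, pos, neg, rng)
        # Replacement: the child displaces the least-fit clause if it is no worse.
        worst = min(range(pop_size), key=lambda i: fitness(population[i], pos, neg))
        if fitness(child, pos, neg) >= fitness(population[worst], pos, neg):
            population[worst] = child
    return max(population, key=lambda b: fitness(b, pos, neg))

# Toy data (assumed).
positives = [{"red", "round"}, {"red", "round", "big"}, {"red", "round", "small"}]
negatives = [{"red", "square"}, {"blue", "round"}, {"blue", "square", "big"}]
literals = ["red", "blue", "round", "square", "big", "small"]

if __name__ == "__main__":
    best = evolve(positives, negatives, literals, random.Random(0))
    print(sorted(best), fitness(best, positives, negatives))
```

Because the least-fit clause is only ever replaced by one that is no worse, the best fitness in the population never decreases from one generation to the next.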