This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the inductive logic programming (ILP) system PROGOL is applied to the problem of identifying potential pharmacophores for ACE inhibition. The case study reported in this paper supports four general lessons for machine learning and knowledge discovery, as well as more specific lessons for pharmacophore discovery, for inductive logic programming, and for ACE inhibition. The general lessons for machine learning and knowledge discovery are as follows. 1. An initial rediscovery step is a useful tool when approaching a new application domain. 2. General machine learning heuristics may fail to match the details of an application domain, but it may be possible to successfully apply a heuristic-based algorithm in spite of the mismatch. 3. A complete search for all plausible hypotheses can provide useful information to a user, although experimentation may be required to choose between competing hypotheses. 4. A declarative knowledge representation facilitates the development and debugging of background knowledge in collaboration with a domain expert, as well as the communication of final results.
ISBN: 3540423257 (print)
By analysing sequences of actions performed by a user, one can find frequent subsequences that can be suggested as macro (script) definitions. However, often these 'actions' have additional features. In this paper we combine an algorithm to detect frequent subsequences with an inductive logic programming system to automatically generate for each frequent subsequence the most specific 'template' for these additional features that is consistent with the observed frequent subsequences. The resulting system is implemented and used in an application where we automatically generate macros from logs of the use of a Unix command shell.
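The first stage described above, finding frequent subsequences in a command log, can be illustrated with a minimal Python sketch. It assumes contiguous subsequences counted by occurrence, which the abstract does not specify; the function name, parameters, and sample sessions are hypothetical, and the ILP step that generalizes command arguments into templates is not shown.

```python
from collections import Counter


def frequent_subsequences(sessions, min_len=2, max_len=3, min_support=2):
    """Count contiguous command subsequences across sessions.

    Returns the subsequences that occur at least `min_support` times.
    Contiguity and occurrence counting are assumptions of this sketch.
    """
    counts = Counter()
    for commands in sessions:
        for n in range(min_len, max_len + 1):
            for i in range(len(commands) - n + 1):
                counts[tuple(commands[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}


# Hypothetical shell log, reduced to command names and split into sessions.
sessions = [
    ["cd", "ls", "vi", "make"],
    ["cd", "ls", "vi", "make", "ls"],
    ["ls", "cd", "ls"],
]
frequent = frequent_subsequences(sessions)
for seq, count in sorted(frequent.items()):
    print(" ".join(seq), count)
```

Each surviving subsequence is a macro candidate; in the approach the abstract describes, the arguments observed with each command would then be passed to the ILP system to find the most specific consistent template.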
ISBN: 0780371011 (print)
In this paper, we present a learning simulator consisting of an interface, an inference engine and an inductive logic programming (ILP) system. Possible uses of the simulator include checking the behavior of CAI systems, serving as a novice agent in CAI systems, and letting a teacher check study contents by observing the simulator's responses. The learning simulator learns interactively. First, a teacher (or a CAI system) gives background knowledge as basic rules and examples. Next, the teacher asks the simulator a question. Using the background knowledge, rules generated by ILP and examples stored in memory, the simulator answers the question. Then, after the teacher gives the correct answer, the simulator stores the examples and updates its rules for the next question. We implement the simulator using a Prolog interpreter and the ILP system FOIL. We show learning results obtained through computer simulations.
This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and the use of intensional background knowledge. The development of FOIDL was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce generally more accurate results than a range of methods previously applied to this problem. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate that the combination of implicit negatives and intensionality allows FOIDL to learn correct programs from far fewer examples than FOIL.
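To make the idea of a decision list concrete, the following Python sketch applies an ordered list of rules in which the first matching rule decides the output, so specific exceptions are placed before general defaults. The rules are hand-written for illustration and are not output learned by FOIDL; the suffix-rewriting representation and the name `past_tense` are assumptions of this sketch, and real first-order decision lists are ordered Prolog clauses rather than Python tuples.

```python
# Ordered rules: (suffix to match, replacement for that suffix).
# Hand-written examples only; the first matching rule wins.
RULES = [
    ("sleep", "slept"),  # specific exception, placed first
    ("e", "ed"),         # verbs ending in "e" add "d": bake -> baked
    ("", "ed"),          # default rule, matches every verb
]


def past_tense(verb):
    """Return the past tense given by the first rule whose suffix matches."""
    for suffix, replacement in RULES:
        if verb.endswith(suffix):
            stem = verb[:len(verb) - len(suffix)]
            return stem + replacement
    raise ValueError("no rule matched")  # unreachable while a default exists


for verb in ["sleep", "oversleep", "bake", "walk"]:
    print(verb, "->", past_tense(verb))
```

Reordering the list changes the result: with the default rule first, "sleep" would become "sleeped", which is why the ordering carries meaning in a decision list.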
This paper addresses an important application of machine learning (ML) in design. One of the major bottlenecks in engineering analysis with the finite-element method, the design of the finite-element mesh, was the subject of improvement. Defining an appropriate geometric mesh model that ensures low approximation errors and avoids unnecessary computational overhead is a very difficult and time-consuming task based mainly on the user's experience. A knowledge base for finite-element mesh design has been constructed using ML techniques. Ten mesh models have been used as a source of training examples. The mesh dataset was probably the first real-world relational dataset and became one of the most widely used training sets for experimenting with inductive logic programming (ILP) systems. After several experiments with different ML systems in the last few years, the ILP system CLAUDIEN was chosen to construct the rules for determining the appropriate mesh resolution values. ILP has been found to be an effective approach to the problem of mesh design. An evaluation of the resulting knowledge base shows that the mesh design patterns are captured well by the induced rules and represent a solid basis for practical application. The aim of this paper is not only to present a real-life ML application to design, but also to describe and discuss how this work relates to the topic of this special issue: the proposed "dimensions" of ML in design.
This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of transforming it into a learner for ordinal classification tasks. These algorithm variants are compared on a number of benchmark data sets to verify the relative strengths and weaknesses of the strategies and to study the trade-off between optimal categorical classification accuracy (hit rate) and minimum distance-based error. Preliminary results indicate that this is a promising avenue towards algorithms that combine aspects of classification and regression.
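The trade-off between hit rate and distance-based error can be seen on a small example. The Python sketch below assumes that the distance-based error is the mean absolute difference between predicted and true class indices, which the abstract does not state; the function names and the data are hypothetical.

```python
def hit_rate(y_true, y_pred):
    """Fraction of predictions that match the true class exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_distance(y_true, y_pred):
    """Average distance between predicted and true ordinal class indices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Ordinal classes encoded as integers 1..5 (hypothetical data).
y_true = [1, 2, 3, 4, 5, 5]
pred_a = [1, 2, 3, 4, 1, 1]  # more exact hits, but two misses are far off
pred_b = [2, 3, 3, 4, 4, 4]  # fewer exact hits, but every miss is off by one

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, hit_rate(y_true, pred), mean_absolute_distance(y_true, pred))
```

Predictor A wins on hit rate while predictor B wins on distance, so a learner tuned for one criterion need not be the best choice under the other.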
We present a new method for discovering knowledge from structured data which are represented by graphs in the framework of inductive logic programming. A graph, or network, is widely used for representing relations between various data and expressing a small and easily understandable hypothesis. An analysis system that directly manipulates graphs is therefore useful for knowledge discovery. Our method uses Formal Graph System (FGS) as a knowledge representation language for graph structured data. FGS is a kind of logic programming system which directly deals with graphs just like first order terms. Our method employs a refutably inductive inference algorithm as its learning algorithm. A refutably inductive inference algorithm is a special type of inductive inference algorithm with refutability of hypothesis spaces, and is suitable for knowledge discovery. We give a sufficiently large hypothesis space, the set of weakly reducing FGS programs, and we show that this hypothesis space is refutably inferable from complete data. We have designed and implemented a prototype of a knowledge discovery system KD-FGS, which is based on our method and acquires knowledge directly from graph structured data. Finally we discuss the applicability of our method for graph structured data with experimental results on some graph theoretical notions.
Character recognition systems can contribute tremendously to the advancement of the automation process and can improve the interaction between man and machine in many applications, including office automation, check verification and a large variety of banking, business and data entry applications. The main theme of this paper is the automatic recognition of hand-printed Arabic characters using machine learning. Conventional methods have relied on hand-constructed dictionaries, which are tedious to construct and difficult to make tolerant to variation in writing styles. The advantages of machine learning are that it can generalize over the large degree of variation between writing styles and that recognition rules can be constructed by example. The system was tested on a sample of handwritten characters from several individuals whose writing ranged from acceptable to poor in quality, and the average correct recognition rate obtained using cross-validation was 89.65%.
The discovery of interesting patterns in relational databases is an important data mining task. This paper is concerned with the development of a search algorithm for first-order hypothesis spaces that adopts an important pruning technique from association rule mining (termed subset pruning here) in a first-order setting. The basic search algorithm is extended by so-called requires and excludes constraints, which allow prior knowledge about the data, such as mutual exclusion or generalization relationships among attributes, to be declared so that it can be exploited for further structuring and restricting the search space. Furthermore, it is illustrated how to process taxonomies and numerical attributes in the search algorithm. Several task settings using different interestingness criteria and search modes with corresponding pruning criteria are described. Three settings serve as test beds for evaluation of the proposed approach. The experimental evaluation shows that the impact of subset pruning is significant, since it reduces the number of hypothesis evaluations in many cases by about 50%. Generalization relationships are shown to have less impact in our experimental set-up.
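Subset pruning as known from association rule mining can be sketched in a few lines of Python. This is the propositional, itemset version: a candidate of size k is evaluated only if all of its subsets of size k-1 are already known to be frequent. The first-order adaptation described in the abstract is not reproduced here, and the function name and data are hypothetical.

```python
from itertools import combinations


def prune_candidates(candidates, frequent_smaller):
    """Keep candidates whose (k-1)-element subsets are all known frequent.

    `candidates` is a list of frozensets of size k; `frequent_smaller`
    is a set of frozensets of size k-1 found frequent at the previous level.
    """
    kept = []
    for cand in candidates:
        subsets = combinations(cand, len(cand) - 1)
        if all(frozenset(sub) in frequent_smaller for sub in subsets):
            kept.append(cand)
    return kept


# Hypothetical frequent pairs from the previous search level.
frequent_pairs = {
    frozenset(pair)
    for pair in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
}
candidates = [frozenset("abc"), frozenset("abd")]
kept = prune_candidates(candidates, frequent_pairs)
print([sorted(c) for c in kept])
```

The candidate {a, b, d} is discarded without being evaluated against the data because its subset {a, d} is not frequent, which is the source of the savings in hypothesis evaluations.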
One of the obstacles to the wide use of first-order logic languages is the fact that relational inference is intractable in the worst case. This paper presents an any-time relational inference algorithm: it proceeds by stochastically sampling the inference search space, after this space has been judiciously restricted using strongly-typed logic-like declarations. To exploit the potential of this framework, we present a relational learner named STILL, which produces programs geared to stochastic inference. STILL handles examples described as definite or constrained clauses, and again uses sampling-based heuristics to achieve any-time learning. Controlling both the construction and the exploitation of logic programs yields robust relational reasoning, where deductive biases are compensated for by inductive biases, and vice versa.