This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and the use of intensional background knowledge. The development of FOIDL was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce generally more accurate results than a range of methods previously applied to this problem. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate that the combination of implicit negatives and intensionality allows FOIDL to learn correct programs from far fewer examples than FOIL.
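The two central ideas, ordered decision lists and output completeness, can be illustrated with a toy past-tense task. The following is a minimal Python sketch, not FOIDL itself. The rules, the verbs and the names `DECISION_LIST`, `past_tense`, `TRAINING` and `is_implicit_negative` are hypothetical. Only the first rule whose test succeeds fires, so specific exceptions must precede general defaults. Under output completeness, any output other than the listed one for a given input is treated as a negative example.

```python
from typing import Callable

# A rule pairs a test on the input verb with a transformation producing the output.
Rule = tuple[Callable[[str], bool], Callable[[str], str]]

# Hypothetical decision list: specific exceptions first, the general default last.
# Only the first rule whose test succeeds fires, much like clauses guarded by a
# Prolog cut.
DECISION_LIST: list[Rule] = [
    (lambda v: v == "go", lambda v: "went"),        # irregular exception
    (lambda v: v.endswith("e"), lambda v: v + "d"),  # bake -> baked
    (lambda v: True, lambda v: v + "ed"),            # default: walk -> walked
]


def past_tense(verb: str) -> str:
    """Apply the first matching rule of the decision list."""
    for test, transform in DECISION_LIST:
        if test(verb):
            return transform(verb)
    raise ValueError(f"no rule applies to {verb!r}")


# Output completeness: the training data give the only correct output for each
# input, so any other output for a listed input is an implicit negative example.
TRAINING = {"go": "went", "bake": "baked", "walk": "walked"}


def is_implicit_negative(verb: str, candidate: str) -> bool:
    return verb in TRAINING and TRAINING[verb] != candidate
```

Because the exception precedes the default, `past_tense("go")` returns `"went"` rather than `"goed"`, and `("go", "goed")` counts as a negative example even though no negative was ever listed explicitly.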
This paper addresses an important application of machine learning (ML) in design: improving the design of the finite-element mesh, one of the major bottlenecks in engineering analysis with the finite-element method. Defining an appropriate geometric mesh model that ensures low approximation errors and avoids unnecessary computational overhead is a very difficult and time-consuming task that relies mainly on the user's experience. A knowledge base for finite-element mesh design has been constructed using ML techniques, with ten mesh models serving as the source of training examples. The mesh dataset was probably the first real-world relational dataset and became one of the most widely used training sets for experimenting with inductive logic programming (ILP) systems. After several experiments with different ML systems over the last few years, the ILP system CLAUDIEN was chosen to construct the rules for determining appropriate mesh resolution values. ILP has been found to be an effective approach to the problem of mesh design. An evaluation of the resulting knowledge base shows that the induced rules capture the mesh design patterns well and represent a solid basis for practical application. The aim of this paper is not only to present a real-life ML application to design, but also to describe and discuss how this work relates to the topic of this special issue: the proposed "dimensions" of ML in design.
This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of transforming it into a learner for ordinal classification tasks. These algorithm variants are compared on a number of benchmark data sets to verify the relative strengths and weaknesses of the strategies and to study the trade-off between optimal categorical classification accuracy (hit rate) and minimum distance-based error. Preliminary results indicate that this is a promising avenue towards algorithms that combine aspects of classification and regression.
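The trade-off between hit rate and distance-based error already appears in the choice of a single leaf's prediction. The following minimal sketch illustrates that trade-off and is not S-CART's actual leaf rule. Ordinal classes are coded as integers, and the leaf contents are hypothetical. Predicting the mode maximises the hit rate, while predicting a median minimises the mean absolute class distance.

```python
from collections import Counter
from statistics import median_low


def hit_rate(pred: int, labels: list[int]) -> float:
    """Fraction of examples whose class equals the prediction."""
    return sum(1 for y in labels if y == pred) / len(labels)


def mean_abs_distance(pred: int, labels: list[int]) -> float:
    """Average distance between the predicted and the true ordinal class."""
    return sum(abs(y - pred) for y in labels) / len(labels)


def leaf_predictions(labels: list[int]) -> dict[str, int]:
    mode = Counter(labels).most_common(1)[0][0]  # maximises the hit rate
    med = median_low(labels)                     # a median: minimises absolute distance
    return {"mode": mode, "median": med}


# Hypothetical leaf with ordinal classes 1 (low) .. 5 (high).
leaf = [1, 1, 1, 4, 4, 5, 5]
for name, p in leaf_predictions(leaf).items():
    print(name, p, round(hit_rate(p, leaf), 3), round(mean_abs_distance(p, leaf), 3))
```

For this leaf the mode (class 1) has the higher hit rate (3/7 versus 2/7), but the median (class 4) has the lower mean distance (11/7 versus 2). An ordinal tree learner has to decide which of the two criteria its leaves should serve.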
We present a new method for discovering knowledge from structured data represented by graphs, in the framework of inductive logic programming. Graphs, or networks, are widely used for representing relations between various data and for expressing small, easily understandable hypotheses, so an analysis system that manipulates graphs directly is useful for knowledge discovery. Our method uses the Formal Graph System (FGS) as a knowledge representation language for graph-structured data. FGS is a kind of logic programming system that deals with graphs directly, treating them just like first-order terms. As its learning algorithm, our method employs a refutably inductive inference algorithm, a special type of inductive inference algorithm with refutability of hypothesis spaces, which is suitable for knowledge discovery. We give a sufficiently large hypothesis space, the set of weakly reducing FGS programs, and show that this hypothesis space is refutably inferable from complete data. We have designed and implemented KD-FGS, a prototype knowledge discovery system that is based on our method and acquires knowledge directly from graph-structured data. Finally, we discuss the applicability of our method to graph-structured data, with experimental results on some graph-theoretical notions.
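The idea of refutability can be illustrated with a toy learner for a finite hypothesis space, using identification by enumeration. This sketch is an assumption-laden illustration only. Its hypotheses are plain Python predicates rather than FGS programs, and `HYPOTHESES` and `refutable_learner` are hypothetical names. Given complete (positive and negative) data, the learner conjectures the first hypothesis consistent with every example. If none remains consistent, it refutes the whole space instead of guessing.

```python
from typing import Callable, Optional

# Hypothetical finite hypothesis space: properties of a graph's vertex count.
HYPOTHESES: list[tuple[str, Callable[[int], bool]]] = [
    ("even", lambda n: n % 2 == 0),
    ("at least 3", lambda n: n >= 3),
    ("multiple of 3", lambda n: n % 3 == 0),
]


def refutable_learner(examples: list[tuple[int, bool]]) -> Optional[str]:
    """Conjecture for the data seen so far, or None to refute the space.

    examples: complete data, i.e. (instance, label) pairs with label True for
    positive and False for negative examples.
    """
    for name, h in HYPOTHESES:
        if all(h(x) == label for x, label in examples):
            return name  # first hypothesis (in enumeration order) that fits
    return None  # every hypothesis is contradicted: refute the whole space
```

When the learner is called on growing prefixes of a data stream, the conjecture changes only when an example contradicts it. In a finite space, a target lying outside the space is therefore eventually refuted.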
Character recognition systems can contribute tremendously to the advancement of automation and can improve the interaction between man and machine in many applications, including office automation, check verification, and a large variety of banking, business, and data entry applications. The main theme of this paper is the automatic recognition of hand-printed Arabic characters using machine learning. Conventional methods have relied on hand-constructed dictionaries, which are tedious to construct and difficult to make tolerant to variations in writing style. The advantages of machine learning are that it can generalize over the large degree of variation between writing styles and that recognition rules can be constructed from examples. The system was tested on a sample of handwritten characters from several individuals whose writing ranged from acceptable to poor in quality, and the average correct recognition rate obtained using cross-validation was 89.65%.
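The reported recognition rate is an average over cross-validation folds. The following minimal sketch shows that evaluation protocol only. It uses a hypothetical 1-nearest-neighbour classifier on made-up feature vectors as a stand-in; these are neither the paper's recognition rules nor its data. It averages the per-fold recognition rates of k-fold cross-validation.

```python
def nearest_neighbour(train, x):
    """Hypothetical stand-in classifier: label of the closest training vector."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], x))[1]


def cross_validated_rate(data, k):
    """Average correct-recognition rate over k folds.

    data: list of (feature_vector, label) pairs; requires 2 <= k <= len(data).
    Fold i holds out every example whose index j satisfies j % k == i.
    """
    rates = []
    for i in range(k):
        test = [ex for j, ex in enumerate(data) if j % k == i]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        correct = sum(nearest_neighbour(train, x) == y for x, y in test)
        rates.append(correct / len(test))
    return sum(rates) / k
```

Averaging per-fold rates, as here, weights each fold equally. When the folds have unequal sizes, pooling all test predictions instead gives a slightly different figure.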
The discovery of interesting patterns in relational databases is an important data mining task. This paper is concerned with the development of a search algorithm for first-order hypothesis spaces that adopts an important pruning technique from association rule mining (termed subset pruning here) in a first-order setting. The basic search algorithm is extended by so-called requires and excludes constraints, which allow prior knowledge about the data, such as mutual exclusion or generalization relationships among attributes, to be declared so that it can be exploited for further structuring and restricting the search space. Furthermore, we illustrate how taxonomies and numerical attributes are processed in the search algorithm. Several task settings using different interestingness criteria and search modes with corresponding pruning criteria are described. Three settings serve as test beds for evaluating the proposed approach. The experimental evaluation shows that the impact of subset pruning is significant, since in many cases it reduces the number of hypothesis evaluations by about 50%. Exploiting generalization relationships is shown to be less effective in our experimental set-up.
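Subset pruning borrows the Apriori idea: a candidate is evaluated only if every subset one element smaller was accepted at the previous level. This is valid for anti-monotone criteria such as support. The following minimal sketch works on propositional item sets rather than first-order hypotheses. It uses support as a stand-in interestingness measure, and the function name and the `excludes` format are hypothetical. It also shows an excludes constraint removing candidates before they are ever evaluated.

```python
from itertools import combinations


def levelwise_search(rows, min_support, excludes=()):
    """Level-wise search with Apriori-style subset pruning.

    rows:        list of sets of items (a propositional stand-in for examples)
    min_support: threshold on the fraction of rows covering a candidate
    excludes:    pairs of items declared mutually exclusive (never combined)
    Returns the accepted item sets and the number of candidate evaluations.
    """
    evaluations = 0

    def support(cand):
        nonlocal evaluations
        evaluations += 1  # each call stands for one (costly) hypothesis evaluation
        return sum(1 for row in rows if cand <= row) / len(rows)

    def allowed(cand):
        return not any(frozenset(pair) <= cand for pair in excludes)

    items = sorted({i for row in rows for i in row})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    accepted = list(level)
    k = 2
    while level:
        previous = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for cand in candidates:
            if not allowed(cand):  # excludes constraint: skipped, never evaluated
                continue
            # Subset pruning: evaluate only if every (k-1)-subset was accepted.
            if all(frozenset(s) in previous for s in combinations(cand, k - 1)):
                if support(cand) >= min_support:
                    level.append(cand)
        accepted.extend(level)
        k += 1
    return accepted, evaluations
```

For example, with the rows {a, b, c}, {a, b}, {a, c} and {b, d} and `min_support=0.5`, the candidate {a, b, c} is never evaluated because its subset {b, c} falls below the threshold. Declaring `excludes={("b", "c")}` also removes {b, c} itself without evaluating it.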
One of the obstacles to the wide use of first-order logic languages is that relational inference is intractable in the worst case. This paper presents an any-time relational inference algorithm: it proceeds by stochastically sampling the inference search space, after this space has been judiciously restricted using strongly typed, logic-like declarations. To exploit the potential of this framework, we present STILL, a relational learner that produces programs geared to stochastic inference. STILL handles examples described as definite or constrained clauses, and again uses sampling-based heuristics to achieve any-time learning. Controlling both the construction and the exploitation of logic programs yields robust relational reasoning, in which deductive biases are compensated for by inductive biases, and vice versa.
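Any-time inference by sampling can be pictured as a stochastic coverage (θ-subsumption) test. The following sketch is an illustration, not STILL's algorithm. A clause body is a list of literals, an example is a set of ground facts, and the test draws random substitutions from type-restricted domains. It returns True as soon as a sample maps every literal onto a fact. False only means that no witness was found within the budget. The function name, the `domains` typing scheme and the `budget` parameter are assumptions.

```python
import random


def sampled_subsumes(body, facts, domains, budget, rng=None):
    """Any-time, sampling-based coverage test for a clause body.

    body:    list of (predicate, args) literals; args naming a key of `domains`
             are variables, all other args are constants
    facts:   set of ground (predicate, args) tuples describing one example
    domains: for each variable, the constants its declared type allows
    budget:  number of random substitutions to draw before giving up
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    variables = sorted(domains)
    for _ in range(budget):
        theta = {v: rng.choice(domains[v]) for v in variables}
        if all((pred, tuple(theta.get(a, a) for a in args)) in facts
               for pred, args in body):
            return True   # witness found: the body provably covers the example
    return False          # inconclusive: no witness within this budget
```

A larger `budget` makes a False answer more trustworthy, which is the any-time trade-off. Narrowing `domains` by type plays the role of the strongly typed declarations that restrict the sampled space.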
The aim of relational learning is to develop methods for inducing hypotheses in representation formalisms that are more expressive than attribute-value representations. Most work on relational learning has focused on induction in subsets of first-order logic such as Horn clauses. In this paper we introduce a representation formalism based on feature terms, together with the corresponding notions of subsumption and anti-unification. We then explain INDIE, a heuristic bottom-up learning method that induces class hypotheses, in the form of feature terms, from positive and negative examples. The biases INDIE uses while searching the hypothesis space are explained along with INDIE's algorithms. INDIE's representational bias can be summarised as follows: it makes intensive use of sorts and the sort hierarchy, and instead of using negation it focuses on detecting path equalities. We report the results of INDIE on some classical relational datasets, showing that it is able to find hypotheses comparable to the original ones. The differences between INDIE's hypotheses and those of other systems are explained by each system's bias in searching the hypothesis space and by the representational bias of its hypothesis language.
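Anti-unification of feature terms computes their most specific common generalisation, and subsumption checks generality. The following simplified sketch ignores path equalities and set-valued features, both part of INDIE's formalism. It also assumes a hypothetical sort hierarchy given as a child-to-parent map with top sort `"any"`. A feature term is modelled as a sort plus a dictionary of features. The generalisation takes the least common supersort and keeps only the shared features, generalising their values recursively.

```python
from dataclasses import dataclass, field

# Hypothetical sort hierarchy (child -> parent); every sort must reach "any".
PARENT = {"man": "person", "woman": "person", "person": "any", "int": "any"}


def ancestors(sort: str) -> list[str]:
    """The sort itself followed by its supersorts, up to the top sort."""
    chain = [sort]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lub(s1: str, s2: str) -> str:
    """Least common supersort of two sorts."""
    up2 = set(ancestors(s2))
    return next(s for s in ancestors(s1) if s in up2)


@dataclass
class FTerm:
    sort: str
    features: dict[str, "FTerm"] = field(default_factory=dict)


def anti_unify(t1: FTerm, t2: FTerm) -> FTerm:
    """Most specific term subsuming both (without path equalities)."""
    shared = t1.features.keys() & t2.features.keys()
    return FTerm(lub(t1.sort, t2.sort),
                 {f: anti_unify(t1.features[f], t2.features[f]) for f in sorted(shared)})


def subsumes(general: FTerm, specific: FTerm) -> bool:
    """True if `general` is at least as general as `specific`."""
    return (general.sort in ancestors(specific.sort)
            and all(f in specific.features and subsumes(v, specific.features[f])
                    for f, v in general.features.items()))
```

Anti-unifying a `man` term that has `age` and `father` with a `woman` term that has `age` and `mother` yields a `person` term with only `age`, and that term subsumes both inputs. This mirrors, in miniature, how a bottom-up learner generalises positive examples.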
Scientists often need to locate appropriate software for their problems and then select from among many alternatives. We have previously proposed an approach to this task that processes performance data of the targeted software. This approach has been tested using a customized implementation referred to as PYTHIA. This experience made us realize the complexity of the algorithmic discovery of knowledge from performance data and of the management of these data together with the discovered knowledge. To address this issue, we created PYTHIA-II, a modular framework and system that combines a general knowledge discovery in databases (KDD) methodology and recommender system technologies to provide advice about scientific software/hardware artifacts. The functionality and effectiveness of the system are demonstrated for two existing performance studies using sets of software for solving partial differential equations. From the end-user perspective, PYTHIA-II allows users to specify the problem to be solved and their computational objectives. In turn, PYTHIA-II (i) selects the software available for the user's problem, (ii) suggests parameter values, and (iii) assesses the recommendation provided. PYTHIA-II provides all the necessary facilities to set up database schemas for testing suites and associated performance data in order to test sets of software. Moreover, it allows alternative data mining and recommendation facilities to be interfaced easily. PYTHIA-II is an open-ended system implemented on public domain software and has been used for performance evaluation in several different problem domains.
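The three end-user steps can be pictured as a query over stored performance records. The following minimal sketch is not PYTHIA-II's data-mining or recommender machinery. The record fields, solver names and selection policy are all hypothetical. The policy picks the cheapest recorded run that meets the user's error objective. The sketch selects software for a problem class and suggests the parameter value used in the best run. As a crude assessment, it reports how many records support the choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    problem_class: str  # hypothetical PDE category
    solver: str         # software/hardware artifact
    grid_size: int      # a tunable parameter
    error: float        # achieved accuracy
    time_s: float       # measured cost in seconds


def recommend(runs: list[Run], problem_class: str, max_error: float) -> Optional[dict]:
    """Cheapest recorded run meeting the error target for this problem class."""
    ok = [r for r in runs if r.problem_class == problem_class and r.error <= max_error]
    if not ok:
        return None  # no stored evidence for this objective
    best = min(ok, key=lambda r: r.time_s)
    support = sum(1 for r in ok if r.solver == best.solver)
    return {"solver": best.solver, "grid_size": best.grid_size,
            "supporting_runs": support, "candidates": len(ok)}
```

Tightening `max_error` can change the recommended solver as well as the suggested parameter. Any other policy, such as a learned model over problem features, could replace `recommend` behind the same interface.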
We discuss the adoption of a three-valued setting for inductive concept learning. Distinguishing between what is true, what is false, and what is unknown can be useful in situations where decisions have to be taken on the basis of scarce, ambiguous, or downright contradictory information. In a three-valued setting, we learn a definition for both the target concept and its opposite, considering positive and negative examples as instances of two disjoint classes. To this end, we adopt Extended Logic Programs (ELP) under a Well-Founded Semantics with explicit negation (WFSX) as the representation formalism for learning, and show how ELPs can be used to specify combinations of strategies declaratively while also coping with contradiction and exceptions. Explicit negation is used to represent the opposite concept, while default negation is used to ensure consistency and to handle exceptions to general rules. Exceptions are represented by examples that are covered by the definition of a concept but belong to the training set of the opposite concept. Standard inductive logic programming techniques are employed to learn the concept and its opposite. Depending on the adopted technique, we can learn the most general or the least general definition. Thus, four epistemological varieties occur, resulting from the combination of most general and least general solutions for the positive and negative concepts. We discuss the factors that should be taken into account when choosing and strategically combining the generality levels for positive and negative concepts. We also address the strategic combination of possibly contradictory learnt definitions of a predicate and its explicit negation. All in all, we show that extended logic programs under well-founded semantics with explicit negation add expressivity to learning tasks and allow a number of representation and strategic issues to be tackled in a principled way. Our techniques have been implemented and exam
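The interplay between a learnt concept and its learnt opposite can be sketched as a three-valued classifier. This Python illustration is not an ELP/WFSX implementation. `flies` and `not_flies` are hypothetical learnt definitions, and the conflict strategy names are assumptions. An example covered by neither definition is unknown. An example covered by both is a contradiction that a chosen strategy resolves. Default negation appears as the `not penguin` condition that carves an exception out of the general rule.

```python
from enum import Enum
from typing import Any, Callable


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def classify(x: Any, covers_pos: Callable[[Any], bool],
             covers_neg: Callable[[Any], bool],
             on_conflict: str = "unknown") -> Truth:
    """Combine learnt definitions of a concept p and of its explicit negation."""
    p, not_p = covers_pos(x), covers_neg(x)
    if p and not not_p:
        return Truth.TRUE
    if not_p and not p:
        return Truth.FALSE
    if not p and not not_p:
        return Truth.UNKNOWN  # no evidence either way
    # Both definitions cover x: a contradiction, settled by the chosen strategy.
    strategies = {"prefer_pos": Truth.TRUE, "prefer_neg": Truth.FALSE}
    return strategies.get(on_conflict, Truth.UNKNOWN)


# Hypothetical learnt definitions for flies/1 and its explicit negation.
# The default negation in flies ("not penguin") excludes the exception from
# the general rule for birds.
def flies(a: dict) -> bool:
    return a.get("bird", False) and not a.get("penguin", False)


def not_flies(a: dict) -> bool:
    return a.get("penguin", False) or a.get("wingless", False)
```

A penguin is classified as not flying, an object that is not described as a bird is unknown, and a wingless bird is covered by both definitions. How that last case is resolved depends on `on_conflict`, mirroring the strategic choice the abstract discusses.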