This paper formalises the concept of learning symbolic rules from multisource data in a cardiac monitoring context. Our sources, electrocardiograms and arterial blood pressure measures, describe cardiac behaviours from different viewpoints. To learn interpretable rules, we use an inductive logic programming (ILP) method. We develop an original strategy to cope with the dimensionality issues caused by using this ILP technique on a rich multisource language. The results show that our method greatly improves the feasibility and the efficiency of the process while staying accurate. They also confirm the benefits of using multiple sources to improve the diagnosis of cardiac arrhythmias.
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
ISBN:
(Print) 9780769539232
Multi-relational data mining (MRDM) enumerates frequently occurring patterns in data, patterns that appear not only within a single relational table but across a collection of tables. Although a database usually consists of many relational tables, most data mining approaches treat patterns on a single table only. An approach based on ILP (inductive logic programming) is promising because it handles patterns spanning many tables. Pattern miners based on the ILP approach produce expressive patterns and are widely applicable, but computationally expensive. MAPIX [2] has the advantage that it constructs patterns by combining atomic properties extracted from sampled examples. By restricting patterns to combinations of these atomic properties, it gains efficiency compared with other algorithms. In order to scale MAPIX to large datasets on standard relational database systems, this paper studies implementation issues.
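The idea of a pattern that spans several tables, rather than living in one, can be illustrated with a toy support count. This is a hand-written sketch, not the MAPIX algorithm; the table and column names are hypothetical.

```python
# Toy multi-relational support counting: a pattern holds for an example
# (here, a customer) only when matching rows exist in SEVERAL tables.
# All relation and attribute names below are illustrative assumptions.

customer = [("c1",), ("c2",), ("c3",)]
order    = [("c1", "book"), ("c2", "cd"), ("c1", "cd")]
returned = [("c1", "cd")]

def holds(cust_id):
    # Atomic property 1 (from the `order` table): the customer ordered a cd.
    p1 = any(c == cust_id and item == "cd" for c, item in order)
    # Atomic property 2 (from the `returned` table): the customer returned something.
    p2 = any(c == cust_id for c, _ in returned)
    # The multi-relational pattern is the conjunction of the two properties.
    return p1 and p2

# Support = number of examples in which the pattern holds.
support = sum(holds(c) for (c,) in customer)
print(support)  # 1 (only c1 both ordered a cd and returned an item)
```

Restricting patterns to conjunctions of such pre-extracted atomic properties, as MAPIX does, avoids searching the full space of first-order clauses.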
ISBN:
(Print) 9780769542638
It is well known that modeling with constraint networks requires fair expertise. Tools able to automatically generate such networks have therefore gained major interest. The main contribution of this paper is a new framework, based on inductive logic programming, that builds a constraint model from solutions and non-solutions of related problems. The model is expressed in a middle-level modeling language. On this particular relational learning problem, traditional top-down search methods fall into blind search, while bottom-up search methods produce overly expensive coverage tests. Recent work in inductive logic programming on phase transitions and plateau crossing shows that no general solution can face all these difficulties. In this context, we have designed an algorithm combining the major qualities of these two types of search techniques. We present experimental results on benchmarks ranging from puzzles to scheduling problems.
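The core acquisition loop described above can be sketched in a few lines: keep every candidate constraint that all solutions satisfy, then check that the resulting model rejects each non-solution. The variable names and the tiny candidate language (`<`, `!=` over variable pairs) are illustrative assumptions, not the paper's modeling language.

```python
# Minimal sketch of constraint acquisition from solutions and
# non-solutions. The candidate constraint language here is a toy
# assumption: binary < and != constraints over three variables.
from itertools import combinations

# Assignments over variables x0, x1, x2.
solutions     = [(1, 2, 3), (2, 3, 4), (0, 1, 2)]
non_solutions = [(3, 2, 1), (2, 2, 2)]

# Enumerate candidate binary constraints for each variable pair.
candidates = []
for i, j in combinations(range(3), 2):
    candidates.append((f"x{i} < x{j}",  lambda a, i=i, j=j: a[i] < a[j]))
    candidates.append((f"x{i} != x{j}", lambda a, i=i, j=j: a[i] != a[j]))

# Keep only candidates consistent with every positive example.
model = [(name, c) for name, c in candidates
         if all(c(s) for s in solutions)]
print([name for name, _ in model])

# A learned model is adequate here if every non-solution
# violates at least one retained constraint.
rejects = all(any(not c(n) for _, c in model) for n in non_solutions)
print(rejects)  # True
```

Real acquisition systems must of course search a far larger constraint language, which is where the top-down/bottom-up search trade-off discussed in the abstract arises.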
This paper reviews experiments with an approach to discovery through a robot's experimentation in its environment. In addition to discovering laws that enable predictions, we are particularly interested in the mecha...
ISBN:
(Print) 9783642177484
In this paper we present a method for semantic annotation of texts, based on deep linguistic analysis (DLA) and inductive logic programming (ILP). The combination of DLA and ILP has the following benefits: manual selection of learning features is not needed; the learning procedure has all available linguistic information at its disposal and is capable of selecting the relevant parts itself; and the learned extraction rules can be easily visualized, understood, and adapted by humans. A description, implementation, and initial evaluation of the method are the main contributions of the paper.
ISBN:
(Print) 9780878492695
In order to find an effective approach to disambiguation, we explore ways of complementing statistical approaches with the use of 'domain theories'. We suppose that disambiguation decisions can supply tacit information about such theories, and that the theories can in part be automatically induced from such data. The experimental results can be used successfully in disambiguating other sentences from the same domain.
ISBN:
(Print) 9783939897170
State-of-the-art theta-subsumption engines like Django (C) and Resumer2 (Java) are implemented in imperative languages. Since theta-subsumption is inherently a logic problem, in this paper we explore how to efficiently implement it in Prolog. Theta-subsumption is an important problem in computational logic and particularly relevant to the inductive logic programming (ILP) community, as it is at the core of the hypothesis coverage test, which is often the bottleneck of an ILP system. Also, since most of those systems are implemented in Prolog, they can immediately take advantage of a Prolog-based theta-subsumption engine. We present a relatively simple (approximately 1000 lines of Prolog) but efficient and general theta-subsumption engine, Subsumer. Crucial to Subsumer's performance is the dynamic and recursive decomposition of a clause into sets of independent components. Also important are ideas borrowed from constraint programming that empower Subsumer to efficiently handle clauses with up to several thousand literals and several dozen distinct variables. Using the notoriously challenging Phase Transition dataset, we show that, in CPU time, Subsumer clearly outperforms the Django subsumption engine and is competitive with the more sophisticated, state-of-the-art Resumer2. Furthermore, Subsumer's memory requirements are only a small fraction of those engines', and it can handle arbitrary Prolog clauses, whereas Django and Resumer2 can only handle Datalog clauses.
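The coverage test at the heart of these engines is easy to state: clause C theta-subsumes clause D iff some substitution theta maps every literal of C onto a literal of D. The following is a minimal backtracking checker to make the definition concrete; it is an illustrative sketch, not the Subsumer engine, and its literal encoding (predicate/args tuples, uppercase strings as variables) is an assumption.

```python
# Minimal theta-subsumption check by backtracking over substitutions.
# Literals are (predicate, args) tuples; Prolog-style, a string starting
# with an uppercase letter is a variable, anything else a constant.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match_literal(lit, target, theta):
    """Try to extend substitution theta so that lit maps onto target."""
    (pred, args), (tpred, targs) = lit, target
    if pred != tpred or len(args) != len(targs):
        return None
    theta = dict(theta)  # copy: each branch gets its own bindings
    for a, t in zip(args, targs):
        if is_var(a):
            if a in theta and theta[a] != t:
                return None  # variable already bound to something else
            theta[a] = t
        elif a != t:
            return None      # constant mismatch
    return theta

def subsumes(c, d, theta=None):
    """True iff some substitution maps every literal of c into d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for target in d:                       # choice point: backtracking
        extended = match_literal(first, target, theta)
        if extended is not None and subsumes(rest, d, extended):
            return True
    return False

# p(X,Y) subsumes {p(a,b), q(a)} via X->a, Y->b; p(X,X) does not.
d = [("p", ("a", "b")), ("q", ("a",))]
print(subsumes([("p", ("X", "Y"))], d))  # True
print(subsumes([("p", ("X", "X"))], d))  # False
```

Naive backtracking like this is exponential in the worst case; the clause decomposition and constraint-programming ideas mentioned in the abstract are precisely what make the test tractable on large clauses.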
Background: Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs. Results: This paper presents a BAS knowledge base, BASIC, which currently covers 46 activities and is available on the Internet. We use the dopamine agonists D1, D2, and Dauto as examples and illustrate the process of BAS extraction. The resulting BASs were reasonably interpreted after proposing a few template structures. Conclusions: The knowledge base is useful for drug design. Proposed BASs and their supporting structures in the knowledge base will facilitate the development of new template structures for other activities, and will be useful in the design of new lead compounds via reasonable interpretations of active structures.
ISBN:
(Print) 9780819485571; 0819485578
We propose a two-stage ILP-based design algorithm for hybrid-HOXC-based optical networks. The hybrid-HOXC consists of an optical waveband cross-connect and an electrical cross-connect that grooms only wavelength paths. Its effectiveness is evaluated through numerical experiments. The impact of the electrical/optical port cost ratio on the total network cost is also investigated.