Exploiting mutual explanations for interactive learning is presented as part of an interdisciplinary research project on transparent machine learning for medical decision support. The focus of the project is to combine deep learning black-box approaches with interpretable machine learning for the classification of different types of medical images, uniting the predictive accuracy of deep learning with the transparency and comprehensibility of interpretable models. Specifically, we present an extension of the inductive logic programming system Aleph that allows for interactive learning. Medical experts can ask for verbal explanations; they can correct classification decisions and, in addition, correct the explanations themselves. Thereby, expert knowledge can be taken into account in the form of constraints for model adaptation.
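The correction mechanism described above can be pictured as constraint-based pruning of candidate rules. The following toy sketch (all rule and feature names are invented for illustration; this is not Aleph's actual interface) treats an expert's corrected explanation as required/forbidden literals used to filter hypotheses:

```python
# Toy sketch: expert corrections as constraints that prune candidate rules.
# Each candidate rule (explanation) is represented as a set of body literals.
candidate_rules = [
    {"irregular_border", "dark_region"},     # explanation 1
    {"irregular_border", "large_diameter"},  # explanation 2
    {"symmetric_shape", "dark_region"},      # explanation 3
]

# Hypothetical expert corrections: "symmetric_shape" must not justify the
# classification, while "irregular_border" must be part of any explanation.
forbidden = {"symmetric_shape"}
required = {"irregular_border"}

def satisfies(rule, required, forbidden):
    """Keep a rule only if it uses every required literal and no forbidden one."""
    return required <= rule and not (rule & forbidden)

kept = [r for r in candidate_rules if satisfies(r, required, forbidden)]
```

Only the first two explanations survive the expert's constraints; model adaptation would then proceed over the pruned hypothesis space.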
Motivation: Atomic-resolution modeling of large multimolecular assemblies is a key task in structural cell biology. Experimental techniques can provide atomic-resolution structures of single proteins and small complexes, or low-resolution data of large multimolecular complexes. Results: We present a novel integrative computational modeling method that integrates both low- and high-resolution experimental data. The algorithm accepts as input atomic-resolution structures of the individual subunits, obtained from X-ray crystallography, NMR, or homology modeling, together with interaction data between the subunits obtained from mass spectrometry. The optimal assembly of the individual subunits is formulated as an Integer Linear Programming task. The method was tested on several representative complexes, in both the bound and unbound cases. It correctly placed most of the subunits of multimolecular complexes of up to 16 subunits and significantly outperformed the CombDock and Haddock multimolecular docking methods.
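The combinatorial core of such an assembly task can be illustrated with a brute-force toy model (all placements, restraints, and clashes below are invented, and a real formulation would use an ILP solver): pick one candidate placement per subunit so that the number of satisfied cross-link restraints is maximized and no pair of placements clashes sterically.

```python
# Toy brute-force version of the subunit-assembly selection problem.
from itertools import product

subunits = ["A", "B", "C"]
placements = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}

# crosslink_ok[(s1, p1, s2, p2)] is True when placing s1 at p1 and s2 at p2
# satisfies the (hypothetical) mass-spectrometry restraint between them.
crosslink_ok = {
    ("A", 0, "B", 0): True, ("A", 1, "B", 1): True,
    ("B", 0, "C", 1): True, ("B", 1, "C", 0): True,
    ("A", 0, "C", 1): True,
}
clashes = {("A", 1, "C", 1)}  # a sterically impossible placement pair

def score(assign):
    """Number of satisfied restraints; -1 if any pair clashes."""
    s = 0
    for s1, s2 in [("A", "B"), ("B", "C"), ("A", "C")]:
        key = (s1, assign[s1], s2, assign[s2])
        if key in clashes:
            return -1
        s += crosslink_ok.get(key, False)
    return s

best = max(
    (dict(zip(subunits, ps)) for ps in product(*(placements[s] for s in subunits))),
    key=score,
)
```

An ILP solver does the same search symbolically, with one binary variable per (subunit, placement) pair and linear constraints enforcing one placement per subunit and no clashing pairs.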
Sequential data represent an important source of potentially new medical knowledge. However, this type of data is rarely provided in a format suitable for the immediate application of conventional mining algorithms. This paper summarizes and compares three different sequential mining approaches based, respectively, on windowing, episode rules, and inductive logic programming. Windowing is one of the essential methods of data preprocessing. Episode rules represent general sequential mining, while inductive logic programming extracts first-order features whose structure is determined by background knowledge. The three approaches are demonstrated and evaluated on the STULONG case study, a longitudinal preventive study of atherosclerosis in which the data consist of a series of long-term observations recording the development of risk factors and associated conditions. The intention is to identify frequent sequential/temporal patterns. Possible relations between the patterns and the onset of any of the observed cardiovascular diseases are also studied.
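Windowing in this sense can be sketched in a few lines (the observation series below is invented for illustration): a fixed-width window slides over each patient's observation sequence, turning it into transactions that a conventional miner can consume.

```python
# Minimal windowing sketch: cut a long observation series into fixed-width,
# overlapping windows that conventional mining algorithms treat as transactions.

def windows(events, width, step):
    """Slide a window of `width` observations over the series, advancing by `step`."""
    return [tuple(events[i:i + width])
            for i in range(0, len(events) - width + 1, step)]

# One patient's yearly risk-factor observations (hypothetical STULONG-like data).
series = ["smoker", "smoker", "hypertension", "smoker", "obesity"]
transactions = windows(series, width=3, step=1)
```

Each resulting window is an ordinary transaction, so frequent-pattern or association-rule algorithms apply directly to the preprocessed data.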
Conception, design, and implementation of cDNA microarray experiments present a variety of bioinformatics challenges for biologists and computational scientists. The multiple stages of data acquisition and analysis have motivated the design of Expresso, a system for microarray experiment management. Salient aspects of Expresso include support for clone replication and randomized placement; automatic gridding, extraction of expression data from each spot, and quality monitoring; flexible methods of combining data from individual spots into information about clones and functional categories; and the use of inductive logic programming for higher-level data analysis and mining. The development of Expresso is occurring in parallel with several generations of microarray experiments aimed at elucidating genomic responses to drought stress in loblolly pine seedlings. The current experimental design incorporates 384 pine cDNAs replicated and randomly placed in two specific microarray layouts. We describe the design of Expresso as well as results of analysis with Expresso that suggest the importance of molecular chaperones and membrane transport proteins in mechanisms conferring successful adaptation to long-term drought stress. Copyright (C) 2002 John Wiley & Sons, Ltd.
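Replicated, randomized placement of the kind described above can be sketched as follows (a toy illustration with a fixed seed, not Expresso's actual layout code):

```python
# Toy sketch of replicated, randomized clone placement: each of 384 clones is
# duplicated, and the replicates are shuffled across two array layouts.
import random

clones = [f"cDNA_{i:03d}" for i in range(384)]
spots = clones * 2                 # one replicate of every clone
rng = random.Random(42)            # fixed seed for a reproducible layout
rng.shuffle(spots)

half = len(spots) // 2
layout_a, layout_b = spots[:half], spots[half:]
```

Randomizing placement decouples a clone's measured signal from its physical position, so spatial artifacts on the slide are not confounded with biology.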
Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special-purpose algorithms. We present WARMR, a general-purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and, in particular, novel settings not supported by special-purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special-purpose algorithms. Second, the unified representation gives insight into the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation, a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis and one in a chemical toxicology domain.
Relative least general generalization, proposed by Plotkin, is widely used for generalizing first-order clauses in inductive logic programming, and this paper describes an extension of Plotkin's work that allows various computation domains: the Herbrand universe, sets, numerical data, etc. The theta-subsumption in Plotkin's framework is replaced by a more general constraint-based subsumption. Since this replacement is analogous to the replacement of unification by constraint solving in Constraint Logic Programming, the resultant method can be viewed as a Constraint Logic Programming version of relative least general generalization. Constraint-based subsumption, however, leads to a search over an intractably large hypothesis space. We therefore provide meta-level constraints that are used as a semantic bias on the hypothesis language. The constraints of functional dependency and monotonicity are introduced by analyzing clausal relationships. Finally, the advantage of the proposed method is demonstrated through a simple layout problem, where geometric constraints used in space planning tasks are produced automatically.
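Plotkin's least general generalization on plain first-order terms, the operation that relative lgg builds on, can be sketched as anti-unification (terms are encoded here as nested tuples, an illustrative convention rather than any system's actual representation):

```python
# Plotkin's lgg (anti-unification) of two first-order terms.
# A compound term f(t1, t2) is the tuple ("f", t1, t2); constants are strings.

def lgg(t1, t2, table=None, counter=None):
    """Generalize two terms; each distinct pair of differing subterms maps
    to one shared fresh variable ("X0", "X1", ...)."""
    table = {} if table is None else table
    counter = [0] if counter is None else counter
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # Same functor and arity: generalize argument-wise.
        return (t1[0],) + tuple(lgg(a, b, table, counter)
                                for a, b in zip(t1[1:], t2[1:]))
    if t1 == t2:
        return t1
    if (t1, t2) not in table:   # reuse the variable for a repeated mismatch
        table[(t1, t2)] = f"X{counter[0]}"
        counter[0] += 1
    return table[(t1, t2)]
```

For example, lgg(p(a, a), p(b, b)) is p(X0, X0): the repeated mismatch a/b is captured by a single shared variable, which is what makes the result *least* general.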
Given a set of candidate Datalog rules, the Datalog synthesis-as-rule-selection problem chooses a subset of these rules that satisfies a specification (such as an input-output example). Building on prior work using counterexample-guided inductive synthesis, we present a progression of three solver-based approaches for solving Datalog synthesis-as-rule-selection problems. Two of our approaches offer some advantages over existing approaches and can be used more generally to solve arbitrary SMT formulas containing Datalog predicates; the third, an encoding into standard, off-the-shelf answer set programming (ASP), leads to significant speedups (roughly 9x geometric mean) over the state of the art while synthesizing higher-quality programs. Our progression of solutions explores the space of interactions between SAT/SMT and Datalog, identifying ASP as a promising tool for working with and reasoning about Datalog. Along the way, we identify Datalog programs as monotonic SMT theories, which enjoy particularly efficient interactions in SMT; our plugins for popular SMT solvers make it easy to load an arbitrary Datalog program into the SMT solver as a custom monotonic theory. Finally, we evaluate our approaches using multiple underlying solvers to provide a more thorough and nuanced comparison against the current state of the art.
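The monotone fixpoint semantics that makes Datalog a monotonic theory can be illustrated with naive bottom-up evaluation of a toy transitive-closure program (a pedagogical sketch, not the paper's solver encodings):

```python
# Naive bottom-up evaluation of:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).
# Facts are tuples (predicate, arg1, arg2); rules only ADD facts, so the
# derivation is monotone and reaches a least fixpoint.

facts = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}

def step(facts):
    """Apply every rule once to the current fact set."""
    new = set(facts)
    for (p, x, y) in facts:
        if p == "edge":
            new.add(("path", x, y))
        if p == "path":
            for (q, a, b) in facts:
                if q == "edge" and a == y:
                    new.add(("path", x, b))
    return new

while True:   # iterate to fixpoint; monotonicity guarantees termination
    nxt = step(facts)
    if nxt == facts:
        break
    facts = nxt
```

Because adding input facts can only add derived facts, never retract them, a Datalog program behaves as a monotonic theory, which is the property the SMT integration exploits.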
SimStudent is a machine-learning agent initially developed to help novice authors create cognitive tutors without heavy programming. Integrated into an existing suite of software tools called Cognitive Tutor Authoring Tools (CTAT), SimStudent helps authors create an expert model for a cognitive tutor by having them tutor SimStudent on how to solve problems. There are two different ways to author an expert model with SimStudent. In the context of Authoring by Tutoring, the author interactively tutors SimStudent by posing problems to it, providing feedback on the steps it performs, and demonstrating steps in response to SimStudent's hint requests when it cannot perform a step correctly. In the context of Authoring by Demonstration, the author demonstrates solution steps, and SimStudent attempts to induce the underlying domain principles by generalizing those worked-out examples. We conducted evaluation studies to investigate which authoring strategy better facilitates authoring and found two key results. First, the expert model generated with Authoring by Tutoring has higher accuracy than the one generated with Authoring by Demonstration while maintaining the same level of completeness. The reason for this better accuracy is that the expert model generated by tutoring benefits from the negative feedback provided on SimStudent's incorrect production applications. Second, Authoring by Tutoring requires less time than Authoring by Demonstration. This enhanced authoring efficiency arises partially because (a) with Authoring by Demonstration the author needs to test the quality of the expert model, whereas with Authoring by Tutoring this formative assessment happens naturally by observing SimStudent's performance, and (b) the number of steps that need to be demonstrated during tutoring decreases as learning progresses.
There have been significant efforts to understand, describe, and predict the social commerce intention of users in the areas of social commerce and web data management. Based on recent developments in knowledge graphs and inductive logic programming in artificial intelligence, in this paper we propose a knowledge-graph-based social commerce intention analysis method. In particular, a knowledge base is constructed to represent the social commerce environment by integrating information related to social relationships, social commerce factors, and domain background knowledge. In this study, knowledge graphs are used to represent and visualize the entities and relationships related to social commerce, while inductive logic programming techniques are used to discover implicit information that can be used to interpret the information behaviors and intentions of the users. Evaluation tests confirmed the effectiveness of the proposed method and, in addition, the feasibility of using knowledge graphs and knowledge-based data mining techniques in the social commerce environment.
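The combination of a triple-based knowledge graph with an ILP-style induced rule can be pictured as follows (all entities, relations, and the rule itself are invented for illustration):

```python
# Knowledge graph as subject-relation-object triples, plus one hypothetical
# induced rule used to derive implicit intention facts:
#   friend_of(U, V) & purchased(V, P) & liked(U, P) -> intends_to_buy(U, P)

kg = {
    ("alice", "friend_of", "bob"),
    ("bob", "purchased", "camera"),
    ("alice", "liked", "camera"),
}

derived = {
    (u, "intends_to_buy", p)
    for (u, r1, v) in kg if r1 == "friend_of"
    for (v2, r2, p) in kg if r2 == "purchased" and v2 == v
    if (u, "liked", p) in kg
}
```

The derived triples are the "implicit information" an ILP system would surface: they are not stated in the graph but follow from it under the induced rule.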