Sequential pattern mining is a new trend in the data mining domain with many useful applications, especially commercial ones, but it also produces surprising effects in adaptive learning. Suppose there is an adaptive e-l...
When using granular computing for problem solving, one can focus on a specific level of understanding without looking at unwanted details of subsequent (more precise) levels. We present a granular computing framework for growing hierarchical self-organizing maps. This approach is ideal since the maps are arranged in a hierarchical manner and each is a complete abstraction of a pattern within data. The framework allows us to precisely define the connections between map levels. Formulating a neuron as a granule, the actions of granule construction and decomposition correspond to the growth and absorption of neurons in the previous model. In addition, we investigate the effects of updating granules with new information on both coarser and finer granules that have a derived relationship. Called bidirectional update propagation, the method ensures pattern consistency among data abstractions. An algorithm for the construction, decomposition, and updating of the granule-based self-organizing map is introduced. With examples, we demonstrate the effectiveness of this framework for abstracting patterns on many levels. (C) 2009 Elsevier B.V. All rights reserved.
ISBN: 9783642030697 (print)
We investigate concept learning from incomplete examples, referred to here as ambiguous examples. We start from the learning from interpretations setting introduced by L. De Raedt and then follow the informal ideas presented by H. Hirsh to extend the Version Space paradigm to incomplete data: a hypothesis has to be compatible with all pieces of information provided regarding the examples. We propose and experiment with an algorithm that, given a set of ambiguous examples, learns a concept as an existential monotone DNF. We show that 1) boolean concepts can be learned, even with a high level of incompleteness, as long as enough information is provided, and 2) monotone DNF, non-monotone DNF (i.e. including negative literals), and attribute-value hypotheses can be learned that way, using appropriate background knowledge. We also show that a clever implementation, based on a multi-table representation, is necessary to apply the method with high levels of incompleteness.
ISBN: 9783642030697 (print)
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
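To make the contrast with tree induction concrete, the following is a minimal sketch of how a Prism-style learner might specialise a single rule: it repeatedly adds the attribute-value term with the highest conditional probability of the target class until the covered rows are pure. This follows the basic Prism idea as it is usually described, not PrismTCS or the parallel framework of the paper. The function name `induce_rule`, the dictionary row format, and the `class` key are assumptions made for illustration.

```python
def induce_rule(rows, target_class, class_key="class"):
    """Greedily build one rule (attribute -> value) for target_class.

    Hypothetical helper: rows are dicts with categorical attributes
    plus a class label stored under class_key.
    """
    rule = {}
    covered = list(rows)
    # Keep specialising while the covered rows still contain other classes.
    while any(r[class_key] != target_class for r in covered):
        best = None  # (probability, hits, attribute, value)
        for attr in covered[0]:
            if attr == class_key or attr in rule:
                continue
            for value in {r[attr] for r in covered}:
                subset = [r for r in covered if r[attr] == value]
                hits = sum(1 for r in subset if r[class_key] == target_class)
                cand = (hits / len(subset), hits, attr, value)
                # Prefer higher probability, then higher coverage.
                if best is None or cand[:2] > best[:2]:
                    best = cand
        if best is None or best[0] == 0:
            break  # no unused attribute helps; stop with an impure rule
        _, _, attr, value = best
        rule[attr] = value
        covered = [r for r in covered if r[attr] == value]
    return rule, covered


rows = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
rule, covered = induce_rule(rows, "stay")
```

Every candidate term is rescored against all currently covered rows at each step, which hints at why this style of rule induction can become expensive on large datasets.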
ISBN: 9783642030697 (print)
The induction of knowledge from a data set relies on the execution of multiple data mining actions: applying filters to clean and select the data, training different algorithms (clustering, classification, regression, association), evaluating the results using different approaches (cross validation, statistical analysis), visualizing the results, etc. In a real data mining process, these actions are executed several times, sometimes in a loop, until an accurate result is obtained. However, performing these tasks requires a data mining engineer or expert who supervises the design and evaluates the whole process. The goal of this paper is to describe MOLE, an architecture to automate the data mining process. The architecture assumes that the data mining process can be seen from a classical planning perspective and hence that classical planning tools can be used to design the process. MOLE is built and instantiated on the basis of i) standard languages to describe the data set and the data mining process, and ii) available tools to design, execute and evaluate the data mining processes.
ISBN: 9781424429011 (print)
Data mining has become an important topic in the effective analysis of gene expression data due to its wide application in the biomedical industry. In this paper, the k-means clustering algorithm has been extensively studied for gene expression analysis. Since our purpose is to demonstrate the effectiveness of the k-means algorithm for a wide variety of data sets, we have chosen two pattern recognition data sets and thirteen microarray data sets with both overlapping and non-overlapping cluster boundaries, where the number of features/genes ranges from 4 to 7129 and the number of samples ranges from 32 to 683. The number of clusters ranges from two to eleven. We use the clustering error rate (or, equivalently, clustering accuracy) as the evaluation metric to measure the performance of the k-means algorithm.
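The abstract does not spell out how the clustering error rate is computed. One common definition, assumed here, is the fraction of samples that remain misassigned under the best one-to-one mapping from cluster ids to class labels. The sketch below uses a hypothetical function `clustering_error_rate` and searches the mappings by brute force, which is only practical for a handful of clusters; for larger numbers of clusters, an assignment solver such as the Hungarian algorithm would be the usual substitute.

```python
from itertools import permutations


def clustering_error_rate(true_labels, cluster_ids):
    """Error rate under the best one-to-one cluster-to-class mapping.

    Assumed definition for illustration; brute force over mappings,
    so keep the number of clusters small.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes: no one-to-one mapping")
    best_correct = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(
            1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t
        )
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / len(true_labels)


true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [1, 1, 0, 0, 2, 0]  # cluster numbering is arbitrary
error = clustering_error_rate(true_labels, cluster_ids)
accuracy = 1.0 - error
```

In this toy example, five of the six samples agree with their class under the best mapping, so the error rate is 1/6 and the clustering accuracy is 5/6.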
ISBN: 9783642030697 (print)
Existing data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Most relationships in spatial datasets are regional; therefore, there is a great need to extract regional knowledge from spatial datasets. This paper proposes a novel framework to discover interesting regions characterized by "strong regional correlation relationships" between attributes, and methods to analyze differences and similarities between regions. The framework employs a two-phase approach: it first discovers regions by employing clustering algorithms that maximize a PCA-based fitness function and then applies post-processing techniques to explain underlying regional structures and correlation patterns. Additionally, a new similarity measure that assesses the structural similarity of regions based on correlation sets is introduced. We evaluate our framework in a case study which centers on finding correlations between arsenic pollution and other factors in water wells and demonstrate that our framework effectively identifies regional correlation patterns.
ISBN: 9789639799424 (print)
Among the central challenges of Ambient Assisted Living systems are the autonomous and reliable recognition of the assisted person's current situation and the proactive offering and rendering of adequate assistance services. In the context of emergency support, such situations may be acute emergency situations or long-term deviations from typical behavior that will result in emergency situations in the future. To optimize the treatment of the former and the prevention of the latter, reliable recognition of characteristic activities of daily living is necessary. In this paper, we present our multi-agent-based activity recognition framework as well as our experience with it. Besides a detailed discussion of our hybrid recognition approach, we also elaborate on the tailoring of the underlying reasoning models to the individual environments and users in an initial learning phase. Finally, we report on our experience with the recognition framework in our Ambient Assisted Living Laboratory.
ISBN: 9781424450190 (print)
In large-scale applications, hundreds of new subjects may be regularly enrolled in a biometric system. To account for the variations in data distribution caused by these new enrollments, biometric systems require regular re-training, which usually results in a very large computational overhead. This paper formally introduces the concept of online learning in biometrics. We demonstrate its application in classifier update algorithms to re-train classifier decision boundaries. Specifically, the algorithm employs an online learning technique in a 2ν-Granular Soft Support Vector Machine for rapidly training and updating face recognition systems. The proposed online classifier is used in a face recognition application for classifying genuine and impostor match scores impacted by different covariates. Experiments on a heterogeneous face database of 1,194 subjects show that the proposed online classifier not only improves the verification accuracy but also significantly reduces the computational cost.
ISBN: 9783642111631 (print)
We discuss two key aspects of the quality-time trade-offs in time-limited, search-based reasoning, namely the design of efficient anytime algorithms and formulations for meta-reasoning (or control) to optimize the computational trade-off under various constrained environments. We present the ideas behind novel anytime heuristic search algorithms, both contract and interruptible. We also describe new meta-control strategies that address parameter control along with time deliberation.