ISBN: 9783642030697 (print)
The induction of knowledge from a data set relies on the execution of multiple data mining actions: applying filters to clean and select the data, training different algorithms (clustering, classification, regression, association), evaluating the results using different approaches (cross validation, statistical analysis), visualizing the results, etc. In a real data mining process, these actions are executed several times, sometimes in a loop, until an accurate result is obtained. However, performing these tasks requires a data mining engineer or expert who supervises the design and evaluates the whole process. The goal of this paper is to describe MOLE, an architecture to automate the data mining process. The architecture assumes that the data mining process can be seen from a classical planning perspective, and hence that classical planning tools can be used to design the process. MOLE is built and instantiated on the basis of i) standard languages to describe the data set and the data mining process, and ii) available tools to design, execute and evaluate the data mining processes.
When using granular computing for problem solving, one can focus on a specific level of understanding without looking at unwanted details of subsequent (more precise) levels. We present a granular computing framework for growing hierarchical self-organizing maps. This approach is well suited to the task, since the maps are arranged in a hierarchical manner and each is a complete abstraction of a pattern within the data. The framework allows us to precisely define the connections between map levels. Formulating a neuron as a granule, the actions of granule construction and decomposition correspond to the growth and absorption of neurons in the previous model. In addition, we investigate the effects of updating granules with new information on both coarser and finer granules that have a derived relationship. Called bidirectional update propagation, the method ensures pattern consistency among data abstractions. An algorithm for the construction, decomposition, and updating of the granule-based self-organizing map is introduced. With examples, we demonstrate the effectiveness of this framework for abstracting patterns on many levels. (C) 2009 Elsevier B.V. All rights reserved.
ISBN: 9783642040306 (print)
Clustering is a widely used unsupervised data analysis technique in machine learning. However, a common requirement amongst many existing clustering methods is that all pairwise distances between patterns must be computed in advance. This makes clustering computationally expensive and makes it difficult to cope with the large-scale data used in several applications, such as bioinformatics. In this paper we propose a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small fraction of the entire data, while the remaining data is processed sequentially and the tree is adapted constructively. Preliminary results using this approach show that the quality of the clusters obtained does not degrade while the computational needs are reduced.
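The idea of building a tree from a small sample and then adding the remaining points one at a time can be sketched as follows. This is only an illustrative sketch, not the paper's algorithm: the centroid-linkage seed tree, the greedy descent toward the nearer child, and the names `Node`, `build_tree` and `insert` are all assumptions made for the example. The point it shows is that each new pattern is compared with a handful of centroids along one root-to-leaf path rather than with every other pattern.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """A tree node; leaves have no children (hypothetical helper class)."""

    def __init__(self, points, left=None, right=None):
        self.points = points  # all points stored under this node
        self.left = left
        self.right = right

    @property
    def centroid(self):
        dim = len(self.points[0])
        n = len(self.points)
        return [sum(p[d] for p in self.points) / n for d in range(dim)]


def build_tree(sample):
    """Agglomerative clustering (centroid linkage, assumed) on a small sample."""
    nodes = [Node([p]) for p in sample]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes)))
        i, j = min(pairs, key=lambda ij: dist(nodes[ij[0]].centroid, nodes[ij[1]].centroid))
        merged = Node(nodes[i].points + nodes[j].points, nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def insert(root, point):
    """Add one point by descending toward the nearer child centroid."""
    node = root
    while node.left is not None:
        node.points = node.points + [point]
        to_left = dist(point, node.left.centroid)
        to_right = dist(point, node.right.centroid)
        node = node.left if to_left <= to_right else node.right
    # The reached leaf becomes an internal node holding the old leaf and the new point.
    node.left = Node(list(node.points))
    node.right = Node([point])
    node.points = node.points + [point]


root = build_tree([[0, 0], [0, 1], [10, 10], [10, 11]])  # small initial fraction
insert(root, [9, 10])                                    # remaining data, one by one
```

In this sketch the cost of an insertion grows with the depth of the tree instead of with the number of patterns already seen, which is the property the abstract is after; a real implementation would also need to cache centroids and rebalance or restructure the tree.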
ISBN: 9783642030697 (print)
Integration methods for ensemble learning can use two different approaches: combination or selection. The combination approach (also called fusion) combines the predictions obtained by the different models in the ensemble to obtain the final ensemble prediction. The selection approach selects one (or more) models from the ensemble according to the prediction performance of these models on similar data from the validation set. Usually, the method used to select similar data is k-nearest neighbors with the Euclidean distance. In this paper we discuss other approaches to obtaining similar data for the regression problem. We show that using similarity measures based on the target values improves results. We also show that dynamically selecting several models for the prediction task increases prediction accuracy compared to selecting just one model.
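The baseline selection approach described above (k-nearest neighbors with the Euclidean distance on the validation set) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: models are plain Python callables, local performance is measured by mean squared error, and the selected models are combined by a simple average. The names `select_models` and `predict` are hypothetical and not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_models(models, X_val, y_val, query, k=3, n_select=1):
    """Pick the n_select models with the lowest error on the k validation
    points closest to the query (error measure assumed to be MSE)."""
    nearest = sorted(range(len(X_val)), key=lambda i: euclidean(X_val[i], query))[:k]

    def local_mse(model):
        return sum((model(X_val[i]) - y_val[i]) ** 2 for i in nearest) / len(nearest)

    return sorted(models, key=local_mse)[:n_select]


def predict(models, X_val, y_val, query, k=3, n_select=1):
    """Average the predictions of the dynamically selected models."""
    chosen = select_models(models, X_val, y_val, query, k, n_select)
    return sum(m(query) for m in chosen) / len(chosen)


# Two toy regressors, each accurate in a different region of the input space.
def double(x):
    return 2 * x[0]


def plus_ten(x):
    return x[0] + 10


X_val = [[0], [1], [2], [10], [11], [12]]
y_val = [0, 2, 4, 20, 21, 22]
ensemble = [double, plus_ten]
```

Setting `n_select` above 1 corresponds to selecting several models dynamically instead of a single one; replacing `euclidean` with a similarity measure that takes the target values into account would be the kind of variation the abstract discusses.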
ISBN: 9781424452422 (print)
There has been significant recent interest in sparse metric learning (SML), in which we simultaneously learn both a good distance metric and a low-dimensional representation. Unfortunately, the performance of existing sparse metric learning approaches is usually limited because they assume certain problem relaxations or target the SML objective only indirectly. In this paper, we propose a Generalized Sparse Metric Learning method (GSML). This novel framework offers a unified view for understanding many of the popular sparse metric learning algorithms, including the Sparse Metric Learning framework proposed in [15], the Large Margin Nearest Neighbor (LMNN) [21][22], and the D-ranking Vector Machine (D-ranking VM) [14]. Moreover, GSML also establishes a close relationship with the Pairwise Support Vector Machine [20]. Furthermore, the proposed framework is capable of extending many current non-sparse metric learning models, such as Relevant Component Analysis (RCA) [4] and a state-of-the-art method proposed in [23], into their sparse versions. We present the detailed framework, provide theoretical justifications, build various connections with other models, and propose a practical iterative optimization method, making the framework both theoretically important and practically scalable for medium or large datasets. A series of experiments shows that the proposed approach can outperform previous methods in terms of both test accuracy and dimension reduction on six real-world benchmark datasets.
ISBN: 9783642030697 (print)
Although a large number of inductive learning algorithms have been developed for handling concept-drifting data streams, especially those based on ensemble classification models, few of them can adapt to the detection of different types of concept drift in noisy streaming data while meeting demands on time and space overhead. Motivated by this, a new classification algorithm for Concept drifting Detection based on an ensemble model of Random Decision Trees (called CDRDT) is proposed in this paper. Extensive studies with synthetic and real streaming data demonstrate that, in comparison to several classification algorithms for concept-drifting data streams, CDRDT not only detects potential concept changes in noisy data streams effectively and efficiently, but also performs much better in terms of runtime and space, with an improvement in predictive accuracy. Thus, the proposed algorithm provides a significant reference for the lightweight classification of noisy concept-drifting data streams.
There exist several music composition systems that generate blues chord progressions, jazz improvisation, or classical pieces. Such systems often work by applying a set of rules explicitly provided to the system to de...
ISBN: 9783642030697 (print)
While classification models are being developed for many problems in medicine, Bayesian network classifiers do not seem to have become as widely accepted within the medical community as logistic regression models. We compare first-order logistic regression and naive Bayesian classification in the domain of reproductive medicine and demonstrate that the two techniques can result in models of comparable performance. For Bayesian network classifiers to become more widely accepted within the medical community, we feel that they should be better aligned with their context of application. We describe how to incorporate well-known concepts of clinical relevance in the process of constructing and evaluating Bayesian network classifiers to achieve such an alignment.
ISBN: 9781424446803 (print)
Various efforts have been made to address the problem of information overload on the Internet. Recommender systems aim at directing users through this information space toward the resources that best meet their needs and interests, by extracting knowledge from previous users' interactions. In this paper, we propose an algorithm to solve the web page recommendation problem. In our algorithm, we use distributed learning automata to learn the behavior of previous users and recommend pages to the current user based on the learned patterns. Our experiments on a real data set show that the proposed algorithm performs better than the other algorithms we compared it to and, at the same time, is less complex than they are with respect to memory usage and computational cost.
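To make the learning-automata idea more concrete, here is a rough sketch in which each page owns one automaton whose actions are its outgoing links. It is an illustration only, not the algorithm proposed in the paper: the linear reward-inaction update, the learning rate, and the names `LinkAutomaton` and `train` are assumptions chosen for the example.

```python
class LinkAutomaton:
    """One learning automaton per page; its actions are the page's outgoing links."""

    def __init__(self, actions, rate=0.1):
        self.rate = rate  # learning rate (assumed value)
        self.probs = {a: 1.0 / len(actions) for a in actions}

    def reward(self, action):
        """Linear reward-inaction update: raise the probability of the rewarded
        action and scale the others down, so the probabilities still sum to 1."""
        for a in self.probs:
            if a == action:
                self.probs[a] += self.rate * (1.0 - self.probs[a])
            else:
                self.probs[a] *= 1.0 - self.rate

    def recommend(self, n=1):
        """Return the n links with the highest learned probability."""
        return sorted(self.probs, key=self.probs.get, reverse=True)[:n]


def train(site_graph, sessions, rate=0.1):
    """Reward every page-to-page transition observed in past user sessions."""
    automata = {page: LinkAutomaton(links, rate) for page, links in site_graph.items()}
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src in automata and dst in automata[src].probs:
                automata[src].reward(dst)
    return automata


site_graph = {"home": ["news", "shop", "about"]}  # hypothetical site structure
sessions = [["home", "shop"], ["home", "shop"], ["home", "news"]]
automata = train(site_graph, sessions)
```

Each automaton stores only one probability per outgoing link and each observed transition costs one pass over those links, which gives a feel for why an approach of this kind can be light on memory and computation.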
ISBN: 9783642040306 (print)
With the widespread use of microarray technology as a potential diagnostic tool, the comparison of results obtained from different platforms is of interest. When inference methods are designed using data collected on a particular platform, they are unlikely to work directly on measurements taken from a different type of array. We report on this cross-platform transfer problem and show that working with transcriptome representations at binary numerical precision, similar to the gene expression bar code method, helps circumvent the variability across platforms in several cancer classification tasks. We compare our approach with a recent machine learning method specifically designed for shifting distributions, i.e., problems in which the training and testing data are not drawn from identical probability distributions, and show superior performance in three of the four problems in which we could directly compare.
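A toy illustration of why a binary representation can be less sensitive to platform differences: if each sample is reduced to "expressed / not expressed" bits, two platforms that report the same sample on different numerical scales can yield the same bit pattern. The thresholding rule used here (each sample's own median) and the function name `binarize_sample` are assumptions for the sketch, not the rule used in the paper or in the bar code method, and the expression values are invented.

```python
from statistics import median


def binarize_sample(expression):
    """Map each gene's value to 1 if it lies above the sample median, else 0.
    (The median threshold is an assumed, simplified rule.)"""
    cut = median(expression)
    return [1 if value > cut else 0 for value in expression]


# The same four genes measured on two hypothetical platforms with different scales.
platform_a = [2.0, 8.0, 5.0, 11.0]
platform_b = [200.0, 900.0, 450.0, 1500.0]

bits_a = binarize_sample(platform_a)
bits_b = binarize_sample(platform_b)
```

A classifier trained on `bits_a`-style inputs sees the same kind of features when applied to `bits_b`-style inputs, whereas the raw values differ by orders of magnitude.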