ISBN: 9783540772255 (print)
We investigate a committee-based approach for active learning of real-valued functions. This is a variance-only strategy for selecting informative training data. As such, it is shown to suffer when the model class is misspecified, since the learner's bias is then high. Conversely, the strategy outperforms passive selection when the model class is very expressive, since active minimization of the variance avoids overfitting.
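A minimal sketch of the variance-only selection step, assuming a committee of least-squares lines trained on bootstrap resamples (the model class, committee construction and function names are illustrative choices, not the paper's): the next query is the candidate input on which the members' predictions have the largest variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate resample: all x equal
    return my - b * mx, b


def train_committee(xs, ys, size, rng):
    """Fit `size` members, each on a bootstrap resample of the labeled data."""
    n = len(xs)
    committee = []
    for _ in range(size):
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def select_query(committee, candidates):
    """Pick the unlabeled input where the members' predictions vary most."""
    def disagreement(x):
        return statistics.pvariance([a + b * x for a, b in committee])
    return max(candidates, key=disagreement)
```

In a full active-learning loop, the selected input would be labeled, appended to the training data, and the committee retrained. Because only the committee's variance drives selection, a biased (misspecified) model class is never corrected by this criterion, which matches the weakness described above.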
There are many validation indices and techniques for studying clustering results. Biclustering algorithms have been applied with great success in Systems Biology in recent years, principally in DNA microarray analysis. Nowadays there is a large set of biclustering algorithms, each based on different concepts, but there are few intercomparisons that measure their performance. We review and present here some numerical measures, both new ones and ones evolved from traditional clustering validation techniques, to allow comparison and validation of biclustering algorithms.
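The abstract does not list its measures. As a reference point, here is a minimal sketch of one widely used traditional bicluster score, Cheng and Church's mean squared residue; it is not claimed to be one of the measures proposed in the paper. A score of 0 means the selected rows and columns form a perfectly additive (coherent) bicluster.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the bicluster (rows, cols).

    matrix is a list of lists; rows and cols are index lists selecting the
    bicluster. Lower values indicate a more coherent (additive) bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            # Residue: what remains after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)
```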
We show how a previously derived method of using reinforcement learning for supervised clustering of a data set can lead to a sub-optimal solution if the cluster prototypes are initialised to poor positions. We then develop three novel reward functions which show great promise in overcoming poor initialisation. We illustrate the results on several data sets. We then use the clustering methods with an underlying latent space, which enables us to create topology-preserving mappings. We illustrate this method on both real and artificial data sets.
This paper describes a state-of-the-art parallel data mining solution that employs wavelet analysis for scalable outlier detection in large, complex spatio-temporal data. The algorithm has been implemented on a multiprocessor architecture and evaluated on real-world meteorological data. Running on a high-performance architecture, our solution can process massive and complex spatial data in reasonable time and yields improved prediction.
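The abstract does not specify the wavelet scheme. A minimal, sequential sketch of the general idea, assuming a single-level Haar transform of a 1-D series and a robust threshold based on the median absolute deviation (both our choices), flags samples whose local detail coefficient is anomalously large. The parallel and spatio-temporal aspects of the paper are not modeled.

```python
import math
import statistics


def haar_details(series):
    """One level of the Haar wavelet transform: detail coefficient of each pair."""
    return [(series[2 * k] - series[2 * k + 1]) / math.sqrt(2)
            for k in range(len(series) // 2)]


def wavelet_outliers(series, k=4.0):
    """Return indices of sample pairs whose Haar detail coefficient is extreme.

    The noise scale is estimated robustly as MAD / 0.6745 of the coefficients,
    and a pair is flagged when its coefficient deviates by more than k scales.
    """
    details = haar_details(series)
    med = statistics.median(details)
    mad = statistics.median(abs(d - med) for d in details)
    sigma = mad / 0.6745 if mad > 0 else 1e-12
    flagged = []
    for idx, d in enumerate(details):
        if abs(d - med) > k * sigma:
            flagged.extend([2 * idx, 2 * idx + 1])
    return flagged
```

Smooth variation produces small detail coefficients, so only abrupt local jumps are flagged. The per-pair computation is independent, which is what makes wavelet-based schemes easy to split across processors.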
We apply learning vector quantization to the analysis of tiling microarray data. As an example we consider the classification of C. elegans genomic probes as intronic or exonic. Training is based on the current annotation of the genome. Relevance learning techniques are used to weight and select features according to their importance for the classification. Among other findings, the analysis suggests that correlations between the perfect match intensity of a particular probe and its neighbors are highly relevant for successful exon identification.
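As a rough illustration of relevance learning, the sketch below trains LVQ1 prototypes under a relevance-weighted distance and adapts one relevance weight per feature, loosely following the RLVQ idea. The exact update rule, learning rates and the toy features in the usage test are our assumptions. The paper's probe features (such as neighbouring perfect-match intensities) are not reproduced.

```python
import random


def weighted_dist(x, w, rel):
    """Relevance-weighted squared Euclidean distance."""
    return sum(r * (a - b) ** 2 for r, a, b in zip(rel, x, w))


def train_rlvq(data, labels, protos, proto_labels, epochs=30,
               lr_w=0.05, lr_rel=0.01, seed=0):
    """LVQ1 with a simplified, RLVQ-style relevance update (illustrative only).

    `protos` (lists of floats) are updated in place; the learned feature
    relevances, normalized to sum to 1, are returned.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    rel = [1.0 / dim] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for n in order:
            x, y = data[n], labels[n]
            # Winner = closest prototype under the current relevances.
            j = min(range(len(protos)),
                    key=lambda k: weighted_dist(x, protos[k], rel))
            sign = 1.0 if proto_labels[j] == y else -1.0
            w = protos[j]
            # Correct winner: lower the relevance of features where x and w differ;
            # wrong winner: raise it. Clip at 0, then renormalize.
            rel = [max(0.0, r - sign * lr_rel * (a - b) ** 2)
                   for r, a, b in zip(rel, x, w)]
            total = sum(rel) or 1.0
            rel = [r / total for r in rel]
            # LVQ1 step: attract a correct winner, repel a wrong one.
            for i in range(dim):
                w[i] += sign * lr_w * (x[i] - w[i])
    return rel
```

Features that vary a lot without helping classification lose relevance, so the returned weights can be read as a feature ranking, which is how relevance learning supports feature selection.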
To succeed at certain classification problems or knowledge discovery tasks, it is not sufficient to look at the available variables at a single point in time; their development has to be traced over a period of time. It is shown that patterns and sequences of labeled intervals represent a particularly well-suited data format for this purpose. An extension of existing classifiers is proposed that enables them to handle this kind of sequential data. Compared to earlier approaches, the expressiveness of the pattern language (using Allen et al.'s interval relationships) is increased, which allows the discovery of many temporal patterns common to real-world applications.
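Allen's interval relations classify how two intervals lie relative to each other (13 relations in total). A minimal sketch, assuming intervals given as (start, end) pairs with start < end, computes the relation and describes a set of labeled intervals by their pairwise relations, the kind of representation such a pattern language builds on. The function names are hypothetical.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if e2 < s1:
        return "after"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: proper partial overlap in one direction or the other.
    return "overlaps" if s1 < s2 else "overlapped-by"


def interval_pattern(intervals):
    """Describe labeled intervals (label, start, end) by pairwise Allen relations."""
    ordered = sorted(intervals, key=lambda t: (t[1], t[2], t[0]))
    return [(la, allen_relation((sa, ea), (sb, eb)), lb)
            for i, (la, sa, ea) in enumerate(ordered)
            for (lb, sb, eb) in ordered[i + 1:]]
```

For example, a "fever" interval (0, 5) and a "cough" interval (2, 8) yield the single triple ("fever", "overlaps", "cough"). A set of such triples can then be matched or counted like any other pattern.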
This paper illustrates how to compare different agent-based models with each other and how to compare an agent-based model with real data. As examples, we investigate ARFIMA models, the probability density function, and the spectral density function. We illustrate the methodology in an analysis of the agent-based model developed by Levy, Levy and Solomon (2000), and confront it with the S&P 500 for a comparison with real-life data.
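ARFIMA models capture long memory through the fractional differencing parameter d, so one simple comparison is to estimate d for both simulated and real series. The sketch below uses the Geweke-Porter-Hudak (GPH) log-periodogram regression; this estimator, the bandwidth m = floor(sqrt(n)), and the toy Gaussian generator standing in for model output are our assumptions, not necessarily the paper's procedure.

```python
import cmath
import math
import random


def periodogram(x, j):
    """Periodogram ordinate of x at the Fourier frequency 2*pi*j/n."""
    n = len(x)
    mean = sum(x) / n
    lam = 2 * math.pi * j / n
    s = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(s) ** 2 / (2 * math.pi * n)


def gph_estimate(x, power=0.5):
    """Geweke-Porter-Hudak estimate of the fractional differencing parameter d.

    Regresses log I(lambda_j) on log(4 sin^2(lambda_j / 2)) for j = 1..m with
    m = floor(n**power); the estimate of d is minus the fitted slope.
    """
    n = len(x)
    m = int(n ** power)
    xs, ys = [], []
    for j in range(1, m + 1):
        lam = 2 * math.pi * j / n
        xs.append(math.log(4 * math.sin(lam / 2) ** 2))
        ys.append(math.log(periodogram(x, j)))
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return -sxy / sxx


def simulated_series(n, seed=0):
    """Toy stand-in for model output: Gaussian returns and their running sum."""
    rng = random.Random(seed)
    returns = [rng.gauss(0.0, 1.0) for _ in range(n)]
    levels, total = [], 0.0
    for r in returns:
        total += r
        levels.append(total)
    return returns, levels
```

Uncorrelated returns should give an estimate near 0 and their cumulated levels one near 1. Comparing the estimate for a model's simulated returns with the estimate for the real index returns is one piece of evidence, alongside the density comparisons, about whether the model reproduces the data's memory structure.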
In this paper, we propose a portfolio management approach based on reinforcement learning (RL) and an independent factor model. The factors in an independent factor model are mutually independent and exhibit better predictability. RL is applied to each factor to capture temporal dependence and to provide an investment suggestion for that factor. Optimal weights on the factors are then found by a portfolio optimization method subject to the investment suggestions and general portfolio constraints. Experimental results and analysis show that the proposed method performs better than two alternative portfolio management systems.
Frequent disjunctive pattern mining is known to be a sophisticated method of text mining in a single document. It satisfies anti-monotonicity, which allows efficient algorithms based on APRIORI. In this work, we propose a new online, single-pass algorithm that extracts the currently frequent disjunctive patterns from a news stream by weighting past events. We also discuss some experimental results.
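The abstract does not give the weighting scheme. A minimal sketch, assuming exponential decay of past events and a fixed list of candidate patterns in which a pattern such as {election, vote} is supported by any event containing at least one of its words, shows how weighted support can be maintained in a single pass. The class name and support definition are hypothetical, and the APRIORI-style candidate generation that relies on anti-monotonicity is not shown.

```python
class DecayedDisjunctiveCounter:
    """Single-pass, exponentially decayed support for candidate disjunctive patterns.

    A pattern is a set of words; an event (e.g. one news item) supports it if
    it contains at least one of those words. Each new event multiplies all
    previous weights by `decay`, so recent events dominate.
    """

    def __init__(self, patterns, decay=0.9):
        # Deduplicate while keeping the caller's order.
        self.patterns = list(dict.fromkeys(frozenset(p) for p in patterns))
        self.decay = decay
        self.weight = {p: 0.0 for p in self.patterns}
        self.total = 0.0  # decayed count of all events seen so far

    def update(self, words):
        """Consume one event (an iterable of words) in O(#patterns) time."""
        words = set(words)
        self.total = self.decay * self.total + 1.0
        for p in self.patterns:
            hit = 1.0 if p & words else 0.0
            self.weight[p] = self.decay * self.weight[p] + hit

    def frequent(self, min_support):
        """Patterns whose decayed relative support is at least min_support."""
        if self.total == 0.0:
            return []
        return [p for p in self.patterns
                if self.weight[p] / self.total >= min_support]
```

Each event is read once and only the decayed weights are stored, so memory does not grow with the length of the stream. A pattern that was frequent long ago fades out unless recent events keep supporting it.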
This paper presents a tool for web usage mining. The aim is to provide a tool that facilitates the mining process rather than to implement elaborate algorithms and techniques. The tool covers several phases of the CRISP-DM methodology: data preparation, data selection, modeling and evaluation. The algorithms used in the modeling phase are those implemented in the Weka project. The tool has been tested on a web site to find access and navigation patterns.
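To make the data preparation phase concrete, here is a minimal sketch that splits a web access log into per-user sessions and writes them as binary page-visit vectors in Weka's ARFF format. The (user, timestamp, url) record layout, the 30-minute inactivity timeout and the function names are assumptions, not details of the tool described.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; a common heuristic, assumed here


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) log entries into per-user sessions of URLs."""
    by_user = defaultdict(list)
    for user, ts, url in entries:
        by_user[user].append((ts, url))
    sessions = []
    for user in sorted(by_user):
        current, last_ts = [], None
        for ts, url in sorted(by_user[user]):
            # A long gap since the previous request starts a new session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions


def to_arff(sessions, relation="sessions"):
    """Encode sessions as binary page-visit vectors in ARFF (URLs must not contain ')."""
    pages = sorted({url for s in sessions for url in s})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute '{p}' {{0,1}}" for p in pages]
    lines += ["", "@data"]
    for s in sessions:
        visited = set(s)
        lines.append(",".join("1" if p in visited else "0" for p in pages))
    return "\n".join(lines) + "\n"
```

With nominal {0,1} attributes, the resulting file suits Weka's association-rule and clustering learners, which is one way to look for pages that tend to be visited together.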