ISBN: 9788096956241 (print)
Based on work [1], we investigate the quantification of the parameters of statistical structural models of inflation in the Slovak economy. Dynamic and SVM (Support Vector Machine) modelling approaches are used for automated specification of the functional form of the model in data mining systems. Based on dynamic modelling, we fit inflation models over the period 1993-2003 in the Slovak Republic and use them as a tool to compare their forecasting abilities with those obtained using the SVM method. Some methodological contributions are made to dynamic and SVM modelling approaches in economics and to their use in data mining systems. The study discusses, and analytically and numerically demonstrates, the quality and interpretability of the obtained results. The SVM methodology is extended to predict time series models.
ISBN: 9783540770459 (print)
The work presented here focuses on combining multiple classifiers to form a single classifier for pattern classification, machine learning for expert systems, and data mining tasks. The basis of the combination is that efficient concept learning is possible in many cases when the concepts learned from different approaches are combined into a more efficient concept. Experimental results for the algorithm, EMRL, on a representative collection of different domains show that it performs significantly better than several state-of-the-art individual classifiers on 11 of 25 data sets, whereas the state-of-the-art individual classifiers perform significantly better than EMRL in only 5 cases.
ISBN: 9781424409723 (print)
The mining of frequent itemsets has been extensively studied in data mining, and many methods have been proposed for this problem. However, mining all the frequent itemsets leads to a huge number of itemsets and numerous redundant association rules. Fortunately, this problem can be addressed by mining only frequent closed itemsets (FCIs), which results in a much smaller number of itemsets. Nevertheless, it is still difficult to find FCIs when the database becomes too large to allow a memory-resident representation. In this paper, a methodology called hierarchical partitioning is proposed for dividing the database into a set of multi-level sub-databases of manageable size that fit into memory. The advantage of hierarchical partitioning is that the FCIs can be found directly from the sub-databases without rescanning the original database for support and subset checking.
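To illustrate why closed itemsets form a much smaller set than all frequent itemsets, the following is a minimal brute-force sketch. It is not the hierarchical partitioning method of the paper; the function names, the toy transactions, and the support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: map every itemset with support >= min_support to its support count."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup for other, other_sup in frequent.items())
    }


# Hypothetical toy database of four transactions.
transactions = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["c"])]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy database the seven frequent itemsets reduce to three closed ones, and the support of every frequent itemset can still be recovered from its smallest closed superset.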
ISBN: 9783540734987 (print)
We address the problem of automatically learning to map heterogeneous semi-structured documents onto a mediated target XML schema. We adopt a machine learning approach in which the mapping between input and target documents is learned from a training corpus of documents. We first introduce a general stochastic model of semi-structured document generation and transformation. This model relies on the concept of a meta-document, a latent variable that provides a link between input and target documents. It allows us to learn the correspondences when the input documents are expressed in a large variety of schemas. We then detail an instance of the general model for the particular task of HTML to XML conversion. This instance is tested on three different corpora using two different inference methods: a dynamic programming method and an approximate LaSO-based method.
ISBN: 9780769530697 (print)
Given any generative classifier based on an inexact density model, we can define a discriminative counterpart that reduces its asymptotic error rate while increasing the estimation variance. An optimal bias-variance balance might be found using Hybrid Generative-Discriminative (HGD) approaches. In this paper, these methods are defined in a unified framework. This allows us to find sufficient conditions under which an improvement in generalization performance is guaranteed. Numerical experiments illustrate the well-foundedness of our statements.
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
The recently introduced transductive confidence machines (TCMs) framework allows classifiers to be extended so that they satisfy the calibration property. This means that the error rate can be set by the user prior to classification. An analytical proof of the calibration property was given for TCMs applied in the on-line learning setting. However, the nature of this learning setting restricts the applicability of TCMs. In this paper, we provide strong empirical evidence that the calibration property also holds in the off-line learning setting. Our results extend the range of applications in which TCMs can be applied. We may conclude that TCMs are appropriate in virtually any application domain.
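The way a user-chosen error rate enters a TCM can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nonconformity measure (distance to the class mean), the one-dimensional toy data, and all function names are assumptions. For each candidate label, the test example is added to the training data with that label, a p-value is computed from the nonconformity scores, and labels whose p-value exceeds the significance level `epsilon` are kept.

```python
def nonconformity(x, label, data):
    """Hypothetical measure: distance from x to the mean of the points sharing its label."""
    same = [xi for xi, yi in data if yi == label]
    return abs(x - sum(same) / len(same))


def p_values(train, x_new, labels):
    """For each candidate label, the fraction of examples at least as nonconforming as the new one."""
    result = {}
    for label in labels:
        extended = train + [(x_new, label)]  # transductive step: include the test example
        scores = [nonconformity(xi, yi, extended) for xi, yi in extended]
        new_score = scores[-1]
        result[label] = sum(1 for s in scores if s >= new_score) / len(scores)
    return result


def prediction_region(train, x_new, labels, epsilon):
    """Keep every label whose p-value exceeds the user-chosen significance level."""
    return {label for label, p in p_values(train, x_new, labels).items() if p > epsilon}


# Hypothetical one-dimensional training data with two classes.
train = [(0.9, "low"), (1.0, "low"), (1.1, "low"), (4.9, "high"), (5.0, "high"), (5.1, "high")]
pv = p_values(train, 1.05, ["low", "high"])
region = prediction_region(train, 1.05, ["low", "high"], epsilon=0.2)
print(pv, region)
```

Lowering `epsilon` asks for fewer errors and generally yields larger prediction regions, which is the trade-off the calibration property makes explicit.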
ISBN: 9781424409723 (print)
There have been many studies on the efficient discovery of frequent patterns in large databases. The usual framework is to use a minimal support threshold to obtain all frequent patterns. However, it is nontrivial for users to choose a suitable minimal support threshold. In this paper, a new mining task called mining top-rank-k frequent patterns is proposed, where k is the largest rank value of all frequent patterns to be mined. After analyzing the properties of top-rank-k frequent patterns in depth, we propose an efficient algorithm called FAE to mine top-rank-k frequent patterns. FAE is the abbreviation of "Filtering and Extending". During the mining process of FAE, undesired patterns are filtered out and useful patterns are selected to generate other, longer potential frequent patterns. This strategy greatly reduces the search space. We also present results of applying these algorithms to a synthetic data set, which show the effectiveness of our algorithms.
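The top-rank-k task can be illustrated with a brute-force sketch. It assumes that the rank of a pattern is the position of its support among the distinct support values in descending order; that reading, the function name, and the toy data are assumptions, and the code is not the FAE algorithm, which avoids this exhaustive enumeration.

```python
from itertools import combinations


def top_rank_k(transactions, k):
    """Return {pattern: (support, rank)} for every pattern whose rank is at most k."""
    items = sorted({item for t in transactions for item in t})
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count > 0:
                support[cset] = count
    # Rank 1 is the highest distinct support value, rank 2 the next one, and so on.
    top_supports = sorted(set(support.values()), reverse=True)[:k]
    rank = {sup: r for r, sup in enumerate(top_supports, start=1)}
    return {p: (sup, rank[sup]) for p, sup in support.items() if sup in rank}


# Hypothetical toy database.
transactions = [frozenset(t) for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"])]
result = top_rank_k(transactions, k=2)
for pattern, (sup, r) in sorted(result.items(), key=lambda kv: (kv[1][1], sorted(kv[0]))):
    print(sorted(pattern), "support", sup, "rank", r)
```

The user supplies only k, and the support cut-off follows from the data instead of being guessed in advance.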
ISBN: 9780769530697 (print)
We present a search algorithm for mining closed sets in high-dimensional binary datasets. Our algorithm is designed for dense datasets, where the percentage of 1's in the dataset is usually higher than 10% and the total number of closed sets is much larger than the number of objects in the dataset. Our algorithm is memory efficient since, unlike many other closed set mining algorithms, it does not require all patterns mined so far to be kept in memory. Optimization techniques are introduced in this paper, and we also present a parallel version of our algorithm.
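The notion of a closed set in a binary dataset can be sketched with the standard closure operator: intersect all rows that contain a given set of columns, and call the set closed when it equals that intersection. The snippet below only illustrates this definition; the row encoding (each row as the set of column indices holding a 1), the toy data, and the function name are assumptions, and it is not the search algorithm described in the abstract.

```python
def closure(itemset, rows):
    """Intersect all rows containing `itemset`; return None if no row contains it."""
    covering = [row for row in rows if itemset <= row]
    if not covering:
        return None
    return frozenset.intersection(*covering)


# Hypothetical binary dataset: each row lists the columns that hold a 1.
rows = [frozenset(r) for r in ({0, 1, 2}, {0, 1}, {0, 1, 3}, {2, 3})]

print(closure(frozenset({0}), rows))     # columns shared by every row that contains column 0
print(closure(frozenset({0, 1}), rows))  # equals its input, so {0, 1} is closed
```

Because closedness of a set can be checked from the data alone in this way, a search can in principle test candidates without storing every pattern found so far.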
ISBN: 9781424409723 (print)
This paper focuses on a method for defining target variables that are not accurately defined in data mining and its applications. In the area of data mining, there is a class of target variables that cannot be defined accurately, such as the definitions of high-end customers and the degree of customer loyalty in customer relationships. However, defining target variables is a prerequisite for constructing a supervised learning model. This paper introduces a method, "Thermometer-Onion", for defining the target variables and presents an example to illustrate its application.
ISBN: 9783540734994 (digital)
ISBN: 9783540734987 (print)
Mining maximal frequent itemsets in data streams is more difficult than mining them in static databases because of the huge, high-speed and continuous characteristics of data streams. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in Landmark windows or Sliding windows in data streams based on FP-Tree. A new FP-Tree structure is designed for storing all transactions in Landmark windows or Sliding windows in data streams. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is introduced. The experiments show that our algorithm is efficient and scalable. It is suitable for mining MFIs both in static databases and in data streams.
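What "maximal frequent itemsets in a sliding window" means can be shown with a small sketch that recomputes the result from scratch each time the window moves. This is deliberately naive and is not FpMFI-DS: it uses no FP-Tree and no ESEquivPS pruning, and the window size, support threshold, stream contents, and function name are assumptions.

```python
from collections import deque
from itertools import combinations


def maximal_frequent_itemsets(window, min_support):
    """Brute force: enumerate frequent itemsets, keep those with no frequent proper superset."""
    items = sorted({item for t in window for item in t})
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            if sum(1 for t in window if cset <= t) >= min_support:
                frequent.append(cset)
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical stream; the deque drops the oldest transaction once three are held.
stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"c", "d"}]
window = deque(maxlen=3)
history = []
for transaction in stream:
    window.append(frozenset(transaction))
    history.append(maximal_frequent_itemsets(window, min_support=2))

for step, mfis in enumerate(history, start=1):
    print(step, sorted(sorted(m) for m in mfis))
```

Recomputing everything per transaction is exactly the cost a one-pass, tree-based algorithm is designed to avoid; the sketch only fixes the expected output of such an algorithm on a tiny stream.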