ISBN (print): 0769522866
The area of interest for this paper is a pattern recognition method that can find and classify useful relations between data entries in a time series. A genetic algorithm is deployed to prepare and govern a set of independent patterns. Each pattern carries an additional quality value, introduced in this work, which corresponds to its level of certainty. Practical applications of the solution include data fitting and prediction. The analyzed data can be non-continuous and incomplete. In uncertain cases the algorithm presents either no response at all or more than one answer to the processed data. The architecture of the system makes it possible to interleave the learning phase with regular use. The genetic algorithm applied in the method uses niching techniques as well as a crowding factor and specialized population selection methods. Early testing results, which include prediction and fitting of simple time series with up to 50 percent missing data, are presented at the end of the paper.
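As a rough illustration of the fitting idea described above, the sketch below scores a candidate pattern against a time series while simply skipping missing entries. The function name, the NaN encoding of missing values, and the squared-error criterion are assumptions made for illustration, not the paper's actual fitness function, and the GA machinery (niching, crowding factor, certainty values) is not reproduced.

```python
import numpy as np

def pattern_fitness(pattern, series):
    """Score how well a candidate pattern matches a possibly incomplete series.

    Missing entries are encoded as NaN and ignored, so even a series with
    roughly half its values missing still leaves positions to compare against.
    """
    best = -np.inf
    L = len(pattern)
    for start in range(len(series) - L + 1):
        window = series[start:start + L]
        observed = ~np.isnan(window)
        if not observed.any():
            continue  # nothing observed in this window
        # negative mean squared error over the observed positions only
        err = np.mean((window[observed] - pattern[observed]) ** 2)
        best = max(best, -err)
    return best

# e.g. pattern_fitness(np.array([0., 1., 0.]), np.array([0., np.nan, 0., 1., 0.]))
```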
ISBN (print): 076952432X
Drifting learning (DL) is an effective method for solving regression problems in the field of data mining. The approach is based on a combination of the Locally Weighted Learning (LWL) algorithm and Statistical Learning Theory (SLT). Theoretical analysis and simulation show that it achieves better estimation precision and generalization ability than traditional methods, and that it is suitable for modeling complex industrial processes with multiple work modes. In the algorithm, the selection of an optimized bandwidth is a key factor for both generalization performance and real-time performance. This paper first analyzes the effect of the optimized bandwidth on the drifting learning method through theoretical analysis and simulation, and then provides a novel optimized bandwidth selection algorithm. The simulation results show that the proposed approach achieves performance superior to existing methods.
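To make the bandwidth discussion concrete, here is a minimal locally weighted regression predictor with a leave-one-out bandwidth search. The Gaussian kernel, the small ridge term, and the candidate-grid search are illustrative assumptions, not the paper's optimized selection algorithm.

```python
import numpy as np

def lwl_predict(X, y, x_query, bandwidth):
    """Locally weighted linear regression at one query point (Gaussian kernel)."""
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))                     # kernel weights
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])                # add intercept column
    A = Xa.T @ (Xa * w[:, None]) + 1e-8 * np.eye(Xa.shape[1])    # tiny ridge for stability
    beta = np.linalg.solve(A, Xa.T @ (w * y))
    return beta[0] + x_query @ beta[1:]

def select_bandwidth(X, y, candidates):
    """Pick the bandwidth with the lowest leave-one-out squared error."""
    best_h, best_err = None, np.inf
    for h in candidates:
        err = 0.0
        for i in range(len(y)):
            mask = np.arange(len(y)) != i
            err += (lwl_predict(X[mask], y[mask], X[i], h) - y[i]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h
```

A wider bandwidth averages over more neighbours (smoother, lower variance), a narrower one tracks local work modes more closely; the leave-one-out error is one simple proxy for that trade-off.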
ISBN (print): 0769522785
The proceedings contain 138 papers. The topics discussed include: handling generalized cost functions in the partitioning optimization problem through sequential binary programming; online hierarchical clustering in a data warehouse environment; eMailSift: email classification based on structure and content; an empirical Bayes approach to detect anomalies in dynamic multidimensional arrays; classifier fusion using shared sampling distribution for boosting; improving automatic query classification via semi-supervised learning; ViVo: visual vocabulary construction for mining biomedical images; adaptive product normalization: using online learning for record linkage in comparison shopping; using information-theoretic measures to assess association rule interestingness; shortest-path kernels on graphs; mining frequent spatio-temporal sequential patterns; modeling multiple time series for anomaly detection; and WARP: time warping for periodicity detection.
ISBN (print): 0769522785
Naive Bayes (simply NB) [12] has been widely used in machine learning and data mining as a simple and effective classification algorithm. Since its conditional independence assumption is rarely true, researchers have made a substantial effort to improve naive Bayes. The related work can be broadly divided into two approaches, eager learning and lazy learning, depending on when the major computation occurs. Different from the eager approach, the key idea in extending naive Bayes via the lazy approach is to learn a naive Bayes classifier for each test example. In recent years, several lazy extensions of naive Bayes have been proposed, for example SNNB [18], LWNB [7], and LBR [19], all aiming at improving the classification accuracy of naive Bayes. In many real-world machine learning and data mining applications, however, an accurate ranking is more desirable than an accurate classification. Responding to this fact, we present a lazy learning algorithm called instance greedily cloning naive Bayes (simply IGCNB) in this paper. Our motivation is to improve naive Bayes' ranking performance, measured by AUC [4, 14]. We experimentally tested our algorithm on all 36 UCI datasets recommended by Weka [1] and compared it to C4.4 [16], NB [12], SNNB [18] and LWNB [7]. The experimental results show that our algorithm significantly outperforms all the compared algorithms in yielding accurate rankings.
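For readers unfamiliar with the lazy family (SNNB, LWNB, LBR), the sketch below shows the generic idea: train a naive Bayes model on the neighbourhood of each test instance and rank by the resulting class probabilities. This is closest in spirit to LWNB and is not the IGCNB algorithm itself; the k value, the Gaussian NB variant, and binary 0/1 labels are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

def lazy_nb_scores(X_train, y_train, X_test, k=50):
    """For each test instance, fit naive Bayes on its k nearest training
    neighbours and return the positive-class probability."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    scores = []
    for x in X_test:
        _, idx = nn.kneighbors(x.reshape(1, -1))
        Xn, yn = X_train[idx[0]], y_train[idx[0]]
        if len(np.unique(yn)) == 1:
            scores.append(float(yn[0]))   # pure neighbourhood: score is 0 or 1
            continue
        nb = GaussianNB().fit(Xn, yn)
        scores.append(nb.predict_proba(x.reshape(1, -1))[0, 1])
    return np.array(scores)

# Ranking quality is then measured by AUC:
# auc = roc_auc_score(y_test, lazy_nb_scores(X_train, y_train, X_test))
```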
ISBN (print): 0780391365
Classification is one of the main tasks in machine learning, data mining and pattern recognition. Compared with the extensively studied data-driven approaches, interactively user-driven approaches are less explored. A granular computing model is suggested for re-examining classification problems. An interactive classification method using the granule network is proposed, which allows multiple strategies for granule tree construction and enhances the understanding and interpretation of the classification process. This method is complementary to existing classification methods.
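A toy sketch of what a single granule-selection step might look like: a granule here is just the set of rows matching one attribute-value pair, scored by label purity. The data representation and the purity criterion are assumptions for illustration only; the paper's interactive, multi-strategy granule-tree construction is considerably richer.

```python
from collections import Counter

def purest_granule(rows, labels, attributes):
    """Pick the attribute-value pair whose granule (subset of rows matching it)
    has the highest label purity. rows: list of dicts, labels: parallel list."""
    best = None
    for a in attributes:
        for v in {r[a] for r in rows}:
            granule = [i for i, r in enumerate(rows) if r[a] == v]
            counts = Counter(labels[i] for i in granule)
            purity = counts.most_common(1)[0][1] / len(granule)
            if best is None or purity > best[0]:
                best = (purity, a, v, granule)
    return best  # (purity, attribute, value, matching row indices)
```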
Traditional text mining systems employ shallow parsing techniques and focus on concept extraction and taxonomic relation extraction. This paper presents a novel system called CRCTOL for mining rich semantic knowledge ...
ISBN (print): 9780898715934
Latent factor models offer a very useful framework for modeling dependencies in high-dimensional multivariate data. In this work we investigate a class of latent factor models with hidden noisy-or units that let us decouple high-dimensional vectors of observable binary random variables using a 'small' number of hidden binary factors. Since the problem of learning such models from data is intractable, we develop a variational approximation. We analyze special properties of the optimization problem, in particular its "built-in" regularization effect, and discuss its importance for model recovery. We test the noisy-or model on an image deconvolution problem and illustrate the ability of the variational method to successfully learn the underlying image components. Finally, we apply the latent noisy-or model to analyze citations in a large collection of statistical machine learning papers and show the benefit of the model and algorithms by discovering useful and semantically sound components characterizing the dataset.
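For reference, the noisy-or observation model referred to above can be written compactly: the probability that observable x_j turns on is one minus the probability that the leak and every active hidden factor all fail to activate it, i.e. P(x_j = 1 | s) = 1 - (1 - l_j) * prod_k (1 - p_jk)^{s_k}. The sketch below evaluates that probability; the parameter names are illustrative, and the paper's variational learning procedure is not reproduced.

```python
import numpy as np

def noisy_or_prob(s, p, leak):
    """P(x_j = 1 | s) for each observable j under a noisy-or unit.

    s    : (K,) binary vector of hidden factor activations
    p    : (D, K) matrix, p[j, k] = prob. that factor k turns on observable j
    leak : (D,) per-observable leak probability
    """
    fail = (1.0 - p) ** s                      # each factor's failure to trigger
    return 1.0 - (1.0 - leak) * np.prod(fail, axis=1)
```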
Real-life transaction data often miss some occurrences of items that are actually present. As a consequence, some potentially interesting frequent patterns cannot be discovered, since with exact matching the number of ...
ISBN (print): 0769522785
Linear discriminant analysis (LDA) is widely used as a dimension reduction method in data mining and machine learning. It suffers, however, from the small sample size (SSS) problem when the data dimensionality is greater than the sample size. Many modified methods have been proposed to address some aspect of this difficulty from a particular viewpoint, but a comprehensive framework that provides a complete solution to the SSS problem is still missing. In this paper we provide a unified approach to LDA and investigate the SSS problem in the framework of statistical learning theory. This unified approach results in a deeper understanding of LDA. We demonstrate that LDA (and its nonlinear extension) belongs to the same framework in which powerful classifiers such as support vector machines (SVMs) are formulated. In addition, this approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical analysis.
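One common way the SSS problem is handled in practice is to regularize the within-class scatter matrix so that it stays invertible when there are fewer samples than dimensions; the sketch below shows that standard remedy for the two-class case. It is given only as background under that assumption: the paper's unified, SLT-based treatment (and its error bound) takes a different route.

```python
import numpy as np

def regularized_lda_direction(X, y, reg=1e-3):
    """Two-class Fisher discriminant with a shrinkage-regularized within-class
    scatter, so it remains well-posed when n_samples < n_features (SSS setting)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter; singular when the sample size is below the dimensionality
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw_reg = Sw + reg * np.eye(X.shape[1])      # shrinkage keeps it invertible
    w = np.linalg.solve(Sw_reg, m1 - m0)        # Fisher discriminant direction
    return w / np.linalg.norm(w)
```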