ISBN (print): 0769522785
Naive Bayes (NB) [12] has been widely used in machine learning and data mining as a simple and effective classification algorithm. Since its conditional independence assumption rarely holds, researchers have made substantial efforts to improve naive Bayes. Depending on when the major computation occurs, the related research can be broadly divided into two approaches: eager learning and lazy learning. Unlike the eager approach, the key idea of extending naive Bayes with the lazy approach is to learn a separate naive Bayes model for each test example. In recent years, several lazy extensions of naive Bayes have been proposed, such as SNNB [18], LWNB [7], and LBR [19], all of which aim to improve the classification accuracy of naive Bayes. In many real-world machine learning and data mining applications, however, an accurate ranking is more desirable than an accurate classification. In response, this paper presents a lazy learning algorithm called instance greedily cloning naive Bayes (IGCNB). Our motivation is to improve the ranking performance of naive Bayes as measured by AUC [4, 14]. We tested our algorithm experimentally on all 36 UCI datasets recommended by Weka [1] and compared it with C4.4 [16], NB [12], SNNB [18], and LWNB [7]. The experimental results show that our algorithm significantly outperforms all the compared algorithms in producing accurate rankings.
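Ranking with naive Bayes means ordering test examples by the estimated class probability, and AUC scores how well that ordering separates the classes. The sketch below, a minimal illustration under stated assumptions, pairs a plain categorical naive Bayes with Laplace smoothing and an AUC computed as the fraction of positive/negative pairs that are ordered correctly (ties count one half). It is not IGCNB, whose cloning procedure the abstract does not specify; the function names `train_nb`, `predict_proba`, `auc` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def train_nb(X, y, alpha=1.0):
    """Fit a categorical naive Bayes (hypothetical helper) with Laplace smoothing alpha."""
    classes = sorted(set(y))
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][v] = number of class-c training examples whose feature j equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            counts[yi][j][v] += 1
            values[j].add(v)
    return classes, class_counts, counts, values, alpha, len(y)


def predict_proba(model, x):
    """Return P(c | x) for every class, assuming features are conditionally independent."""
    classes, class_counts, counts, values, alpha, n = model
    log_scores = {}
    for c in classes:
        # log P(c) + sum_j log P(x_j | c), both smoothed
        s = math.log((class_counts[c] + alpha) / (n + alpha * len(classes)))
        for j, v in enumerate(x):
            s += math.log((counts[c][j][v] + alpha) /
                          (class_counts[c] + alpha * len(values[j])))
        log_scores[c] = s
    # Normalize in log space for numerical stability
    m = max(log_scores.values())
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def auc(scores, labels, positive):
    """AUC = probability that a random positive is scored above a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
         ("rain", "cool"), ("overcast", "hot"), ("rain", "hot")]
    y = ["no", "no", "yes", "yes", "yes", "no"]
    model = train_nb(X, y)
    scores = [predict_proba(model, x)["yes"] for x in X]
    print("training-set AUC:", auc(scores, y, "yes"))
```

Because AUC depends only on the ordering of scores, two classifiers with identical accuracy can differ in AUC, which is why a ranking-oriented method is evaluated with it rather than with error rate.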
ISBN (print): 0780391365
Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied data-driven approaches, interactive user-driven approaches are less explored. A granular computing model is suggested for re-examining classification problems. An interactive classification method using the granule network is proposed, which allows multiple strategies for granule tree construction and enhances the understanding and interpretation of the classification process. This method is complementary to existing classification methods.
Traditional text mining systems employ shallow parsing techniques and focus on concept extraction and taxonomic relation extraction. This paper presents a novel system called CRCTOL for mining rich semantic knowledge ...
ISBN (print): 9780898715934
Latent factor models offer a very useful framework for modeling dependencies in high-dimensional multivariate data. In this work we investigate a class of latent factor models with hidden noisy-or units that let us decouple high-dimensional vectors of observable binary random variables using a 'small' number of hidden binary factors. Since learning such models from data is intractable, we develop a variational approximation. We analyze special properties of the optimization problem, in particular its "built-in" regularization effect, and discuss its importance for model recovery. We test the noisy-or model on an image deconvolution problem and illustrate the ability of the variational method to successfully learn the underlying image components. Finally, we apply the latent noisy-or model to analyze citations in a large collection of statistical machine learning papers and show the benefit of the model and algorithms by discovering useful and semantically sound components characterizing the dataset.
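The forward model behind such hidden noisy-or units is commonly written as P(x_j = 0 | s) = (1 - p_j0) * prod_i (1 - p_ij)^(s_i): an observable stays off only if a background "leak" and every active hidden factor all fail to switch it on. The sketch below evaluates that conditional and the resulting log-likelihood for fixed factor states; it assumes the leaky parameterization shown (the abstract does not give formulas), uses hypothetical names `leak` and `p`, and does not implement the paper's variational learning procedure.

```python
import math


def noisy_or_on_probs(s, leak, p):
    """Return P(x_j = 1 | s) for every observable j under a noisy-or model.

    s    -- binary states of the K hidden factors (0/1)
    leak -- leak probabilities p_j0, one per observable (assumed parameterization)
    p    -- p[i][j]: probability that factor i, when active, turns on x_j
    """
    probs = []
    for j, leak_j in enumerate(leak):
        # x_j stays off only if the leak and every active factor all fail to fire
        p_off = 1.0 - leak_j
        for i, s_i in enumerate(s):
            if s_i:
                p_off *= 1.0 - p[i][j]
        probs.append(1.0 - p_off)
    return probs


def log_likelihood(x, s, leak, p):
    """log P(x | s): observables are conditionally independent given the factors."""
    probs = noisy_or_on_probs(s, leak, p)
    return sum(math.log(q if x_j else 1.0 - q) for x_j, q in zip(x, probs))


if __name__ == "__main__":
    leak = [0.01, 0.01, 0.01]
    p = [[0.9, 0.8, 0.0],   # factor 0 mainly explains x_0 and x_1
         [0.0, 0.7, 0.9]]   # factor 1 mainly explains x_1 and x_2
    print(noisy_or_on_probs([1, 0], leak, p))
    print(log_likelihood([1, 1, 0], [1, 0], leak, p))
```

The product over active factors is what makes exact posterior inference over many hidden factors expensive, since the terms couple once x is observed; this is the kind of intractability a variational approximation is meant to sidestep.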
Real-life transaction data often miss some occurrences of items that are actually present. As a consequence, some potentially interesting frequent patterns cannot be discovered, since with exact matching the number of ...
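The effect of exact matching can be seen by counting support directly: a transaction supports a pattern only if it contains every item, so one unrecorded item removes it from the count. The sketch contrasts exact support with a simple relaxation that also counts transactions missing at most `max_missing` items of the pattern. This relaxation is an illustrative assumption, since the abstract is truncated before it describes the authors' method; `exact_support`, `tolerant_support`, and the example transactions are hypothetical.

```python
def exact_support(itemset, transactions):
    """Count transactions that contain every item of `itemset` (exact matching)."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def tolerant_support(itemset, transactions, max_missing):
    """Hypothetical relaxation: also count transactions that lack at most
    `max_missing` items of `itemset` (e.g. items present but not recorded)."""
    items = set(itemset)
    return sum(1 for t in transactions if len(items - set(t)) <= max_missing)


if __name__ == "__main__":
    transactions = [{"bread", "milk", "eggs"},
                    {"bread", "milk"},   # suppose "eggs" was bought but not recorded
                    {"bread", "eggs"},
                    {"milk", "eggs"}]
    pattern = {"bread", "milk", "eggs"}
    print(exact_support(pattern, transactions))        # 1
    print(tolerant_support(pattern, transactions, 1))  # 4
```

With a minimum support of 2, the pattern is discarded under exact matching but kept under the relaxation, which illustrates how missing occurrences can hide a pattern.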
ISBN (print): 0769522785
Linear discriminant analysis (LDA) is widely used as a dimension reduction method in data mining and machine learning. However, it suffers from the small sample size (SSS) problem when data dimensionality is greater than the sample size. Many modified methods have been proposed to address particular aspects of this difficulty from particular viewpoints, but a comprehensive framework that provides a complete solution to the SSS problem is still missing. In this paper we provide a unified approach to LDA and investigate the SSS problem in the framework of statistical learning theory. Within this unified approach, our analysis yields a deeper understanding of LDA. We demonstrate that LDA (and its nonlinear extension) belongs to the same framework in which powerful classifiers such as support vector machines (SVMs) are formulated. In addition, this approach allows us to establish an error bound for LDA. Finally, our experiments validate our theoretical results.
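Where the SSS problem comes from can be seen in the classical Fisher criterion, written here in standard textbook notation (an assumption, since the abstract gives no formulas) for n samples of dimension d in c classes, with class means \mu_k, class sizes n_k, and overall mean \mu:

```latex
S_W = \sum_{k=1}^{c} \sum_{\mathbf{x}\in\mathcal{C}_k} (\mathbf{x}-\boldsymbol{\mu}_k)(\mathbf{x}-\boldsymbol{\mu}_k)^{\top},
\qquad
S_B = \sum_{k=1}^{c} n_k\,(\boldsymbol{\mu}_k-\boldsymbol{\mu})(\boldsymbol{\mu}_k-\boldsymbol{\mu})^{\top}

J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \mathbf{w}}{\mathbf{w}^{\top} S_W \mathbf{w}},
\qquad
\operatorname{rank}(S_W) \le \min(d,\; n - c)
```

The rank bound holds because the centered vectors of class k sum to zero and so span at most n_k - 1 dimensions. Whenever d > n - c, which includes the case d > n named in the abstract, the d-by-d matrix S_W is singular and the usual solution through S_W^{-1} S_B is undefined. Regularized LDA, which replaces S_W with S_W + \lambda I for some \lambda > 0, is one common workaround; it is mentioned only for illustration and is not the unified framework proposed in the paper.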
This paper discusses consistency in patterns of language use across domain-specific collections of text. We present a method for the automatic identification of domain-specific keywords (specialist terms) based on...
We present CoLe, a cooperative data mining approach for discovering hybrid knowledge. It employs multiple different data mining algorithms and combines their results to enhance the mined knowledge. For our medica...
Privacy considerations are highly significant in data mining applications. It is very important that the privacy of individual parties not be exposed when data mining techniques are applied to a large collect...
Online auction Web sites are fast-changing, highly dynamic, and complex, as they involve a tremendous number of sellers and potential buyers, as well as a huge number of items listed for bidding. We develop a two-phase framework w...