ISBN: 9780769527017 (print)
The nearest-neighbor (NN) classifier has long been used in pattern recognition, exploratory data analysis, and data mining problems. A vital consideration in obtaining good results with this technique is the choice of distance function, and correspondingly which features to consider when computing distances between samples. In this paper, a new ensemble technique is proposed to improve the performance of the NN classifier. The proposed approach combines multiple NN classifiers, where each classifier uses a different distance function and potentially a different set of features (feature vector). These feature vectors are determined for each distance metric using a Simple Voting Scheme incorporated in Tabu Search (TS). The proposed ensemble classifier with different distance metrics and different feature vectors (TS-DF/NN) is evaluated using various benchmark data sets from the UCI Machine Learning Repository. Results indicate a significant increase in performance compared with various well-known classifiers. Furthermore, the proposed ensemble method is also compared with an ensemble classifier that uses different distance metrics but the same feature vector (with or without Feature Selection (FS)).
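The combination step described above can be illustrated with a minimal sketch: several 1-NN classifiers, each with its own distance function and feature subset, combined by majority vote. This is not the TS-DF/NN implementation. The function names, the three distance functions, and the hand-picked feature subsets are assumptions made for illustration; in the paper the feature vectors are selected by Tabu Search, which is not reproduced here.

```python
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def nn_predict(train_X, train_y, query, distance, features):
    """1-NN prediction using only the feature indices in `features`."""
    def project(v):
        return [v[i] for i in features]
    q = project(query)
    nearest = min(range(len(train_X)),
                  key=lambda i: distance(project(train_X[i]), q))
    return train_y[nearest]

def ensemble_predict(train_X, train_y, query, members):
    """Majority vote over (distance, feature subset) pairs.

    `members` is a hypothetical structure: a list of
    (distance_function, feature_index_list) tuples.
    """
    votes = [nn_predict(train_X, train_y, query, dist, feats)
             for dist, feats in members]
    return Counter(votes).most_common(1)[0][0]

# Toy data; the feature subsets are chosen by hand (an assumption).
train_X = [[0, 0, 5], [1, 1, 5], [5, 5, 0], [6, 6, 0]]
train_y = ["a", "a", "b", "b"]
members = [(euclidean, [0, 1]), (manhattan, [0, 1, 2]), (chebyshev, [1, 2])]
print(ensemble_predict(train_X, train_y, [0.5, 0.5, 5], members))
```

With an odd number of members and two classes, the vote cannot tie; for other configurations a tie-breaking rule would have to be chosen.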
ISBN: 9780769527017 (print)
Privacy-preserving data mining is an important area that studies the privacy issues of data mining. When the goal is to share data mining results, two privacy-related problems may arise. The first is how to compute the data mining results among several parties without sharing the data. Cryptography-based primitives are the basic tool used to develop ad hoc secure multi-party computation protocols that share as little information as possible during the computation under different adversary models. The second is how to produce data mining results that provably do not contain threats to the anonymity of individuals. The concept of k-anonymity has been used to discover anonymity-preserving frequent patterns, and centralized algorithms have been developed. In this paper, and for the first time, we study how to produce anonymity-preserving data mining results in a distributed environment. We present two privacy-preserving strategies and show their feasibility through experimental analysis.
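As background for the k-anonymity concept mentioned above, here is a minimal sketch of the standard table-level definition: every combination of quasi-identifier values must be shared by at least k records. It does not implement the pattern-level anonymity check or the distributed strategies studied in the paper; the function name, record layout, and attribute names are assumptions for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times.

    `records` is a list of dicts; `quasi_identifiers` is a list of keys.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy table: zip prefix and age band act as quasi-identifiers.
records = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "538**", "age": "30-39", "diagnosis": "flu"},
]
print(is_k_anonymous(records, ["zip", "age"], 2))
```

In this toy table the third record forms a group of size one, so the table is 1-anonymous but not 2-anonymous.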
ISBN: 9780769527017 (print)
In active learning, a machine learning algorithm is given an unlabeled set of examples U, and is allowed to request labels for a relatively small subset of U to use for training. The goal is then to judiciously choose which examples in U to have labeled in order to optimize some performance criterion, e.g., classification accuracy. We study how active learning affects AUC. We examine two existing algorithms from the literature and present our own active learning algorithms designed to maximize the AUC of the hypothesis. One of our algorithms was consistently the top performer, and Closest Sampling from the literature often came in second behind it. When good posterior probability estimates were available, our heuristics were by far the best.
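To make the performance criterion concrete, the sketch below computes AUC as the fraction of (positive, negative) pairs that a scorer ranks correctly, counting ties as half. It also shows a simple selection rule that picks the unlabeled example whose score is nearest a decision threshold. The selection rule is an assumed, simplified reading of boundary-based sampling; it is not the paper's own algorithm, and all names are illustrative.

```python
def auc(scores, labels):
    """AUC via pairwise comparison; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def closest_to_boundary(unlabeled_scores, threshold=0.5):
    """Index of the unlabeled example whose score is nearest the threshold."""
    return min(range(len(unlabeled_scores)),
               key=lambda i: abs(unlabeled_scores[i] - threshold))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
print(closest_to_boundary([0.9, 0.55, 0.1]))
```

The pairwise form is quadratic in the number of examples; a rank-based computation would be preferable for large sets.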
ISBN: 9780769527017 (print)
Insufficiency of training data is a major obstacle in machine learning and data mining applications. Many different semi-supervised learning algorithms have been proposed to tackle this difficulty by leveraging a large amount of unlabeled data. However, most of them focus on semi-supervised classification. In this paper, we propose a semi-supervised regression algorithm named Semi-Supervised Kernel Regression (SSKR). While classical kernel regression is based only on labeled examples, our approach extends it to all observed examples, using a weighting factor to modulate the effect of unlabeled examples. Experimental results show that SSKR significantly outperforms traditional kernel regression and graph-based semi-supervised regression methods.
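The idea of extending kernel regression to unlabeled examples through a weighting factor can be sketched as follows. The first function is classical Nadaraya-Watson kernel regression with optional per-example weights. The second is an assumed illustration only: the abstract does not say how unlabeled examples receive target values, so this sketch gives them pseudo-targets from the labeled-only estimate and down-weights them by a factor `lam`. It should not be read as the SSKR algorithm itself.

```python
import math

def gaussian_kernel(u, bandwidth):
    return math.exp(-0.5 * (u / bandwidth) ** 2)

def kernel_regression(x, xs, ys, bandwidth=1.0, weights=None):
    """Nadaraya-Watson estimate at x (1-D inputs), with optional weights."""
    if weights is None:
        weights = [1.0] * len(xs)
    k = [w * gaussian_kernel(x - xi, bandwidth) for xi, w in zip(xs, weights)]
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel weights are zero at this point")
    return sum(ki * yi for ki, yi in zip(k, ys)) / total

def semi_supervised_estimate(x, labeled_x, labeled_y, unlabeled_x,
                             lam=0.3, bandwidth=1.0):
    """Illustrative extension (an assumption, not the paper's method).

    Unlabeled points get pseudo-targets from the labeled-only estimate
    and enter the final estimate with weight `lam`.
    """
    pseudo_y = [kernel_regression(u, labeled_x, labeled_y, bandwidth)
                for u in unlabeled_x]
    xs = list(labeled_x) + list(unlabeled_x)
    ys = list(labeled_y) + pseudo_y
    weights = [1.0] * len(labeled_x) + [lam] * len(unlabeled_x)
    return kernel_regression(x, xs, ys, bandwidth, weights)

print(semi_supervised_estimate(0.5, [0.0, 1.0], [0.0, 1.0], [0.25, 0.75]))
```

Setting `lam` to 0 recovers classical kernel regression on the labeled examples alone, which makes the role of the weighting factor easy to inspect.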
ISBN: 9780769527017 (print)
With very low extra computational cost, the entire solution path can be computed for various learning algorithms such as support vector classification (SVC) and support vector regression (SVR). In this paper, we extend this promising approach to semi-supervised learning algorithms. In particular, we consider finding the solution path for the Laplacian support vector machine (LapSVM), which is a semi-supervised classification model based on manifold regularization. One advantage of this algorithm is that the coefficient path is piecewise linear with respect to the regularization parameter; hence its computational complexity is quadratic in the number of labeled examples.
ISBN: 9780769527017 (print)
Data mining has evolved as a new discipline at the intersection of several existing areas, including Database Systems, Machine Learning, Optimization, and Statistics. An important question is whether the field has matured to the point where it has originated substantial new problems and techniques that distinguish it from its parent disciplines. In this paper, we discuss a class of new problems and techniques that show great promise for exploratory mining, while synthesizing and generalizing ideas from the parent disciplines. While the class of problems we discuss is broad, there is a common underlying objective: to look beyond a single data mining step (e.g., data summarization or model construction) and address the combined process of data selection and transformation, parameter and algorithm selection, and model construction. The fundamental difficulty lies in the large space of alternative choices at each step, and good solutions must provide a natural framework for managing this complexity. We regard this as a grand challenge for data mining, and see the ideas in this paper as promising initial steps towards a rigorous exploratory framework that supports the entire process. This is joint work with several people, in particular Beechung Chen.
ISBN: 9780769527017 (print)
We propose a novel algorithm, RankDE, to build an ensemble using an extra artificial dataset. RankDE aims at improving the overall ranking performance, which is crucial in many machine learning applications. This algorithm constructs artificial datasets that are diverse with respect to the current training dataset in terms of ranking. We conduct experiments with real-world data sets to compare RankDE, in terms of ranking, with traditional and state-of-the-art ensemble algorithms: Bagging, Adaboost, DECORATE, and Rankboost. The experiments show that RankDE outperforms Bagging, DECORATE, Adaboost, and Rankboost when limited data is available. When enough training data is available, it is competitive with DECORATE and Adaboost.
ISBN: 9780769526270 (print)
In simulation-based functional verification, composing and debugging testbenches can be tedious and time-consuming. A simulation-based data mining approach [3] was proposed as an alternative for functional test pattern generation. However, the core of the approach lies in solving Boolean learning, which is the problem of learning Boolean functions from bit-level simulation data. In this paper, an efficient data mining engine based on novel decision-diagram (DD) based learning approaches is presented. We compare the DD-based learning approaches to other known methods such as Nearest Neighbor and Support Vector Machine. We show that the new Boolean data miner is efficient for practical use and that the learned results can provide compact and accurate approximate representations of Boolean functions. Finally, we show that the proposed methodology, incorporated with the current Boolean data miner, can achieve a high fault coverage (95.36%) on the OpenRISC 1200 microprocessor, demonstrating the effectiveness of our approach.
ISBN: 9780769527017 (print)
In this paper, we propose a novel and general approach for time-series data mining. As an alternative to the traditional practice of designing a specific algorithm to mine a certain kind of pattern directly from the data, our approach extracts the temporal structure of the time-series data by learning Markovian models, and then uses well-established methods to efficiently mine a wide variety of patterns from the topology graph of the learned models. We consolidate the approach by explaining the use of some well-known Markovian models for mining several kinds of patterns. We then present a novel high-order hidden Markov model, the variable-length hidden Markov model (VLHMM), which combines the advantages of well-known Markovian models and is superior in both efficiency and accuracy. Therefore, it can mine a much wider variety of patterns than any of the prior Markovian models. We demonstrate the power of VLHMM by mining four kinds of interesting patterns from 3D motion capture data, which is characterized by high dimensionality and complex dynamics.
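The two-stage idea (learn a Markovian model, then mine patterns from its topology graph) can be illustrated in its simplest form with a first-order Markov chain over a discrete symbol sequence. This is an assumed, much-reduced stand-in: it is neither a hidden Markov model nor the VLHMM, and the function names and the probability threshold are illustrative only.

```python
from collections import Counter, defaultdict

def fit_markov_chain(sequence):
    """Estimate first-order transition probabilities from a symbol sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

def strong_transitions(model, min_prob):
    """Edges of the transition graph with probability >= min_prob."""
    return sorted(
        (state, nxt, p)
        for state, row in model.items()
        for nxt, p in row.items()
        if p >= min_prob
    )

model = fit_markov_chain("ababac")
for edge in strong_transitions(model, 0.6):
    print(edge)
```

Reading patterns off the learned graph, rather than rescanning the raw sequence for each pattern type, is the part of the design this sketch is meant to convey.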
Author: Wang, Ching Wei (Univ Lincoln, Vis & Artificial Intelligence Grp, Dept Comp & Informat, Lincoln LN6 7TS, England)
ISBN: 9781424400324 (print)
A reliable and precise classification of tumours is essential for successful treatment of cancer. Recent research has confirmed the utility of ensemble machine learning algorithms for gene expression data analysis. In this paper, a new ensemble machine learning algorithm is proposed for classification and prediction on gene expression data. The algorithm is tested and compared with three widely adopted ensembles, i.e., bagging, boosting, and arcing. The results show that the proposed algorithm greatly outperforms existing methods, achieving high accuracy over 12 gene expression datasets.