ISBN (digital): 9783540738718
ISBN (print): 9783540738701
Biometric data such as fingerprints are often highly structured and high-dimensional. The "curse of dimensionality" poses a great challenge to subsequent pattern recognition algorithms, including neural networks, because of the high computational complexity involved. A common approach is to apply dimensionality reduction (DR) to project the original data onto a lower-dimensional space that preserves most of the useful information. Recently, we proposed Twin Kernel Embedding (TKE), which processes structured or non-vectorial data directly without vectorization. Here, we apply this method to clustering and visualizing fingerprints in a 2-dimensional space. It works by learning an optimal kernel in the latent space from a distance metric, rather than a kernel, defined on the input fingerprints. The outputs are the embeddings of the fingerprints and a kernel Gram matrix in the latent space that can be used in subsequent learning procedures such as a Support Vector Machine (SVM) for classification or recognition. Experimental results confirmed the usefulness of the proposed method.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Over the past several years, machine learning and data mining techniques have received considerable attention among intrusion detection researchers as a way to address the weaknesses of knowledge-based detection techniques. This has led to the application of various supervised and unsupervised techniques for the purpose of intrusion detection. In this paper, we conduct a set of experiments to analyze the performance of unsupervised techniques with respect to their main design choices. These include the heuristics proposed for distinguishing abnormal data from normal data and the distribution of the dataset used for training. We evaluate the performance of the techniques with various distributions of training and test datasets, which are constructed from the KDD99 dataset, a widely accepted resource for IDS evaluations. This comparative study is not only a blind comparison between unsupervised techniques, but also gives some guidelines to researchers and practitioners on applying these techniques to the area of intrusion detection.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Dimension reduction methods are often applied in machine learning and data mining problems. Linear subspace methods, such as principal component analysis (PCA) and Fisher's linear discriminant analysis (FDA), are the most commonly used. In this paper, we describe a novel feature extraction method for binary classification problems. Instead of finding linear subspaces, our method finds lower-dimensional affine subspaces for data observations. Our method can be understood as a generalization of the Fukunaga-Koontz Transformation. We show that the proposed method has a closed-form solution and thus can be solved very efficiently. We also investigate the information-theoretical properties of the new method and study its relationship with other methods. The experimental results show that our method, like PCA and FDA, can be used as another preliminary data-exploring tool to help solve machine learning and data mining problems.
A Reflex Fuzzy Min-Max Neural Network (RFMN) capable of learning from missing data is presented. Many real-world problems involve machine learning with missing values or attributes. Thus, learning with missing or incom...
ISBN (print): 9783540770459
The work presented here focuses on combining multiple classifiers to form a single classifier for pattern classification, machine learning for expert systems, and data mining tasks. The basis of the combination is that efficient concept learning is possible in many cases when the concepts learned from different approaches are combined into a more efficient concept. Experimental results for the algorithm, EMRL, on a representative collection of data sets from different domains show that it performs significantly better than several state-of-the-art individual classifiers in 11 domains out of 25 data sets, whereas a state-of-the-art individual classifier performs significantly better than EMRL in only 5 cases.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Description logics have emerged as one of the most successful formalisms for knowledge representation and reasoning. They are now widely used as a basis for ontologies in the Semantic Web. To extend and analyse ontologies, automated methods for knowledge acquisition and mining are being sought. Despite its importance for knowledge engineers, the learning problem in description logics has not been investigated as deeply as its counterpart for logic programs. We propose the novel idea of applying evolutionarily inspired methods to solve this task. In particular, we show how Genetic Programming can be applied to the learning problem in description logics and combine it with techniques from Inductive Logic Programming. We base our algorithm on thorough theoretical foundations and present a preliminary evaluation.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data-generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent ('Naive') classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification, and we explicitly provide one such scenario.
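As a rough illustration of the quantity discussed in the abstract above, the sketch below computes the largest statewise gap between a joint distribution and the product of its marginals. The function names and the reading of "statewise difference" as the maximum absolute per-state gap are assumptions made for this example, not definitions taken from the paper.

```python
from itertools import product


def marginals(joint):
    """Per-variable marginals of a joint pmf given as {state_tuple: prob}."""
    n_vars = len(next(iter(joint)))
    margs = [dict() for _ in range(n_vars)]
    for state, p in joint.items():
        for i, value in enumerate(state):
            margs[i][value] = margs[i].get(value, 0.0) + p
    return margs


def max_statewise_difference(joint):
    """Largest |P(x) - prod_i P_i(x_i)| over all states x (assumed reading)."""
    margs = marginals(joint)
    worst = 0.0
    # Enumerate every combination of observed values, including states
    # that have zero probability under the joint distribution.
    for state in product(*(sorted(m) for m in margs)):
        independent = 1.0
        for i, value in enumerate(state):
            independent *= margs[i][value]
        worst = max(worst, abs(joint.get(state, 0.0) - independent))
    return worst


# Two toy distributions over a pair of binary variables (hypothetical data).
perfectly_correlated = {(0, 0): 0.5, (1, 1): 0.5}
independent_coins = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For two fair but perfectly correlated bits the gap is 0.25 in every state, while for two independent fair coins it is zero, which is the contrast an independence assumption cannot see.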
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Current metrics for evaluating the performance of Bayesian network structure learning include order statistics of the data likelihood of learned structures, the average data likelihood, and the average convergence time. In this work, we define a new metric that directly measures a structure learning algorithm's ability to correctly model causal associations among variables in a data set. By treating membership in a Markov Blanket as a retrieval problem, we use ROC analysis to compute a structure learning algorithm's efficacy in capturing causal associations at varying strengths. Because our metric moves beyond error rate and data likelihood with a measurement of stability, it is a better characterization of structure learning performance. Because the structure learning problem is NP-hard, practical algorithms are either heuristic or approximate. For this reason, an understanding of a structure learning algorithm's stability and boundary value conditions is necessary. We contribute to the state of the art in the data mining community with a new tool for understanding the behavior of structure learning techniques.
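The idea of treating Markov blanket membership as a retrieval problem can be sketched as follows. This is a minimal illustration under stated assumptions: the DAG encoding (`{child: set_of_parents}`), the function names, and the toy graphs are hypothetical, and the code yields a single point in ROC space rather than the full analysis at varying association strengths described in the abstract.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG encoded as {child: set_of_parents}."""
    blanket = set(parents.get(node, set()))   # the node's parents
    for child, pars in parents.items():
        if node in pars:
            blanket.add(child)                # the node's children
            blanket.update(pars)              # other parents of those children
    blanket.discard(node)
    return blanket


def blanket_rates(true_parents, learned_parents, nodes):
    """True and false positive rates of learned blanket membership."""
    tp = fp = fn = tn = 0
    for target in nodes:
        truth = markov_blanket(true_parents, target)
        guess = markov_blanket(learned_parents, target)
        for other in nodes:
            if other == target:
                continue
            if other in guess and other in truth:
                tp += 1
            elif other in guess:
                fp += 1
            elif other in truth:
                fn += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical ground truth A -> C <- B, C -> D; the learner missed B -> C.
nodes = ["A", "B", "C", "D"]
true_dag = {"C": {"A", "B"}, "D": {"C"}}
learned_dag = {"C": {"A"}, "D": {"C"}}
```

Calling `blanket_rates(true_dag, learned_dag, nodes)` on this toy pair gives one (TPR, FPR) operating point; repeating the comparison while sweeping a learner's threshold would trace out a curve.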
ISBN (print): 9781595937704
Network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures. Not only does this approach mean that worms cannot be detected until their signatures are created, but variants of known worms also remain undetected because they have different signatures. The intuitive solution is to write more generic signatures. This solution, however, would increase the false alarm rate and is therefore not feasible in practice. This paper reports on the feasibility of using a machine learning technique to detect variants of known worms in real time. Support vector machines (SVMs) are a machine learning technique known to perform well at various pattern recognition tasks, such as text categorization and handwritten digit recognition. Given the efficacy of SVMs in standard pattern recognition problems, this work applies SVMs to the worm detection problem. Specifically, we investigate the optimal configuration of SVMs and associated kernel functions to classify various types of synthetically generated worms. We demonstrate that the optimal configuration for real-time detection of variants of known worms is to use a linear kernel and unnormalized bi-gram frequency counts as input. Copyright 2007 ACM.
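The input representation named in the abstract above, unnormalized bi-gram frequency counts combined with a linear kernel, can be sketched in a few lines. The sketch assumes that bi-grams are taken over overlapping pairs of payload bytes, which the abstract does not specify, and it stops at the kernel value rather than training an SVM; the function names are hypothetical.

```python
from collections import Counter


def bigram_counts(payload: bytes) -> Counter:
    """Unnormalized counts of overlapping byte bi-grams (assumed unit: bytes)."""
    return Counter(payload[i:i + 2] for i in range(len(payload) - 1))


def linear_kernel(a: Counter, b: Counter) -> int:
    """Dot product of two sparse bi-gram count vectors."""
    if len(a) > len(b):
        a, b = b, a                      # iterate over the smaller vector
    # Counter returns 0 for bi-grams that are absent from `b`.
    return sum(count * b[gram] for gram, count in a.items())


known = bigram_counts(b"abab")           # counts: ab -> 2, ba -> 1
variant = bigram_counts(b"abc")          # counts: ab -> 1, bc -> 1
similarity = linear_kernel(known, variant)
```

Because the counts are left unnormalized, longer payloads produce larger kernel values; a kernel matrix built this way could then be passed to any SVM implementation that accepts precomputed kernels.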
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
The recently introduced transductive confidence machines (TCMs) framework allows classifiers to be extended so that they satisfy the calibration property. This means that the error rate can be set by the user prior to classification. An analytical proof of the calibration property was given for TCMs applied in the on-line learning setting. However, the nature of this learning setting restricts the applicability of TCMs. In this paper, we provide strong empirical evidence that the calibration property also holds in the off-line learning setting. Our results extend the range of applications in which TCMs can be applied. We may conclude that TCMs are appropriate in virtually any application domain.
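To make the role of the user-chosen error rate concrete, the sketch below shows a toy transductive p-value computation on one-dimensional data. The nonconformity measure (distance to the mean of the other examples with the same label), the function names, and the data are all assumptions chosen for brevity; the abstract does not state which underlying classifier or nonconformity measure the authors used.

```python
def nonconformity(bag, index):
    """Distance from example `index` to the mean of its same-label peers."""
    x, y = bag[index]
    same = [xi for j, (xi, yi) in enumerate(bag) if yi == y and j != index]
    if not same:
        return float("inf")              # no peers: maximally strange
    return abs(x - sum(same) / len(same))


def p_value(train, x, label):
    """Fraction of examples at least as nonconforming as (x, label)."""
    bag = train + [(x, label)]           # tentatively add the test example
    scores = [nonconformity(bag, i) for i in range(len(bag))]
    test_score = scores[-1]
    return sum(s >= test_score for s in scores) / len(bag)


def prediction_region(train, x, labels, epsilon):
    """Labels whose p-value exceeds the significance level epsilon."""
    return {y for y in labels if p_value(train, x, y) > epsilon}


# Hypothetical, well-separated training data.
train = [(0.0, "a"), (0.2, "a"), (0.4, "a"),
         (5.0, "b"), (5.2, "b"), (5.4, "b")]
region = prediction_region(train, 0.1, ["a", "b"], epsilon=0.2)
```

Here `epsilon` plays the role of the error rate fixed before classification: labels are excluded from the region only when their p-value falls at or below it. Whether the resulting error frequency actually matches `epsilon` in the off-line setting is the empirical question the paper addresses.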