ISBN (print): 3540269231
The proceedings contain 68 papers. The topics discussed include: incremental classification rules based on association rules using formal concept analysis; finite mixture models with negative components; principles of multi-kernel data mining; a comprehensible SOM-based scoring system; linear manifold clustering; clustering document images using graph summaries; unsupervised learning of visual feature hierarchies; a new multidimensional feature transformation for linear classifiers and its applications; embedding time series data for classification; statistical supports for frequent itemsets on data streams; neural expert model applied to phonemes recognition; and signature-based approach for intrusion detection.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Dimension reduction methods are often applied in machine learning and data mining problems. Linear subspace methods, such as principal component analysis (PCA) and Fisher's linear discriminant analysis (FDA), are the most commonly used. In this paper, we describe a novel feature extraction method for binary classification problems. Instead of finding linear subspaces, our method finds lower-dimensional affine subspaces for data observations. Our method can be understood as a generalization of the Fukunaga-Koontz Transformation. We show that the proposed method has a closed-form solution and thus can be solved very efficiently. We also investigate the information-theoretical properties of the new method and study its relationship with other methods. The experimental results show that our method, like PCA and FDA, can be used as another preliminary data-exploring tool to help solve machine learning and data mining problems.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Over the past several years, machine learning and data mining techniques have received considerable attention among intrusion detection researchers as a way to address the weaknesses of knowledge-based detection techniques. This has led to the application of various supervised and unsupervised techniques for the purpose of intrusion detection. In this paper, we conduct a set of experiments to analyze the performance of unsupervised techniques considering their main design choices. These include the heuristics proposed for distinguishing abnormal data from normal data and the distribution of the dataset used for training. We evaluate the performance of the techniques with various distributions of training and test datasets, which are constructed from the KDD99 dataset, a widely accepted resource for IDS evaluations. This comparative study is not only a blind comparison between unsupervised techniques, but also gives some guidelines to researchers and practitioners on applying these techniques to the area of intrusion detection.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Description logics have emerged as one of the most successful formalisms for knowledge representation and reasoning. They are now widely used as a basis for ontologies in the Semantic Web. To extend and analyse ontologies, automated methods for knowledge acquisition and mining are being sought. Despite its importance for knowledge engineers, the learning problem in description logics has not been investigated as deeply as its counterpart for logic programs. We propose the novel idea of applying evolutionary inspired methods to solve this task. In particular, we show how Genetic Programming can be applied to the learning problem in description logics and combine it with techniques from Inductive Logic Programming. We base our algorithm on thorough theoretical foundations and present a preliminary evaluation.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Computational procedures using independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent ('Naive') classifiers can be as bad as tossing coins. Regardless of this, independence may beat the generating model in learning supervised classification, and we explicitly provide one such scenario.
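The coin-tossing claim can be illustrated with a minimal sketch. It assumes an XOR-style toy distribution (our choice of example, not necessarily the construction used in the paper): two fair binary features whose exclusive-or is the class. The names `joint`, `marginal`, and `naive_bayes_posterior` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# Hypothetical toy distribution: x1 and x2 are fair bits, class y = x1 XOR x2.
# Each state is stored as (x1, x2, y) with probability 1/4.
joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}


def marginal(joint, keep):
    """Sum the joint distribution down to the variable positions in `keep`."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def naive_bayes_posterior(joint, x1, x2):
    """P(y | x1, x2) assuming P(x1, x2 | y) = P(x1 | y) * P(x2 | y)."""
    p_y = marginal(joint, (2,))
    p_x1y = marginal(joint, (0, 2))
    p_x2y = marginal(joint, (1, 2))
    scores = {}
    for (y,), py in p_y.items():
        p_x1_given_y = p_x1y.get((x1, y), 0.0) / py
        p_x2_given_y = p_x2y.get((x2, y), 0.0) / py
        scores[y] = py * p_x1_given_y * p_x2_given_y
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


# Posterior over the class for every feature combination.
posteriors = {
    (x1, x2): naive_bayes_posterior(joint, x1, x2)
    for x1, x2 in product((0, 1), repeat=2)
}
```

In this toy case each feature is marginally independent of the class, so the naive posterior is 0.5 for both classes on every input, even though the class is a deterministic function of the two features taken together.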
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Traditional methods in data mining cannot be applied to all types of data with equal success. Innovative methods for model creation are needed to address the lack of model performance for data from which it is difficult to extract relationships. This paper proposes a set of algorithms that allow the integration of data from multiple related datasets, and reports results from implementing these techniques using data from the field of Predictive Toxicology. The results show significant improvements when related data is used to aid in the model creation process, both overall and in specific data ranges. The proposed algorithms have potential for use within any field where multiple datasets exist, particularly in fields combining computing, chemistry and biology.
ISBN (print): 9781424410651
In this paper, data mining is applied to search for a stable feature set and to build authentication rules for handwritten signatures. Guided by data mining, 10 stable features, including maximum speed, maximum acceleration, and the number and positions of inflexions, were selected from 61 original signature features. Taking the selected feature set as the input attributes, clusters of genuine and forged signature samples are used to learn authentication rules, which in turn test the validity of the selected feature set. The test results show that the selected feature set is effective for identifying handwritten signatures and that the average accuracy of Chinese signature authentication is up to 92%. This demonstrates that data mining is an effective method for identifying handwritten signatures.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
Current metrics for evaluating the performance of Bayesian network structure learning include order statistics of the data likelihood of learned structures, the average data likelihood, and average convergence time. In this work, we define a new metric that directly measures a structure learning algorithm's ability to correctly model causal associations among variables in a data set. By treating membership in a Markov Blanket as a retrieval problem, we use ROC analysis to compute a structure learning algorithm's efficacy in capturing causal associations at varying strengths. Because our metric moves beyond error rate and data likelihood with a measurement of stability, it is a better characterization of structure learning performance. Because the structure learning problem is NP-hard, practical algorithms are either heuristic or approximate. For this reason, an understanding of a structure learning algorithm's stability and boundary value conditions is necessary. We contribute to the state of the art in the data mining community with a new tool for understanding the behavior of structure learning techniques.
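The idea of scoring Markov blanket membership as a retrieval problem can be sketched as follows. This is an illustration under stated assumptions, not the paper's metric: a DAG is represented as a dictionary mapping each node to the set of its parents, and the function names `markov_blanket` and `roc_point` as well as the five-node example are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the parents, children, and children's other parents of `node`.

    `parents` maps every node of a DAG to the set of its parent nodes.
    """
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | spouses) - {node}


def roc_point(true_parents, learned_parents, node):
    """Return (false positive rate, true positive rate) for blanket membership."""
    others = set(true_parents) - {node}
    truth = markov_blanket(true_parents, node)
    guess = markov_blanket(learned_parents, node)
    positives = len(truth)
    negatives = len(others - truth)
    tpr = len(truth & guess) / positives if positives else 0.0
    fpr = len(guess - truth) / negatives if negatives else 0.0
    return fpr, tpr


# Hypothetical ground truth: A -> C <- B, C -> D, and E isolated.
true_dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": set()}
# Hypothetical learned structure: misses B -> C and adds a spurious E -> A.
learned_dag = {"A": {"E"}, "B": set(), "C": {"A"}, "D": {"C"}, "E": set()}

fpr, tpr = roc_point(true_dag, learned_dag, "A")
```

A single learned structure yields one point in ROC space per node. Tracing a full curve would require some way of ranking or thresholding candidate blanket members, for example by association strength, which this sketch leaves out.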
ISBN (print): 9781424410651
Based on the load properties of electric power, four kinds of component forecasting models are chosen, and a new combination forecasting model based on a self-organizing data mining algorithm is introduced in this paper. The forecasts of each component forecasting model are used as the input to the self-organizing data mining algorithm, and its output is the combination forecast. To verify the validity and operability of the model, a load forecasting example is given; the results show that this model improves forecasting ability remarkably compared with optimal combination forecasting and artificial neural network combination forecasting.
ISBN (digital): 9783540734994
ISBN (print): 9783540734987
The recently introduced transductive confidence machines (TCMs) framework allows classifiers to be extended such that they satisfy the calibration property. This means that the error rate can be set by the user prior to classification. An analytical proof of the calibration property was given for TCMs applied in the on-line learning setting. However, the nature of this learning setting restricts the applicability of TCMs. In this paper we provide strong empirical evidence that the calibration property also holds in the off-line learning setting. Our results extend the range of applications in which TCMs can be applied. We may conclude that TCMs are appropriate in virtually any application domain.
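How a user-chosen error rate turns into a prediction can be sketched with the p-value construction commonly used for TCMs. This is a minimal illustration, not the paper's experimental setup: it assumes scalar features and a nearest-neighbour nonconformity score (distance to the nearest example with the same label divided by distance to the nearest example with a different label), and the names `nonconformity`, `p_value`, and `prediction_set` and the toy data are hypothetical.

```python
def nonconformity(examples, i):
    """Nearest same-label distance divided by nearest other-label distance.

    `examples` is a list of (feature, label) pairs with scalar features.
    Larger values mean example i looks stranger relative to the rest.
    """
    x, y = examples[i]
    same = [abs(x - xj) for j, (xj, yj) in enumerate(examples) if j != i and yj == y]
    other = [abs(x - xj) for j, (xj, yj) in enumerate(examples) if j != i and yj != y]
    if not same:
        return float("inf")  # no supporting example with this label
    if not other:
        return 0.0  # no competing label present
    if min(other) == 0.0:
        return float("inf")  # identical point carries a different label
    return min(same) / min(other)


def p_value(train, x_new, label):
    """Fraction of examples at least as nonconforming as (x_new, label)."""
    extended = train + [(x_new, label)]
    scores = [nonconformity(extended, i) for i in range(len(extended))]
    return sum(s >= scores[-1] for s in scores) / len(extended)


def prediction_set(train, x_new, labels, epsilon):
    """All candidate labels whose p-value exceeds the significance level."""
    return {y for y in labels if p_value(train, x_new, y) > epsilon}


# Hypothetical one-dimensional training data with two well-separated classes.
train = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (2.0, "b"), (2.2, "b"), (2.4, "b")]
region = prediction_set(train, 0.1, ["a", "b"], epsilon=0.2)
```

Here `epsilon` plays the role of the error rate set by the user: a smaller value produces larger, more cautious prediction sets. With the toy data above, the tentative label "b" receives a p-value of 1/7 and is excluded at `epsilon=0.2`, but it would be retained at `epsilon=0.1`.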