ISBN: 3540228810 (print)
The use of data mining techniques for intrusion detection (ID) is one of the ongoing issues in the field of computer security, but little attention has been paid to engineering ID activities. This paper presents a framework that models the ID process as a set of cooperative tasks, each supporting a specialized activity. Specifically, the framework organises raw audit data into a set of relational tables and applies data mining algorithms to generate intrusion detection models. Specialized components of a commercial DBMS have been used to validate the proposed approach. Results show that the framework works well in capturing patterns of intrusion, while the availability of an integrated software environment allows a high level of modularity in performing each task.
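The step of organising raw audit data into relational tables can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module, not the commercial DBMS the paper used. The `connections` schema, the sample records, and the derived feature are hypothetical illustrations and do not reflect the paper's actual design. The feature is the number of distinct destination ports per source, a common probe indicator.

```python
import sqlite3

def build_audit_db(records):
    """Load raw connection records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for illustration only
    conn.execute(
        "CREATE TABLE connections ("
        "ts INTEGER, src_ip TEXT, dst_ip TEXT, dst_port INTEGER, label TEXT)"
    )
    conn.executemany("INSERT INTO connections VALUES (?, ?, ?, ?, ?)", records)
    return conn

def distinct_ports_per_source(conn):
    """Derived feature: how many distinct destination ports each source contacted."""
    rows = conn.execute(
        "SELECT src_ip, COUNT(DISTINCT dst_port) FROM connections "
        "GROUP BY src_ip ORDER BY src_ip"
    )
    return dict(rows.fetchall())

# Hypothetical audit records: (timestamp, source, destination, port, label)
records = [
    (1, "10.0.0.1", "10.0.0.9", 80, "normal"),
    (2, "10.0.0.1", "10.0.0.9", 443, "normal"),
    (3, "10.0.0.7", "10.0.0.9", 21, "probe"),
    (4, "10.0.0.7", "10.0.0.9", 22, "probe"),
    (5, "10.0.0.7", "10.0.0.9", 23, "probe"),
]
conn = build_audit_db(records)
print(distinct_ports_per_source(conn))  # {'10.0.0.1': 2, '10.0.0.7': 3}
```

Feature tables computed in SQL like this can then be passed to any mining algorithm. This keeps data preparation and model generation as separate tasks, in the spirit of the modularity the abstract describes.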
Multiple classifier systems based on neural networks can give improved generalisation performance compared with single classifier systems. We examine collaboration in multi-net systems through in-situ learning, exploring how generalisation can be improved through the simultaneous learning of the networks and their combination. We present two in-situ trained systems: first, one based upon the simple ensemble, combining supervised networks in parallel; and second, a combination of unsupervised and supervised networks in sequence. Results for these are compared with existing approaches, demonstrating that in-situ trained systems perform better than similar pre-trained systems.
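One possible reading of simultaneous learning for the simple ensemble is sketched below. The members are trained in lockstep, and training stops when the combined (averaged) output stops improving on validation data. This is a toy under loud assumptions and is not the authors' system. Each "network" is reduced to a single linear unit `[w, b]`. The names `train_in_situ`, `ensemble_output` and `ensemble_mse`, and the data, are all hypothetical.

```python
import random

def ensemble_output(members, x):
    """Simple ensemble: average of the member outputs."""
    return sum(w * x + b for w, b in members) / len(members)

def ensemble_mse(members, xs, ys):
    return sum((ensemble_output(members, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_in_situ(train, val, n_members=3, lr=0.1, max_epochs=500, seed=0):
    rng = random.Random(seed)
    # Each member is a linear unit [w, b] standing in for a small network
    members = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_members)]
    best_err = ensemble_mse(members, *val)
    best_members = [m[:] for m in members]
    xs, ys = train
    for _ in range(max_epochs):
        for m in members:  # all members take a gradient step in the same epoch
            errs = [m[0] * x + m[1] - y for x, y in zip(xs, ys)]
            m[0] -= lr * sum(e * x for e, x in zip(errs, xs)) / len(xs)
            m[1] -= lr * sum(errs) / len(xs)
        # Stopping is decided by the combined ensemble, not by each member alone
        err = ensemble_mse(members, *val)
        if err >= best_err:
            break
        best_err, best_members = err, [m[:] for m in members]
    return best_members, best_err

xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]  # hypothetical target y = 2x + 1
vx = [i / 10 + 0.05 for i in range(-10, 10)]
vy = [2 * x + 1 for x in vx]
members, val_err = train_in_situ((xs, ys), (vx, vy))
print(len(members), val_err)
```

Contrast this with a pre-trained ensemble. There, each member would be trained and stopped on its own error before the outputs are combined.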
Data mining is a useful means of discovering valuable patterns, associations, trends, and dependencies in data. It often has to be performed across a group of sites, under the precondition that no site's private data may be leaked to other sites. In this paper, a distributed privacy-preserving data mining algorithm is proposed. The algorithm is characterized by (1) its ability to preserve privacy without any coordinator site, and in particular its ability to resist collusion; and (2) its light weight, since only random numbers are used to preserve privacy. Performance analysis and experimental results are provided to demonstrate the effectiveness of the proposed algorithm.
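The abstract describes privacy preservation without a coordinator, using only random numbers. A well-known building block with those properties is a secure sum based on random additive shares, sketched below. This is an assumed illustration and is not necessarily the paper's algorithm. `MODULUS` and the function names are hypothetical, and sites are assumed to be honest-but-curious.

```python
import secrets

# Hypothetical public modulus; it must exceed any possible total.
MODULUS = 2**61 - 1

def split_into_shares(value, n_sites, modulus=MODULUS):
    """Split value into n_sites random shares that add up to value (mod modulus)."""
    shares = [secrets.randbelow(modulus) for _ in range(n_sites - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def secure_sum(private_values, modulus=MODULUS):
    """Site i keeps share i of its own value and sends share j to site j.
    Sites publish only the sum of what they hold, so no coordinator is needed."""
    n = len(private_values)
    sent = [split_into_shares(v, n, modulus) for v in private_values]
    published = [sum(sent[i][j] for i in range(n)) % modulus for j in range(n)]
    return sum(published) % modulus

print(secure_sum([5, 17, 20]))  # 42
```

Every share except the last is uniformly random. As a result, in this toy protocol a coalition of fewer than n − 1 sites learns nothing about an individual site's value beyond what the published total already reveals.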
A novel algorithm, named DESCRY, for clustering very large multidimensional data sets with numerical attributes is presented. DESCRY discovers clusters of different shape, size, and density, even when the data contain noise. It does so by first finding and clustering a small set of points, called meta-points, that depict well the shape of the clusters present in the data set. Final clusters are obtained by assigning each point to one of the partial clusters. The computational complexity of DESCRY is linear in both the data set size and the data set dimensionality. Experiments show that DESCRY obtains very good qualitative results, comparable with those obtained by state-of-the-art clustering algorithms.
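The two-phase idea can be sketched as follows: first cluster a small set of meta-points, then assign every point to a cluster. This is a simplified stand-in and not DESCRY itself. Here, meta-points come from plain random sampling and are merged by a single-link distance threshold `link_dist`, which is a hypothetical parameter. Noise handling is omitted. For a fixed number m of meta-points, the assignment pass costs O(n · m) distance computations, each linear in the dimensionality.

```python
import math
import random

def cluster_via_meta_points(points, n_meta, link_dist, seed=0):
    """Simplified two-phase clustering: cluster a small sample of meta-points,
    then give every point the cluster label of its nearest meta-point."""
    rng = random.Random(seed)
    meta = rng.sample(points, n_meta)  # plain random sample (an assumption)

    # Phase 1: single-link merging of meta-points closer than link_dist (union-find)
    parent = list(range(n_meta))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_meta):
        for j in range(i + 1, n_meta):
            if math.dist(meta[i], meta[j]) <= link_dist:
                parent[find(i)] = find(j)

    # Renumber the union-find roots as consecutive cluster labels 0..k-1
    roots = {}
    meta_label = [roots.setdefault(find(i), len(roots)) for i in range(n_meta)]

    # Phase 2: one pass over the data set
    return [
        meta_label[min(range(n_meta), key=lambda i: math.dist(p, meta[i]))]
        for p in points
    ]

# Hypothetical data: two well-separated square blobs of 25 points each
blob_a = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
blob_b = [(10 + i * 0.1, 10 + j * 0.1) for i in range(5) for j in range(5)]
labels = cluster_via_meta_points(blob_a + blob_b, n_meta=20, link_dist=1.0)
```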
It is indispensable for users surfing the Internet to have web pages classified into a given topic as correctly as possible. Toward this end, this paper presents a topic-specific crawler that computes the degree of relevance and refines the preliminary set of related web pages using term frequency/document frequency, entropy, and compiled rules. In the experiments, we tested our topic-specific crawler in terms of classification accuracy, crawling efficiency, and crawling consistency. Using 51 representative terms, the resulting classification accuracy was 97.8%.
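Term frequency and entropy could plausibly combine into a relevance score, as in the sketch below. The weighting scheme is an assumption and not the paper's formula. A representative term is weighted by one minus the normalised entropy of its counts across topics, and a page's score is the weighted, length-normalised frequency of those terms. The topic counts, terms, and function names are hypothetical, and the paper's compiled rules are not modelled.

```python
import math
from collections import Counter

def entropy_weight(topic_counts):
    """1 - normalised entropy of a term's occurrence counts across topics
    (needs at least two topics): near 1 if the term is concentrated in one
    topic, near 0 if it is spread evenly."""
    total = sum(topic_counts)
    h = -sum((c / total) * math.log2(c / total) for c in topic_counts if c > 0)
    return 1.0 - h / math.log2(len(topic_counts))

def relevance(page_tokens, term_weights):
    """Weighted, length-normalised term frequency of the representative terms."""
    if not page_tokens:
        return 0.0
    tf = Counter(page_tokens)
    return sum(w * tf[t] / len(page_tokens) for t, w in term_weights.items())

# Hypothetical counts of each representative term in pages of three topics
term_weights = {
    "crawler": entropy_weight([40, 0, 0]),  # topic-specific term, weight 1.0
    "web": entropy_weight([10, 10, 10]),    # evenly spread, weight about 0
}
page = ["web", "crawler", "crawler", "news"]
print(relevance(page, term_weights))
```

A crawler built this way would keep or refine pages whose score exceeds some threshold. Choosing that threshold is left open here.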
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate the application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio), machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
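One common remedy for a highly unbalanced training set is to weight each class inversely to its frequency when training the SVM or network. It is sketched below as an illustration and is not necessarily what the paper did. The formula n_samples / (n_classes · count_c) is the same heuristic scikit-learn applies for `class_weight="balanced"`. The labels and counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): each class then carries
    the same total weight, so the rare class is not swamped."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical candidate pairs: 990 non-matching radio/optical pairs, 10 true matches
labels = ["nonmatch"] * 990 + ["match"] * 10
weights = balanced_class_weights(labels)
print(weights["match"])  # 50.0
```

With these weights, the 990 non-matches and the 10 matches each contribute a total weight of 500 to the training loss.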
Model uncertainty refers to the risk associated with basing a prediction on only one model. In semi-supervised learning, this uncertainty is greater than in supervised learning (for the same total number of instances), given that many data points are unlabelled. An optimal Bayes classifier (OBC) reduces model uncertainty by averaging predictions across the entire model space, weighted by the models' posterior probabilities. For a given model space and prior distribution, OBC produces the lowest risk. We propose an information-theoretic method to construct an OBC for probabilistic semi-supervised learning using Markov chain Monte Carlo sampling. This contrasts with typical semi-supervised learning, which attempts to find the single most probable model using EM. Empirical results verify that OBC yields more accurate predictions than the best single model.
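The averaging step of an optimal Bayes classifier can be shown on a toy example, as in the minimal sketch below. Each model is represented only by its class-probability vector for one input point, together with its posterior weight. With MCMC, the models would be posterior samples and each would get equal weight. The numbers are hypothetical. They are chosen so that the posterior-weighted average disagrees with the single most probable model.

```python
def obc_predict(predictive_dists, weights):
    """Posterior-weighted average of per-model class-probability vectors.
    With MCMC samples drawn from the posterior, pass equal weights."""
    total = sum(weights)
    avg = [0.0] * len(predictive_dists[0])
    for dist, w in zip(predictive_dists, weights):
        for k, p in enumerate(dist):
            avg[k] += (w / total) * p
    return avg

# Hypothetical toy: three models' predictions for one point and their posteriors
dists = [[0.1, 0.9], [0.8, 0.2], [0.8, 0.2]]
posteriors = [0.4, 0.3, 0.3]

avg = obc_predict(dists, posteriors)
best_single = dists[posteriors.index(max(posteriors))]
print(avg.index(max(avg)), best_single.index(max(best_single)))  # 0 1
```

The most probable single model predicts class 1. However, the averaged distribution is about [0.52, 0.48], so the OBC predicts class 0.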
As a vast number of services flood onto the Internet, Internet resources are increasingly exposed to hacking activities such as the Code Red and SQL Slammer worms. Since worms spread quickly over the Internet through self-propagation mechanisms, it is crucial to detect worm propagation and protect these resources in order to keep the network infrastructure secure. In this paper, we propose a mechanism that detects worm propagation by computing the entropy of network traffic and compiling network traffic. In experiments in simulated network settings, our framework successfully detected worm propagation.
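The entropy computation can be sketched as follows. The abstract does not say which traffic attribute is measured, so this sketch assumes it is the destination IP addresses seen in each time window. A scanning worm contacts many distinct hosts, which drives that entropy up. The window contents and the threshold of 2 bits are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def flag_windows(windows, threshold):
    """Indices of traffic windows whose destination-address entropy exceeds threshold."""
    return [i for i, w in enumerate(windows) if shannon_entropy(w) > threshold]

# Hypothetical per-window lists of destination IPs
normal = ["10.0.0.5"] * 6 + ["10.0.0.6"] * 2    # few hosts, about 0.81 bits
scanning = [f"10.0.1.{i}" for i in range(8)]     # 8 distinct hosts, 3 bits
print(flag_windows([normal, scanning], threshold=2.0))  # [1]
```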
Many time series exhibit dynamics over vastly different time scales. The standard way to capture this behavior is to assume that the slow dynamics are a "trend", to de-trend the data, and then to model the fast dynamics. However, for nonlinear dynamical systems this is generally insufficient. In this paper, we describe a new method that utilizes two distinct nonlinear modeling architectures to capture both fast and slow dynamics. Slow dynamics are modeled with the method of analogues, and fast dynamics with a deterministic radial basis function network. When the two are combined, the resulting model outperforms either individual system.
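In its simplest form, the method of analogues used here for the slow dynamics makes a nearest-neighbour forecast in delay coordinates, as sketched below. It finds the past window most similar to the latest `dim` values and predicts the value that followed that window. This covers only the analogue component; the radial basis function network for the fast dynamics is not shown. The function name and the embedding dimension `dim` are hypothetical choices.

```python
def analogue_forecast(series, dim):
    """Predict the next value of series: find the past delay window closest to
    the latest one (squared Euclidean distance) and return its successor.
    Requires len(series) > dim."""
    target = series[-dim:]
    best_i, best_d = None, float("inf")
    # Candidate windows end strictly before the last point, so each has a successor
    for i in range(len(series) - dim):
        window = series[i:i + dim]
        d = sum((a - b) ** 2 for a, b in zip(window, target))
        if d < best_d:
            best_i, best_d = i, d
    return series[best_i + dim]

print(analogue_forecast([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2], dim=2))  # 3
```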
Symbolization of time series is an important preprocessing subroutine for many data mining tasks. However, it is usually difficult, if not impossible, to apply the traditional static symbolization approach to streaming time series. Either re-computing the typical sub-series is inefficient, or the static representation cannot capture the up-to-date characteristics of the series. This paper presents a novel symbolization method in which the typical sub-series are dynamically adjusted to fit the up-to-date characteristics of streaming time series. It works incrementally, without scanning the whole data set. Experiments on a data set from the stock market demonstrate the superiority of the proposed method over traditional ones.
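Dynamically adjusted typical sub-series can be illustrated with an online prototype update, in the style of sequential k-means, as sketched below. This is a swapped-in technique for illustration and not the paper's method. Each incoming window receives the symbol of its nearest prototype, and that prototype moves a fraction `rate` toward the window, so no rescan of past data is needed. The class name, the prototypes, and `rate` are hypothetical. Window normalisation is omitted, and the letter mapping assumes at most 26 prototypes.

```python
import math

class StreamingSymbolizer:
    """Maps each incoming window to the symbol of its nearest prototype
    ('typical sub-series') and moves that prototype toward the window."""

    def __init__(self, prototypes, rate=0.1):
        self.prototypes = [list(p) for p in prototypes]
        self.rate = rate

    def symbolize(self, window):
        k = min(range(len(self.prototypes)),
                key=lambda i: math.dist(window, self.prototypes[i]))
        proto = self.prototypes[k]
        for j, v in enumerate(window):
            proto[j] += self.rate * (v - proto[j])  # incremental update, no rescan
        return chr(ord("a") + k)

# Hypothetical prototypes for "flat low" and "flat high" windows of length 3
sym = StreamingSymbolizer([[0, 0, 0], [1, 1, 1]], rate=0.5)
stream = [[0.1, 0.0, 0.1], [0.9, 1.0, 1.1], [0.0, 0.2, 0.1]]
print("".join(sym.symbolize(w) for w in stream))  # aba
```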