ISBN: 3540228810 (print)
Under the Bayesian Ying-Yang (BYY) harmony learning theory, a harmony function has been developed for the Gaussian mixture model with an important feature: when it is maximized through a gradient learning rule, model selection is made automatically during parameter learning on a set of sample data from a Gaussian mixture. This paper proposes two further gradient learning rules, the conjugate and natural gradient learning rules, to efficiently implement the maximization of the harmony function on Gaussian mixtures. Simulation experiments demonstrate that these two new gradient learning rules not only work well, but also converge more quickly than the general gradient ones.
Most financial time series processes are nonstationary and their frequency characteristics are time-dependent. In this paper we present a time series summarization and prediction framework to analyse nonstationary, volatile and high-frequency time series data. Multiscale wavelet analysis is used to separate out the trend, cyclical fluctuations and autocorrelational effects. The framework can generate verbal signals to describe each effect. The summary output is used to reason about the future behaviour of the time series and to give a prediction. Experiments on the intra-day European currency spot exchange rates are described. The results are compared with a neural network prediction framework.
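As an illustration of the multiscale separation mentioned above, the following is a minimal sketch using the unnormalized Haar wavelet. The abstract does not say which wavelet family the authors use, so the choice of Haar, the function names, and the sample values are all assumptions made for illustration only.

```python
def haar_step(signal):
    """One Haar level: pairwise means (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Split a series into a coarse trend and per-scale fluctuations.

    Returns (trend, details), where details[0] is the finest scale.
    The series length must be divisible by 2 ** levels.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2 ** levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical intra-day exchange-rate samples (not data from the paper).
rates = [1.10, 1.12, 1.11, 1.15, 1.20, 1.18, 1.22, 1.25]
trend, details = haar_decompose(rates, levels=2)
print("trend:", trend)
print("fluctuations by scale:", details)
```

The coarse `trend` list plays the role of the long-run movement, while each entry of `details` isolates fluctuations at one time scale; a summarization layer could then describe each component separately.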
The use of data mining techniques for intrusion detection (ID) is one of the ongoing issues in the field of computer security, but little attention has been paid to engineering ID activities. This paper presents a framework that models the ID process as a set of cooperative tasks, each supporting a specialized activity. Specifically, the framework organises raw audit data into a set of relational tables and applies data mining algorithms to generate intrusion detection models. Specialized components of a commercial DBMS have been used to validate the proposed approach. Results show that the framework works well in capturing patterns of intrusion, while the availability of an integrated software environment allows a high level of modularity in performing each task.
Data mining is a useful means for discovering valuable patterns, associations, trends, and dependencies in data. Data mining is often required to be performed among a group of sites, with the precondition that no private information of any site is leaked to the other sites. In this paper a distributed privacy-preserving data mining algorithm is proposed. The proposed algorithm is characterized by (1) its ability to preserve privacy without any coordinator site, and especially its ability to resist collusion; and (2) its lightweight nature, since only random numbers are used for preserving privacy. Performance analysis and experimental results are provided to demonstrate the effectiveness of the proposed algorithm.
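To make the idea of preserving privacy with random numbers alone more concrete, here is a minimal sketch of additive masking for a distributed sum, a well-known generic technique. It is not the algorithm proposed in the paper, whose details the abstract does not give; the modulus, function names, and example counts are assumptions.

```python
import secrets

# Assumed public modulus; it must exceed any possible true total.
MODULUS = 2 ** 61 - 1


def split_into_shares(value, n_sites):
    """Split a non-negative value into n_sites random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_sites - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate all sites in one process: no coordinator ever sees a raw value."""
    n = len(private_values)
    # shares[i][j] is the random-looking share that site i sends to site j.
    shares = [split_into_shares(v, n) for v in private_values]
    # Each site j publishes only the sum of the shares it received.
    partial = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS


# Hypothetical local support counts held by three sites.
print(secure_sum([12, 30, 7]))
```

Because each site only ever sees uniformly random shares from the others, a single site's value stays hidden unless all of the other sites pool what they received.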
A novel algorithm, named DESCRY, for clustering very large multidimensional data sets with numerical attributes is presented. DESCRY discovers clusters of different shape, size, and density, even when the data contains noise, by first finding and clustering a small set of points, called meta-points, that depict well the shape of the clusters present in the data set. Final clusters are obtained by assigning each point to one of the partial clusters. The computational complexity of DESCRY is linear both in the data set size and in the data set dimensionality. Experiments show very good qualitative results, comparable with those obtained by state-of-the-art clustering algorithms.
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate the application of two machine learning techniques, support vector machines and neural networks, to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio), machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
It is essential that users surfing the Internet have web pages classified into a given topic as accurately as possible. To this end, this paper presents a topic-specific crawler that computes the degree of relevance and refines the preliminary set of related web pages using term frequency/document frequency, entropy, and compiled rules. In the experiments, we test our topic-specific crawler in terms of the accuracy of its classification, the crawling efficiency, and the crawling consistency. With 51 representative terms, the resulting classification accuracy was 97.8%.
As a vast number of services have been flooding into the Internet, Internet resources are increasingly likely to be exposed to various hacking activities such as the Code Red and SQL Slammer worms. Since various worms spread quickly over the Internet using self-propagation mechanisms, it is crucial to detect worm propagation and protect against it in order to secure the network infrastructure. In this paper, we propose a mechanism to detect worm propagation using the computation of entropy of network traffic and the compilation of network traffic. In experiments, we tested our framework in simulated network settings and could successfully detect worm propagation.
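The entropy computation mentioned above can be sketched as follows. The abstract does not specify which traffic feature is measured or how alarms are raised, so the use of destination addresses, the fixed threshold, and all names below are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_windows(windows, threshold):
    """Return indices of traffic windows whose entropy exceeds the threshold.

    Each window is a list of destination addresses seen in one time slice.
    A scanning worm contacts many distinct hosts, which tends to push the
    destination-address entropy upward (an assumed heuristic).
    """
    return [i for i, w in enumerate(windows) if shannon_entropy(w) > threshold]


# Hypothetical windows: normal traffic to few hosts, then scan-like traffic.
normal = ["10.0.0.5"] * 6 + ["10.0.0.9"] * 2
scan = ["10.0.0.%d" % i for i in range(1, 9)]
print(flag_windows([normal, scan], threshold=2.0))
```

In practice the threshold would be calibrated against a baseline of normal traffic rather than fixed by hand.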
In this paper, we present a personalized news reading prototype in which the latest news articles published by various on-line news providers are automatically collected, categorized and ranked in light of a user's habits or interests. Moreover, our system can adapt itself towards better performance. To develop such an adaptive system, we propose a hybrid learning strategy: supervised learning is used to create an initial system configuration based on the user's feedback during registration, while an unsupervised learning scheme gradually updates the configuration by tracing the user's behavior as the system is being used. Simulation results demonstrate satisfactory performance.
Symbolization of time series is an important preprocessing subroutine for many data mining tasks. However, it is usually difficult, if not impossible, to apply the traditional static symbolization approach to streaming time series, because of either the low efficiency of re-computing the typical sub-series or the poor ability to represent the up-to-date characteristics of the series. This paper presents a novel symbolization method in which the typical sub-series are dynamically adjusted to fit the up-to-date characteristics of the streaming time series. It works incrementally, without scanning the whole data set. Experiments on a data set from the stock market demonstrate the superiority of the proposed method over the traditional ones.