ISBN (print): 9781424481262
With the rapid growth of the Internet, ensuring computer security has become a crucial problem; therefore, many intrusion detection techniques have been proposed to detect network attacks efficiently. Meanwhile, data mining algorithms based on Genetic Network Programming (GNP) have recently been proposed. GNP is a graph-based evolutionary algorithm that can extract many important class association rules by exploiting the distinctive representational ability of graph structures. In this paper, a probabilistic classification method is proposed, combined with GNP-based class association rule mining, and applied to network intrusion detection for performance evaluation. The proposed method builds a joint probability density function of normal and intrusion accesses and uses it to efficiently classify new access data as normal, known intrusion, or unknown intrusion. Experimental results show that the proposed method achieves higher classification accuracy than the method without probabilistic classification.
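As a rough illustration of classifying accesses with per-class probability densities, the Python sketch below fits an axis-aligned Gaussian to hypothetical two-dimensional features (matching degrees with normal rules and with intrusion rules) and labels a new access as the most probable known class, or as an unknown intrusion when every density falls below a threshold. The features, the Gaussian model, and the threshold are assumptions for illustration; the abstract does not specify how the authors build their density function or how the GNP rules produce its inputs.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature vector for one access: (matching degree with the
# normal-class rules, matching degree with the intrusion-class rules).
Point = tuple[float, float]


def fit_diag_gaussian(points: list[Point]) -> tuple[list[float], list[float]]:
    """Fit an axis-aligned Gaussian: per-dimension mean and standard deviation."""
    dims = list(zip(*points))
    mus = [mean(d) for d in dims]
    # A small floor keeps the density finite on degenerate training data.
    sigmas = [max(pstdev(d), 1e-3) for d in dims]
    return mus, sigmas


def density(x: Point, mus: list[float], sigmas: list[float]) -> float:
    """Product of 1-D normal densities (assumes independent dimensions)."""
    p = 1.0
    for xi, mu, s in zip(x, mus, sigmas):
        p *= math.exp(-((xi - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return p


def classify(x: Point, models: dict, unknown_threshold: float) -> str:
    """Return the most probable known class, or "unknown intrusion" when no
    class model gives the point enough density (hypothetical threshold)."""
    scores = {label: density(x, *params) for label, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= unknown_threshold else "unknown intrusion"


normal = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
intrusion = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
models = {
    "normal": fit_diag_gaussian(normal),
    "known intrusion": fit_diag_gaussian(intrusion),
}
print(classify((0.88, 0.12), models, 1e-3))  # normal
print(classify((0.5, 0.5), models, 1e-3))    # unknown intrusion
```

The threshold is what separates "unknown" from "known" here: a point far from every training cluster gets a tiny density under all class models, so it is rejected rather than forced into the nearest class.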
The need to provide learners with web-based learning content that matches their accessibility needs and preferences, as well as ways to match learning content to users' devices, has been identified as an important issue in accessible educational environments. In an open and dynamic web-based learning environment, personalized support for learners becomes more important. To achieve optimal efficiency in the learning process, each learner's cognitive learning style should be taken into account. Because different types of learners use these systems, it is necessary to provide them with an individualized learning support system. So far, however, the design and development of web-based learning environments for people with special abilities has been addressed through the development of hypermedia- and multimedia-based educational content. In this paper, a framework for an individual web-based learning system is presented, focusing on the learner's cognitive learning process, learning patterns and activities, and the supporting technology needed. Based on the learner-centered mode and cognitive learning theory, we demonstrate the design and development of an online course that gives students learning flexibility and adaptability. The proposed framework uses a data mining algorithm to represent and extract dynamic learning processes and learning patterns in order to support students' deep learning, efficient tutoring, and collaboration in web-based learning environments. Experiments show that it is feasible to use this method to develop an individual web-based learning system, which merits further in-depth study. (C) 2009 Elsevier B.V. All rights reserved.
In this paper, the problem of Contiguous Item Sequential Pattern (CISP) mining is presented as a sequential pattern mining problem under two constraints. First, each element in a sequence consists of only one item. Second, in any sequence that contains a pattern, the items matching the pattern must be adjacent, in the same order in which they appear in the pattern. Although CISP mining can be solved with previous sequential pattern mining approaches under a general constraint description framework, doing so may lead to poor performance because of the large search space. To solve this problem efficiently, a new data structure, the UpDown Tree, is proposed for CISP mining. The UpDown Tree-based approach can greatly improve the efficiency of CISP mining in terms of both time and memory compared with previous approaches. An extensive experimental study has shown promising results for our approach.
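To make the CISP definition concrete, here is a brute-force Python sketch, assuming each item is a single character, that counts every contiguous run of items once per sequence and keeps the runs that reach a minimum support. It is not the UpDown Tree method; it enumerates O(L²) candidate runs per sequence of length L, which gives a feel for the kind of search-space cost the abstract refers to.

```python
from collections import Counter
from typing import Sequence


def mine_cisp_naive(db: list[Sequence[str]], min_support: int) -> dict[tuple[str, ...], int]:
    """Brute-force CISP mining over single-item sequences.

    A pattern is supported by a sequence when its items occur there as a run
    of adjacent positions (i.e. a substring), in the pattern's order.
    """
    support: Counter = Counter()
    for seq in db:
        # Distinct runs only, so each sequence adds at most 1 to a pattern's support.
        runs = {
            tuple(seq[i:j])
            for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)
        }
        support.update(runs)
    return {p: c for p, c in support.items() if c >= min_support}


db = ["abcd", "bcde", "abce"]  # hypothetical data; each character is one item
patterns = mine_cisp_naive(db, min_support=2)
print(patterns[("b", "c")])    # 3
print(("a", "c") in patterns)  # False: a and c are never adjacent
```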
ISBN (print): 9781424441150
Contiguous Sequential Pattern (CSP) mining is an important problem with many applications. Using general sequential pattern mining algorithms for CSP mining may lead to poor performance because they do not take the contiguous property of CSPs into account. In this paper, we present a two-stage approach for CSP mining. We first detect frequent itemsets in a database; based on them, we partition the CSPs into subsets and apply a special data structure, the General UpDown Tree, to detect all the patterns in each subset. The General UpDown Tree exploits the contiguous property of CSPs to achieve a compact representation of all the sequences that contain an item. This compact representation enables a top-down approach to CSP mining and eliminates unnecessary candidate evaluation. Experimental results show that our approach is more efficient than previous approaches in terms of both time and space.
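The contiguous property can be stated as a simple containment test. The Python sketch below, which assumes sequence elements are itemsets stored as `frozenset`s, checks whether a pattern <P1, ..., Pk> matches k consecutive elements of a sequence with each Pj a subset of the corresponding element, and counts support by scanning the database. It does not implement the two-stage partitioning or the General UpDown Tree; the helper `seq_of` is hypothetical.

```python
def contains_csp(seq: list[frozenset], pattern: list[frozenset]) -> bool:
    """True if pattern <P1..Pk> matches k consecutive elements S_i..S_{i+k-1}
    of seq, with each Pj a subset of the corresponding element."""
    k = len(pattern)
    return any(
        all(p <= seq[i + j] for j, p in enumerate(pattern))
        for i in range(len(seq) - k + 1)
    )


def csp_support(db: list[list[frozenset]], pattern: list[frozenset]) -> int:
    """Number of sequences in db that contain pattern contiguously."""
    return sum(contains_csp(seq, pattern) for seq in db)


def seq_of(*elements: str) -> list[frozenset]:
    """Hypothetical helper: seq_of("ab", "c") builds the sequence <{a,b}, {c}>."""
    return [frozenset(e) for e in elements]


db = [seq_of("ab", "c", "d"), seq_of("a", "ce"), seq_of("a", "d", "c")]
print(csp_support(db, seq_of("a", "c")))  # 2: the third sequence has {d} between {a} and {c}
```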
BACKGROUND: Identifying gene functional modules is an important step toward elucidating gene functions on a global scale. Clustering algorithms mostly rely on the co-expression of genes; that is, they group together genes that have similar expression profiles.
RESULTS: We propose to cluster genes by co-regulation rather than by co-expression. We therefore present an inference algorithm for detecting co-regulated groups from gene expression data and introduce a method to cluster genes given the inferred regulatory structure. Finally, we propose to validate the clustering with a score based on the GO enrichment of the resulting groups of genes.
CONCLUSION: We evaluate the methods on S. cerevisiae stress-response data and obtain better scores than with clustering obtained directly from gene expression.
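The abstract does not define its GO-based score. One common building block, assumed here, is the hypergeometric upper-tail p-value of a GO term within a cluster; the Python sketch below computes it with `math.comb` and aggregates clusters with a hypothetical score, the mean of -log10 of each cluster's best p-value.

```python
import math
from math import comb


def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the universe, K: genes annotated with the GO term,
    n: genes in the cluster, k: annotated genes inside the cluster.
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def clustering_score(clusters: list[set], go_terms: dict[str, set], universe_size: int) -> float:
    """Hypothetical aggregate: mean over clusters of -log10(best GO p-value)."""
    total = 0.0
    for cluster in clusters:
        best = min(
            enrichment_pvalue(universe_size, len(genes), len(cluster), len(cluster & genes))
            for genes in go_terms.values()
        )
        total += -math.log10(best)
    return total / len(clusters)


genes = {f"g{i}" for i in range(1, 11)}           # hypothetical 10-gene universe
cluster = {"g1", "g2", "g3", "g4", "g5"}
go_terms = {"GO:hypothetical": {"g1", "g2", "g3", "g4", "g5"}}
print(round(clustering_score([cluster], go_terms, len(genes)), 3))  # 2.401
```

A cluster that captures every gene annotated with a term gets a small p-value (here 1/252) and therefore a high score, while a cluster with no overlap contributes 0.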
ISBN (print): 0780386299
In this paper, a data mining algorithm based on Rough Set Theory (RS) is discussed, which is used to extract decision-making rules from a data set. The basic concepts of RS are introduced and used in the algorithm. An example in which the algorithm is used to acquire design rules from the knowledge database of the Relay Expert System is also discussed.
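The core rough-set notions the abstract mentions can be sketched in a few lines of Python: objects with identical condition-attribute values form indiscernibility classes, a target set has lower and upper approximations built from those classes, and every class whose members share one decision value yields a certain rule. The decision table below is made up for illustration and is unrelated to the Relay Expert System data.

```python
from collections import defaultdict


def indiscernibility_classes(table: dict, attrs: list[str]) -> dict[tuple, set]:
    """Group objects that have identical values on attrs."""
    classes: dict[tuple, set] = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return classes


def approximations(table: dict, attrs: list[str], target: set) -> tuple[set, set]:
    """Lower approximation: union of classes fully inside target.
    Upper approximation: union of classes that overlap target."""
    lower, upper = set(), set()
    for objs in indiscernibility_classes(table, attrs).values():
        if objs <= target:
            lower |= objs
        if objs & target:
            upper |= objs
    return lower, upper


def certain_rules(table: dict, cond_attrs: list[str], decision: str) -> list[tuple[dict, object]]:
    """One rule per consistent class (all members share one decision value)."""
    rules = []
    for values, objs in indiscernibility_classes(table, cond_attrs).items():
        decisions = {table[o][decision] for o in objs}
        if len(decisions) == 1:
            rules.append((dict(zip(cond_attrs, values)), decisions.pop()))
    return rules


# Hypothetical decision table: condition attributes a, b; decision d.
table = {
    1: {"a": 0, "b": 1, "d": "yes"},
    2: {"a": 0, "b": 1, "d": "no"},
    3: {"a": 1, "b": 0, "d": "yes"},
    4: {"a": 1, "b": 1, "d": "no"},
}
print(approximations(table, ["a", "b"], {1, 3}))  # ({3}, {1, 2, 3})
print(certain_rules(table, ["a", "b"], "d"))
```

Objects 1 and 2 are indiscernible on (a, b) but disagree on d, so they fall in the boundary region and produce no certain rule.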
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison, and building sequential classification models. Mining MDS patterns is a challenging task and is significantly different from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner) to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It employs highly efficient bitset and Boolean operations for powerful gap-based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high-dimensional datasets at low supports.
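To pin down the definitions, the Python sketch below checks them by brute force: it tests gap-constrained occurrence (assuming a gap is the number of skipped positions between consecutive matched items), computes class supports, and accepts a pattern as an MDS when it is frequent in the positive class (support >= alpha), infrequent in the negative class (support <= beta), within the length limit, and no proper subsequence is itself distinguishing. The thresholds alpha and beta and the toy sequences are assumptions; this is not ConSGapMiner's bitset-based prefix-growth search.

```python
from itertools import combinations
from typing import Sequence


def occurs_with_gap(seq: Sequence, pat: Sequence, min_gap: int, max_gap: int) -> bool:
    """Is pat a subsequence of seq where every gap (number of skipped
    positions between consecutive matched items) lies in [min_gap, max_gap]?"""
    if not pat:
        return True
    # Positions where a partial match of the pattern prefix can end.
    ends = {i for i, x in enumerate(seq) if x == pat[0]}
    for sym in pat[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1 + min_gap, min(i + 2 + max_gap, len(seq)))
            if seq[j] == sym
        }
    return bool(ends)


def support(pat, db, min_gap, max_gap) -> float:
    """Fraction of sequences in db containing pat under the gap constraint."""
    return sum(occurs_with_gap(s, pat, min_gap, max_gap) for s in db) / len(db)


def is_distinguishing(pat, pos, neg, alpha, beta, min_gap, max_gap) -> bool:
    return (support(pat, pos, min_gap, max_gap) >= alpha
            and support(pat, neg, min_gap, max_gap) <= beta)


def is_mds(pat, pos, neg, alpha, beta, min_gap, max_gap, max_len) -> bool:
    """Brute-force check: distinguishing, within max_len, and no proper
    non-empty subsequence is itself distinguishing (minimality)."""
    if len(pat) > max_len or not is_distinguishing(pat, pos, neg, alpha, beta, min_gap, max_gap):
        return False
    return not any(
        is_distinguishing(sub, pos, neg, alpha, beta, min_gap, max_gap)
        for r in range(1, len(pat))
        for sub in combinations(pat, r)
    )


pos = ["abcab", "acb", "aacb"]  # hypothetical positive-class sequences
neg = ["abc", "bca", "cab"]     # hypothetical negative-class sequences
print(is_mds("cb", pos, neg, 0.9, 0.4, 0, 1, max_len=3))  # True
```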
We propose a general mechanism to represent spatial transactions in a way that allows the use of existing data mining methods. Our proposal allows the analyst to exploit the layered structure of geographical information systems to define the layers of interest and the relevant spatial relations among them. Given a reference object, it is possible to describe its neighborhood by considering the attributes of the object itself and of the objects related to it by the chosen relations. The resulting spatial transactions may either be treated as "traditional" transactions, by considering only the qualitative spatial relations, or their spatial extension can be exploited during the data mining process. We explore both cases. First, we tackle the problem of classifying a spatial dataset, taking into account the spatial component of the data to compute the statistical measure (i.e., the entropy) needed to learn the model. Then, we consider the task of extracting spatial association rules, focusing on the qualitative representation of the spatial relations. The feasibility of the process has been tested by implementing the proposed method on top of a GIS tool and analyzing real-world data.
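A minimal Python sketch of turning a reference object's neighborhood into a transaction follows. The single distance predicate `close_to` (within `max_dist`) is a hypothetical stand-in for the qualitative spatial relations an analyst would select from GIS layers, and `entropy` is plain Shannon entropy over class labels, not the spatially aware measure the abstract describes.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    layer: str   # GIS layer the object belongs to
    x: float
    y: float
    attrs: dict = field(default_factory=dict)


def spatial_transaction(ref: SpatialObject, others: list[SpatialObject], max_dist: float) -> set[str]:
    """Items = attributes of the reference object plus one qualitative
    'close_to:<layer>' item per layer with an object within max_dist
    (hypothetical relation)."""
    items = {f"{k}={v}" for k, v in ref.attrs.items()}
    for obj in others:
        if math.dist((ref.x, ref.y), (obj.x, obj.y)) <= max_dist:
            items.add(f"close_to:{obj.layer}")
    return items


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a class-label distribution."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


town = SpatialObject("town", 0.0, 0.0, {"size": "small"})
others = [SpatialObject("river", 1.0, 0.0), SpatialObject("road", 5.0, 5.0)]
print(sorted(spatial_transaction(town, others, max_dist=2.0)))  # ['close_to:river', 'size=small']
```

Once built, such item sets can be fed to any ordinary association-rule miner, which is the "traditional transactions" case the abstract describes.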
ISBN (print): 0780379411
The method of Concept Hierarchy Tree Classifiers (CHTC) has been widely applied in today's large-scale data mining environments. After introducing the basic idea of the data mining algorithm based on CHTC, this paper proposes an improved approach that makes up for the deficiency of the existing Concept Exaltation algorithm with respect to numerical attributes in the ***, a comparison of the two algorithms in a practical case, tested on actual data, has shown the effectiveness of the new algorithm.
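As a generic illustration of concept climbing (not the improved algorithm of this paper), the Python sketch below generalizes categorical values one level up a concept hierarchy, maps numerical values to interval concepts via hypothetical cut points, and merges identical generalized tuples with counts. The hierarchy, cut points, and rows are all made up.

```python
from bisect import bisect_right
from collections import Counter


def numeric_concept(value: float, cut_points: list[float], labels: list[str]) -> str:
    """Map a number to an interval concept; labels has len(cut_points) + 1 entries."""
    return labels[bisect_right(cut_points, value)]


def generalize(rows: list[dict], attrs: list[str], hierarchies: dict, numeric: dict) -> Counter:
    """Climb one level of the concept hierarchy and merge identical tuples.

    hierarchies: attr -> {value: parent concept} for categorical attributes.
    numeric: attr -> (cut_points, labels) for numerical attributes (hypothetical).
    """
    out: Counter = Counter()
    for row in rows:
        key = []
        for a in attrs:
            v = row[a]
            if a in numeric:
                key.append(numeric_concept(v, *numeric[a]))
            else:
                # Values without a parent concept are kept unchanged.
                key.append(hierarchies.get(a, {}).get(v, v))
        out[tuple(key)] += 1
    return out


rows = [
    {"city": "Vancouver", "age": 23},
    {"city": "Toronto", "age": 27},
    {"city": "Seattle", "age": 45},
]
hierarchies = {"city": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"}}
numeric = {"age": ([30, 60], ["young", "middle-aged", "senior"])}
print(generalize(rows, ["city", "age"], hierarchies, numeric))
```

Without the interval step, every distinct age would remain its own value and the two Canadian rows would never merge, which is the kind of weakness with numerical attributes that the abstract alludes to.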
Frequent pattern mining on data streams has recently attracted interest. However, it is not easy for users to determine a proper frequency threshold; it is more reasonable to ask users to set a bound on the result size. We study the problem of mining the top-K frequent itemsets in data streams. We introduce a method based on the Chernoff bound that guarantees the quality of the output and also bounds memory usage. We also propose an algorithm based on the Lossy Counting algorithm. In most of our experiments, the two proposed algorithms obtain perfect solutions, and the memory space they occupy is very small. In addition, we propose adaptations of these two algorithms to handle the case in which we are interested in mining the data in a sliding window. The experiments show that the results are accurate.
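For reference, here is a Python sketch of the classic Lossy Counting algorithm (Manku and Motwani) on single items rather than itemsets, with a hypothetical top-K rule that simply returns the k items with the largest estimated counts. It is not the authors' algorithm, and the Chernoff-bound method is not shown.

```python
import math


class LossyCounter:
    """Lossy Counting over single items.

    An estimated count never exceeds the true count and falls short of it
    by at most epsilon * n, where n is the number of items seen.
    """

    def __init__(self, epsilon: float):
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0                           # items seen so far
        self.entries: dict = {}              # item -> [count, max_error]

    def add(self, item) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At a bucket boundary, drop entries whose upper bound is too small.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def top_k(self, k: int) -> list:
        """Hypothetical top-K rule: the k items with the largest estimated counts."""
        return sorted(self.entries, key=lambda it: self.entries[it][0], reverse=True)[:k]


lc = LossyCounter(epsilon=0.1)
for item in "aaaaabbbcd" * 10:  # hypothetical stream
    lc.add(item)
print(lc.top_k(2))  # ['a', 'b']
```

Memory stays bounded because rare items are pruned at every bucket boundary; the trade-off is that their counts are forgotten, which is why the guarantee is stated as an error bound rather than exact counts.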