咨询与建议

看过本文的还看了

相关文献

该作者的其他文献

文献详情 >A binary decision diagram base... 收藏

A binary decision diagram based approach for mining frequent subsequences

一张二进制决定图为采矿基于途径经常的随后

作     者:Loekito, Elsa Bailey, James Pei, Jian 

作者机构:Univ Melbourne Natl ICT Australia NICTA Dept Comp Sci & Software Engn Melbourne Vic Australia Simon Fraser Univ Sch Comp Sci Burnaby BC V5A 1S6 Canada 

出 版 物:《KNOWLEDGE AND INFORMATION SYSTEMS》 (知识和信息系统季刊)

年 卷 期:2010年第24卷第2期

页      面:235-268页

核心收录:

学科分类:0711[理学-系统科学] 07[理学] 08[工学] 070105[理学-运筹学与控制论] 081101[工学-控制理论与控制工程] 0701[理学-数学] 071101[理学-系统理论] 0811[工学-控制科学与工程] 0812[工学-计算机科学与技术(可授工学、理学学位)] 

基  金:NICTA Department of Broadband, Communications and the Digital Economy Australian Research Council through the ICT Centre of Excellence 

主  题:Sequential pattern mining Frequent subsequences Binary decision diagram Sequence binary decision diagram SeqBDD 

摘      要:Sequential pattern mining is an important problem in data mining. State of the art techniques for mining sequential patterns, such as frequent subsequences, are often based on the pattern-growth approach, which recursively projects conditional databases. Explicitly creating database projections is thought to be a major computational bottleneck, but we will show in this paper that it can be beneficial when the appropriate data structure is used. Our technique uses a canonical directed acyclic graph as the sequence database representation, which can be represented as a binary decision diagram (BDD). In this paper, we introduce a new type of BDD, namely a sequence BDD (SeqBDD), and show how it can be used for efficiently mining frequent subsequences. A novel feature of the SeqBDD is its ability to share results between similar intermediate computations and avoid redundant computation. We perform an experimental study to compare the SeqBDD technique with existing pattern growth techniques, that are based on other data structures such as prefix trees. Our results show that a SeqBDD can be half as large as a prefix tree, especially when many similar sequences exist. In terms of mining time, it can be substantially more efficient when the support is low, the number of patterns is large, or the input sequences are long and highly similar.

读者评论 与其他读者分享你的观点

用户名:未登录
我的评分