版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Amazon Com Search & Discovery Grp Seattle WA 98109 USA Microsoft Res Bangalore 560080 Karnataka India Microsoft Res Redmond WA USA Virginia Tech Dept Comp Sci Blacksburg VA 24061 USA
出 版 物:《KNOWLEDGE AND INFORMATION SYSTEMS》 (知识和信息系统季刊)
年 卷 期:2013年第37卷第3期
页 面:585-610页
核心收录:
学科分类:0711[理学-系统科学] 07[理学] 08[工学] 070105[理学-运筹学与控制论] 081101[工学-控制理论与控制工程] 0701[理学-数学] 071101[理学-系统理论] 0811[工学-控制科学与工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:US National Science Foundation [IIS-0905313, CCF-0937133] Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) [D12PC000337]
主 题:Event sequences Data streams Frequent patterns Pattern discovery Streaming algorithms Approximation algorithms
摘 要:Discovering frequent patterns over event sequences is an important data mining problem. Existing methods typically require multiple passes over the data, rendering them unsuitable for streaming contexts. We present the first streaming algorithm for mining frequent patterns over a window of recent events in the stream. We derive approximation guarantees for our algorithm in terms of: (i) the separation of frequent patterns from the infrequent ones, and (ii) the rate of change of stream characteristics. Our parameterization of the problem provides a new sweet spot in the tradeoff between making distributional assumptions over the stream and algorithmic efficiencies of mining. We illustrate how this yields significant benefits when mining practical streams from neuroscience and telecommunications logs.