Author affiliation: School of Computer Science, Peking University, Beijing 100871, People's Republic of China
Publication: IEEE Transactions on Knowledge and Data Engineering (IEEE Trans Knowl Data Eng)
Year/Volume/Issue: 2025, Vol. 37, No. 3
Pages: 1311-1324
Core indexing:
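Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees may be conferred in Engineering or Science)]
Funding: School of Computer Science, Peking University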
Topics: Data mining; Accuracy; Estimation; Frequency estimation; Distributed databases; Scheduling; Machine learning algorithms; Entropy; Data analysis; Schedules; Distributed data streams; frequency-based mining tasks; unbiased sketch; scheduling policy
Abstract: In distributed data stream mining, we abstract a MIMO scenario where a stream of multiple items is mined by multiple nodes. We design a framework named MimoSketch for the MIMO-specific scenario, which improves the fundamental mining tasks of item frequency estimation, item size distribution estimation, heavy hitter detection, heavy change detection, and entropy estimation. MimoSketch consists of an algorithm design and a policy to schedule items to nodes. MimoSketch's algorithm applies random counting to preserve a mathematically proven unbiasedness property, which makes it friendly to aggregate queries across multiple nodes; its memory layout adapts dynamically to the runtime item size distribution, which maximizes the estimation accuracy by storing more items. MimoSketch's scheduling policy balances items among nodes, avoiding nodes being overloaded or underloaded, which improves the overall mining accuracy. Our prototype and evaluation show that our algorithm can improve the accuracy of five typical mining tasks by an order of magnitude compared with the state-of-the-art solutions, and the scheduling policy further improves the performance in MIMO scenarios.