咨询与建议

看过本文的还看了

相关文献

该作者的其他文献

文献详情 >Structured Downsampling for Fa... 收藏
arXiv

Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams

作     者:Moreno, Matthew Andres Zaman, Luis Dolson, Emily 

作者机构:Department of Ecology and Evolutionary Biology Ann Arbor United States Center for the Study of Complex Systems Ann Arbor United States Michigan Institute for Data and AI in Society Ann Arbor United States University of Michigan Ann Arbor United States Department of Computer Science and Engineering East Lansing United States Program in Ecology Evolution and Behavior East Lansing United States Michigan State University East Lansing United States 

出 版 物:《arXiv》 (arXiv)

年 卷 期:2024年

核心收录:

主  题:Data streams 

摘      要:Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data steams, it is critical to manage state in a manner that is fast and memory efficient - particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore data stream curation strategies to fulfill requirements on the composition of sample time points retained. Our DStream suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history;(2) stretched coverage, where early data items should be proportionally favored;and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove worst-case bounds on rolling coverage quality. In contrast to previous work by Moreno, Rodriguez Papa, and Dolson (2024), which dynamically scales memory use to guarantee a specified level of coverage quality, here we focus on the more practical, application-driven case of maximizing coverage quality given a fixed memory capacity. As a core simplifying assumption, we restrict algorithm design to a single update operation: writing from the data stream to a calculated buffer site - with data never being read back, no metadata stored (e.g., sample timestamps), and data eviction occurring only implicitly via overwrite. Drawing only on primitive, low-level operations and ensuring full, overhead-free use of available memory, this DStream framework ideally suits domains that are resource-constrained (e.g., embedded systems), performance-critical (e.g., real-time), and fine-grained (e.g., individual data items as small as single bits or bytes). In particular, proposed power-of-two-based buffer layout schemes support O(1) data ingestion via concise bit-level operations. To further practical applica

读者评论 与其他读者分享你的观点

用户名:未登录
我的评分