版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Department of Ecology and Evolutionary Biology Ann Arbor United States Center for the Study of Complex Systems Ann Arbor United States Michigan Institute for Data and AI in Society Ann Arbor United States University of Michigan Ann Arbor United States Department of Computer Science and Engineering East Lansing United States Program in Ecology Evolution and Behavior East Lansing United States Michigan State University East Lansing United States
出 版 物:《arXiv》 (arXiv)
年 卷 期:2024年
核心收录:
摘 要:Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data steams, it is critical to manage state in a manner that is fast and memory efficient - particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore data stream curation strategies to fulfill requirements on the composition of sample time points retained. Our DStream suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history;(2) stretched coverage, where early data items should be proportionally favored;and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove worst-case bounds on rolling coverage quality. In contrast to previous work by Moreno, Rodriguez Papa, and Dolson (2024), which dynamically scales memory use to guarantee a specified level of coverage quality, here we focus on the more practical, application-driven case of maximizing coverage quality given a fixed memory capacity. As a core simplifying assumption, we restrict algorithm design to a single update operation: writing from the data stream to a calculated buffer site - with data never being read back, no metadata stored (e.g., sample timestamps), and data eviction occurring only implicitly via overwrite. Drawing only on primitive, low-level operations and ensuring full, overhead-free use of available memory, this DStream framework ideally suits domains that are resource-constrained (e.g., embedded systems), performance-critical (e.g., real-time), and fine-grained (e.g., individual data items as small as single bits or bytes). In particular, proposed power-of-two-based buffer layout schemes support O(1) data ingestion via concise bit-level operations. To further practical applica