文献详情 >Structured Downsampling for Fa... 收藏

arXiv

Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams

作者：Moreno, Matthew Andres Zaman, Luis Dolson, Emily

作者机构：Department of Ecology and Evolutionary Biology Ann Arbor United States Center for the Study of Complex Systems Ann Arbor United States Michigan Institute for Data and AI in Society Ann Arbor United States University of Michigan Ann Arbor United States Department of Computer Science and Engineering East Lansing United States Program in Ecology Evolution and Behavior East Lansing United States Michigan State University East Lansing United States

出版物：《arXiv》 (arXiv)

年卷期：2024年

核心收录：

主　　题：Data streams

摘要：Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data steams, it is critical to manage state in a manner that is fast and memory efficient - particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore data stream curation strategies to fulfill requirements on the composition of sample time points retained. Our DStream suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history;(2) stretched coverage, where early data items should be proportionally favored;and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove worst-case bounds on rolling coverage quality. In contrast to previous work by Moreno, Rodriguez Papa, and Dolson (2024), which dynamically scales memory use to guarantee a specified level of coverage quality, here we focus on the more practical, application-driven case of maximizing coverage quality given a fixed memory capacity. As a core simplifying assumption, we restrict algorithm design to a single update operation: writing from the data stream to a calculated buffer site - with data never being read back, no metadata stored (e.g., sample timestamps), and data eviction occurring only implicitly via overwrite. Drawing only on primitive, low-level operations and ensuring full, overhead-free use of available memory, this DStream framework ideally suits domains that are resource-constrained (e.g., embedded systems), performance-critical (e.g., real-time), and fine-grained (e.g., individual data items as small as single bits or bytes). In particular, proposed power-of-two-based buffer layout schemes support O(1) data ingestion via concise bit-level operations. To further practical applica

本地馆藏 | 借阅须知 | 我要预约

已订购，未入库

sda

目录详情 | 试阅读 |

读者评论与其他读者分享你的观点

学校读者

用户名:未登录

我的评分

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams

读者评论与其他读者分享你的观点

请选择收藏分类：

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams

读者评论 与其他读者分享你的观点

请选择收藏分类： 新增自定义分类 确定 取消

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

读者评论与其他读者分享你的观点

请选择收藏分类：