Low latency and high throughput are crucial for distributed stream computing systems. Existing operator reconfiguration strategies often perform poorly under resource-limited and latency-constrained scenarios. The challenge lies in elastically adjusting operator parallelism and reconfiguring operators in a way that balances performance constraints against performance improvement. To address these issues, we propose Er-stream, an elastic reconfiguration strategy for various application scenarios. This paper discusses Er-stream from the following aspects: (1) we model the task topology as a queuing network to evaluate system latency, and construct a communication cost model to formalize the reconfiguration problem; (2) we propose an elastic strategy for operator parallelism to make rational use of the available resources and reduce the processing latency of the topology; (3) we propose a reconfiguration strategy for operators to reduce communication cost, with thresholds to control its trigger frequency; (4) we design and implement Er-stream and integrate it into Apache Storm. We evaluate key metrics such as latency, throughput, resource usage, and CPU utilization in a real-world distributed stream computing environment. The results demonstrate significant improvements: compared with Storm's existing strategies, Er-stream reduces average system latency by up to 30%, increases average system throughput by a factor of 1.89, lowers average resource usage by 26.6%, and increases CPU utilization by 19.8%.
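To make the queuing-network idea concrete, here is a minimal sketch of how such a latency model can be evaluated, assuming each operator instance is approximated as an M/M/1 queue with arrivals split evenly across instances. The operator rates, the path, and the M/M/1 assumption are illustrative choices for this sketch, not details taken from Er-stream itself.

```python
# Sketch of a queueing-network latency model for a stream topology.
# Assumption: each operator instance behaves like an M/M/1 queue and
# arrivals are split evenly across an operator's parallel instances.

def mm1_sojourn_time(arrival_rate: float, service_rate: float, parallelism: int) -> float:
    """Mean time a tuple spends (queueing + service) at one operator."""
    per_instance_rate = arrival_rate / parallelism
    if per_instance_rate >= service_rate:
        return float("inf")  # unstable: tuples pile up without bound
    return 1.0 / (service_rate - per_instance_rate)

def topology_latency(path) -> float:
    """End-to-end latency along one source-to-sink path: the sum of the
    per-operator sojourn times."""
    return sum(mm1_sojourn_time(lam, mu, k) for (lam, mu, k) in path)

# (arrival rate in tuples/s, service rate in tuples/s per instance, parallelism)
path = [(800.0, 500.0, 2), (800.0, 300.0, 4), (800.0, 1000.0, 1)]
print(f"modeled latency: {topology_latency(path) * 1000:.2f} ms")
```

In such a model, raising an operator's parallelism lowers its per-instance arrival rate and hence its sojourn time, which is exactly the lever an elastic parallelism strategy pulls against the cost of extra instances.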
ISBN: 9781479978816 (Print)
While big data is becoming ubiquitous, interest in handling data streams at scale is also growing, which has led to the emergence of many distributed stream computing systems. However, the complexity of stream computing and the diversity of workloads pose great challenges to benchmarking these systems. Due to the lack of standard criteria, evaluating and comparing these systems tends to be difficult. This paper takes an early step towards benchmarking modern distributed stream computing frameworks. After identifying the challenges and requirements in the field, we present our benchmark, streamBench, designed around these requirements. streamBench proposes a message system functioning as a mediator between stream data generation and consumption. It also covers seven benchmark programs that address typical stream computing scenarios and core operations. It considers not only the performance of systems under different data scales, but also their fault tolerance and durability, which drives us to incorporate four workload suites targeting these various aspects of the systems. Finally, we illustrate the feasibility of streamBench by applying it to two popular frameworks, Apache Storm and Apache Spark Streaming. We draw comparisons from various perspectives between the two platforms using streamBench's workload suites. In addition, we use the benchmark to demonstrate the performance improvements in Storm's latest version.
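As a rough illustration of the mediator idea, the sketch below decouples data generation from consumption through an in-process queue, so the generator's rate rather than the consumer's defines the offered load. The queue capacity, record count, and record format are invented for this example; a real deployment would use an actual message system between the generator and the system under test.

```python
# Sketch of a mediated generate/consume benchmark loop.
# Assumption: an in-process queue.Queue stands in for the message system.

import queue
import threading
import time

buffer = queue.Queue(maxsize=10_000)  # the "message system" mediator
NUM_RECORDS = 50_000

def generate():
    for i in range(NUM_RECORDS):
        buffer.put(f"record-{i}")
    buffer.put(None)  # end-of-stream marker

def consume():
    start, count = time.perf_counter(), 0
    while (record := buffer.get()) is not None:
        count += 1  # a real benchmark would run a workload program here
    elapsed = time.perf_counter() - start
    print(f"throughput: {count / elapsed:,.0f} records/s")

producer = threading.Thread(target=generate)
consumer = threading.Thread(target=consume)
producer.start(); consumer.start()
producer.join(); consumer.join()
```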
ISBN: 9781538672327 (Print)
This paper describes a benchmark for stream processing frameworks that allows accurate latency benchmarking of fine-grained individual stages of a processing pipeline. By determining the latency of distinct common operations in the processing flow instead of the end-to-end latency, we can form guidelines for efficient processing pipeline design. Additionally, we address the issue of defining time in distributed systems by capturing time on one machine and defining the baseline latency. We validate our benchmark for Apache Flink using a processing pipeline comprising common stream processing operations. Our results show that joins are the most time-consuming operation in our processing pipeline: the latency incurred by adding a join operation is 4.5 times higher than for a parsing operation, and the latency becomes increasingly dispersed as additional stages are added.
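A minimal sketch of the single-machine timing approach: every timestamp is taken on one host with a monotonic clock, so clock skew between machines cannot distort the per-stage latencies. The parse and join-like enrichment stages, the record format, and the lookup table below are illustrative stand-ins, not the paper's actual pipeline.

```python
# Sketch of per-stage latency measurement with all clocks on one machine.
# Assumption: stages run in-process so stage boundaries can be timestamped
# with the same monotonic clock (time.perf_counter).

import time

def parse(record):
    return dict(field.split("=") for field in record.split(","))

def enrich(event, table):
    return {**event, **table.get(event["id"], {})}  # join-like lookup

table = {"42": {"name": "sensor-42"}}
records = [f"id=42,value={v}" for v in range(100_000)]

stage_totals = {"parse": 0.0, "enrich": 0.0}
for record in records:
    t0 = time.perf_counter()
    event = parse(record)
    t1 = time.perf_counter()
    event = enrich(event, table)
    t2 = time.perf_counter()
    stage_totals["parse"] += t1 - t0
    stage_totals["enrich"] += t2 - t1

for stage, total in stage_totals.items():
    print(f"{stage}: {total / len(records) * 1e6:.2f} us/record on average")
```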
Elastic scaling of parallel operators has emerged as a powerful approach to reducing response time in stream applications with fluctuating inputs. Many state-of-the-art works focus on stateless operators and adjust operator parallelism along a single dimension. They often lack efficient management of operator state and overlook the costs associated with resource over-provisioning. To overcome these limitations, we introduce Es-stream for elastic scaling of stateful operators over fluctuating data streams, which includes: 1) We observe that under-provisioning of operator parallelism leads to data pile-up, resulting in longer system latency, while over-provisioning causes idle instances and additional resource consumption. 2) The Es-stream system scales in two dimensions: the parallelism of operators and the amount of resources. It dynamically adjusts operators to an optimal parallelism while scaling the resources used by the stream application. 3) When the parallelism of a stateful operator changes, upstream operators back up downstream operators' state and cache the emitted data tuples at dynamic time intervals, ensuring that operator parallelism is adjusted with low overhead. 4) Experimental results demonstrate that Es-stream provides promising performance improvements, reducing maximum system latency by 3x and maximum state recovery time by 2x compared to existing state-of-the-art works.
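The upstream-backup mechanism can be sketched roughly as follows: while a downstream operator rescales, its upstream neighbor holds a copy of the downstream state and caches the tuples it would have emitted, then replays them once the new instances are ready. All class and method names here are hypothetical, and a real system would do this across processes rather than in memory.

```python
# Sketch of upstream backup and tuple caching during a downstream rescale.
# Assumption: a single in-memory upstream operator; names are illustrative.

class UpstreamOperator:
    def __init__(self):
        self.downstream_rescaling = False
        self.state_backup = {}   # last snapshot of downstream state
        self.tuple_cache = []    # tuples emitted while downstream rescales

    def begin_rescale(self, downstream_state: dict):
        self.state_backup = dict(downstream_state)  # back up before rescaling
        self.downstream_rescaling = True

    def emit(self, tup, send):
        if self.downstream_rescaling:
            self.tuple_cache.append(tup)  # hold tuples instead of dropping them
        else:
            send(tup)

    def finish_rescale(self, restore, send):
        restore(self.state_backup)        # new instances restore the state
        for tup in self.tuple_cache:      # replay cached tuples in order
            send(tup)
        self.tuple_cache.clear()
        self.downstream_rescaling = False

# usage: tuples emitted during the rescale are buffered, then replayed
sent = []
op = UpstreamOperator()
op.begin_rescale({"count": 7})
op.emit("t1", sent.append)                    # cached, not sent
op.finish_rescale(lambda state: None, sent.append)
assert sent == ["t1"]
```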