Low latency and high throughput are crucial for distributed stream computing systems. Existing operator reconfiguration strategies often perform poorly under resource-limited and latency-constrained scenarios. The challenge lies in elastically adjusting operator parallelism and reconfiguring operators in a way that balances performance constraints against performance improvement. To address these issues, we propose Er-stream, an elastic reconfiguration strategy for various application scenarios. This paper discusses Er-stream from the following aspects: (1) we model the task topology as a queuing network to evaluate system latency, and construct a communication cost model to formalize the reconfiguration problem; (2) we propose an elastic strategy for operator parallelism to make rational use of the available resources and reduce the processing latency of the topology; (3) we propose a reconfiguration strategy for operators to reduce communication cost, with thresholds to control its trigger frequency; (4) we design and implement Er-stream and integrate it into Apache Storm. We evaluate key metrics such as latency, throughput, resource usage, and CPU utilization in a real-world distributed stream computing environment. The results demonstrate significant improvements: compared with Storm's existing strategies, Er-stream reduces average system latency by up to 30%, increases average system throughput by a factor of 1.89, lowers average resource usage by 26.6%, and increases CPU utilization by 19.8%.
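To make the queuing-network idea concrete, here is a minimal sketch of how such a latency model can be evaluated, assuming each operator instance is approximated as an M/M/1 queue with arrivals split evenly across instances. The operator rates, the path, and the M/M/1 assumption are illustrative choices for this sketch, not details taken from Er-stream itself.

```python
# Sketch of a queueing-network latency model for a stream topology.
# Assumption: each operator instance behaves like an M/M/1 queue and
# arrivals are split evenly across an operator's parallel instances.

def mm1_sojourn_time(arrival_rate: float, service_rate: float, parallelism: int) -> float:
    """Mean time a tuple spends (queueing + service) at one operator."""
    per_instance_rate = arrival_rate / parallelism
    if per_instance_rate >= service_rate:
        return float("inf")  # unstable: tuples pile up without bound
    return 1.0 / (service_rate - per_instance_rate)

def topology_latency(path) -> float:
    """End-to-end latency along one source-to-sink path: the sum of the
    per-operator sojourn times."""
    return sum(mm1_sojourn_time(lam, mu, k) for (lam, mu, k) in path)

# (arrival rate in tuples/s, service rate in tuples/s per instance, parallelism)
path = [(800.0, 500.0, 2), (800.0, 300.0, 4), (800.0, 1000.0, 1)]
print(f"modeled latency: {topology_latency(path) * 1000:.2f} ms")
```

In such a model, raising an operator's parallelism lowers its per-instance arrival rate and hence its sojourn time, which is exactly the lever an elastic parallelism strategy pulls against the cost of extra instances.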
ISBN: 9781479978816 (Print)
While big data is becoming ubiquitous, interest in handling data streams at scale is also growing, which has led to the emergence of many distributed stream computing systems. However, the complexity of stream computing and the diversity of workloads pose great challenges to benchmarking these systems. Due to the lack of standard criteria, evaluating and comparing these systems tends to be difficult. This paper takes an early step towards benchmarking modern distributed stream computing frameworks. After identifying the challenges and requirements in the field, we present our benchmark, streamBench, designed around these requirements. streamBench proposes a message system functioning as a mediator between stream data generation and consumption. It also covers seven benchmark programs that address typical stream computing scenarios and core operations. It considers not only the performance of systems under different data scales, but also their fault tolerance and durability, which drives us to incorporate four workload suites targeting these various aspects of the systems. Finally, we illustrate the feasibility of streamBench by applying it to two popular frameworks, Apache Storm and Apache Spark Streaming. We draw comparisons from various perspectives between the two platforms using streamBench's workload suites. In addition, we use the benchmark to demonstrate the performance improvements in Storm's latest version.
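As a rough illustration of the mediator idea, the sketch below decouples data generation from consumption through an in-process queue, so the generator's rate rather than the consumer's defines the offered load. The queue capacity, record count, and record format are invented for this example; a real deployment would use an actual message system between the generator and the system under test.

```python
# Sketch of a mediated generate/consume benchmark loop.
# Assumption: an in-process queue.Queue stands in for the message system.

import queue
import threading
import time

buffer = queue.Queue(maxsize=10_000)  # the "message system" mediator
NUM_RECORDS = 50_000

def generate():
    for i in range(NUM_RECORDS):
        buffer.put(f"record-{i}")
    buffer.put(None)  # end-of-stream marker

def consume():
    start, count = time.perf_counter(), 0
    while (record := buffer.get()) is not None:
        count += 1  # a real benchmark would run a workload program here
    elapsed = time.perf_counter() - start
    print(f"throughput: {count / elapsed:,.0f} records/s")

producer = threading.Thread(target=generate)
consumer = threading.Thread(target=consume)
producer.start(); consumer.start()
producer.join(); consumer.join()
```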
ISBN: 9781538672327 (Print)
This paper describes a benchmark for stream processing frameworks that allows accurate latency benchmarking of fine-grained individual stages of a processing pipeline. By determining the latency of distinct common operations in the processing flow instead of the end-to-end latency, we can form guidelines for efficient processing pipeline design. Additionally, we address the issue of defining time in distributed systems by capturing time on one machine and defining the baseline latency. We validate our benchmark for Apache Flink using a processing pipeline comprising common stream processing operations. Our results show that joins are the most time-consuming operation in our processing pipeline: the latency incurred by adding a join operation is 4.5 times higher than for a parsing operation, and the latency becomes increasingly dispersed as additional stages are added.
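A minimal sketch of the single-machine timing approach: every timestamp is taken on one host with a monotonic clock, so clock skew between machines cannot distort the per-stage latencies. The parse and join-like enrichment stages, the record format, and the lookup table below are illustrative stand-ins, not the paper's actual pipeline.

```python
# Sketch of per-stage latency measurement with all clocks on one machine.
# Assumption: stages run in-process so stage boundaries can be timestamped
# with the same monotonic clock (time.perf_counter).

import time

def parse(record):
    return dict(field.split("=") for field in record.split(","))

def enrich(event, table):
    return {**event, **table.get(event["id"], {})}  # join-like lookup

table = {"42": {"name": "sensor-42"}}
records = [f"id=42,value={v}" for v in range(100_000)]

stage_totals = {"parse": 0.0, "enrich": 0.0}
for record in records:
    t0 = time.perf_counter()
    event = parse(record)
    t1 = time.perf_counter()
    event = enrich(event, table)
    t2 = time.perf_counter()
    stage_totals["parse"] += t1 - t0
    stage_totals["enrich"] += t2 - t1

for stage, total in stage_totals.items():
    print(f"{stage}: {total / len(records) * 1e6:.2f} us/record on average")
```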
Elastic scaling of parallel operators has emerged as a powerful approach to reducing response time in stream applications with fluctuating inputs. Many state-of-the-art works focus on stateless operators and adjust operator parallelism along a single dimension. They often lack efficient management of operator state and overlook the costs associated with resource over-provisioning. To overcome these limitations, we introduce Es-stream for elastic scaling of stateful operators over fluctuating data streams, which includes: 1) We observe that under-provisioning of operator parallelism leads to data pile-up, resulting in longer system latency, while over-provisioning causes idle instances and additional resource consumption. 2) The Es-stream system scales in two dimensions: the parallelism of operators and the amount of resources. It dynamically adjusts operators to an optimal parallelism while scaling the resources used by the stream application. 3) When the parallelism of a stateful operator changes, upstream operators back up downstream operators' state and cache the emitted data tuples at dynamic time intervals, ensuring that operator parallelism is adjusted with low overhead. 4) Experimental results demonstrate that Es-stream provides promising performance improvements, reducing maximum system latency by 3x and maximum state recovery time by 2x compared to existing state-of-the-art works.
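The upstream-backup mechanism can be sketched roughly as follows: while a downstream operator rescales, its upstream neighbor holds a copy of the downstream state and caches the tuples it would have emitted, then replays them once the new instances are ready. All class and method names here are hypothetical, and a real system would do this across processes rather than in memory.

```python
# Sketch of upstream backup and tuple caching during a downstream rescale.
# Assumption: a single in-memory upstream operator; names are illustrative.

class UpstreamOperator:
    def __init__(self):
        self.downstream_rescaling = False
        self.state_backup = {}   # last snapshot of downstream state
        self.tuple_cache = []    # tuples emitted while downstream rescales

    def begin_rescale(self, downstream_state: dict):
        self.state_backup = dict(downstream_state)  # back up before rescaling
        self.downstream_rescaling = True

    def emit(self, tup, send):
        if self.downstream_rescaling:
            self.tuple_cache.append(tup)  # hold tuples instead of dropping them
        else:
            send(tup)

    def finish_rescale(self, restore, send):
        restore(self.state_backup)        # new instances restore the state
        for tup in self.tuple_cache:      # replay cached tuples in order
            send(tup)
        self.tuple_cache.clear()
        self.downstream_rescaling = False

# usage: tuples emitted during the rescale are buffered, then replayed
sent = []
op = UpstreamOperator()
op.begin_rescale({"count": 7})
op.emit("t1", sent.append)                    # cached, not sent
op.finish_rescale(lambda state: None, sent.append)
assert sent == ["t1"]
```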