Distributed Stream Processing Systems (DSPSs) are attracting increasing industrial and academic interest as flexible tools to implement scalable and cost-effective on-line analytics applications over Big Data streams....
详细信息
ISBN:
(纸本)9781479978816
Distributed Stream Processing Systems (DSPSs) are attracting increasing industrial and academic interest as flexible tools to implement scalable and cost-effective on-line analytics applications over Big Data streams. Often hosted in private/public cloud deployment environments, DSPSs offer datastream processing services that transparently exploit the distributed computing resources made available to them at runtime. Given the volume of data of interest, possible (hard/soft) real-time processing requirements, and the time-variable characteristics of input datastreams, it is very important for DSPSs to use smart and innovative scheduling techniques that allocate computing resources properly and avoid static over-provisioning. In this paper, we originally investigate the suitability of exploiting application-level indications about differentiated priorities of different stream processing tasks to enable application-specific DSPS resource scheduling, e.g., capable of re-shaping processing resources in order to dynamically follow input data peaks of prioritized tasks, with no static over-provisioning. We originally propose a general and simple technique to design and implement priority-based resource scheduling in flow-graph-based DSPSs, by allowing application developers to augment DSPS graphs with priority metadata and by introducing an extensible set of priority schemas to be automatically handled by the extended DSPS. In addition, we show the effectiveness of our approach via its implementation and integration in our Quasit DSPS and through experimental evaluation of this prototype on a real-world stream processing application of Big Data vehicular traffic analysis.
暂无评论