ISBN: (print) 9798350387339
Big data workflows are widely used in IoT, recommender systems, and real-time vision applications, and they continue to grow in complexity. These hybrid workflows consist of both resource-intensive batch jobs and latency-sensitive stream jobs. Examples include the data analytics workflow, which combines batch data transformations with low-latency querying, and the machine learning workflow, which performs feature extraction on stream data before batch training and low-latency inference. However, existing research on workflow scheduling focuses primarily on either stream or batch workflows, neglecting the efficient scheduling of hybrid workflows in a way that respects their diverse resource requirements and the costly data transfers between them. In this article, we propose a hybrid workflow model that defines the optimal placement of hybrid workflows (OHWP) as a bi-objective optimization problem. The model takes into account parameters related to communication between stream and batch jobs, as well as the heterogeneous resources of a JointCloud environment. Additionally, we present OHWP-PS (OHWP on a Pruned Space), a scheduling algorithm for hybrid workflows that minimizes both cost and latency by improving the initial population and dynamically updating the search space. The results demonstrate that the proposed OHWP-PS algorithm is effective and competitive across all experiments.
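To make the bi-objective framing concrete, the sketch below shows how candidate placements can be compared when cost and latency are minimized together: a placement is kept only if no other placement is at least as good on both objectives and strictly better on one (Pareto dominance). This is a generic illustration and not the OHWP-PS algorithm; the `Placement` type, the placement names, and the cost and latency values are all hypothetical.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    """A hypothetical candidate placement with its two objective values."""
    name: str
    cost: float      # monetary cost, lower is better (illustrative units)
    latency: float   # end-to-end latency, lower is better (illustrative units)


def dominates(a: Placement, b: Placement) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.latency <= b.latency
    strictly_better = a.cost < b.cost or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(candidates: List[Placement]) -> List[Placement]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# Made-up candidates, for illustration only.
candidates = [
    Placement("all-cloud-A", cost=10.0, latency=50.0),
    Placement("split-A-B", cost=7.0, latency=65.0),
    Placement("all-cloud-B", cost=12.0, latency=70.0),
    Placement("edge-heavy", cost=15.0, latency=30.0),
]

front = pareto_front(candidates)
for p in front:
    print(p.name, p.cost, p.latency)
```

Here `all-cloud-B` is dropped because `all-cloud-A` is both cheaper and faster, while the remaining three placements each represent a different cost/latency trade-off. A population-based scheduler would typically apply this kind of comparison when ranking candidate solutions.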