ISBN: 9798350383225 (print)
We propose SCoOL, a programming model and its corresponding parallel runtime systems for implementing optimization problem solvers. In SCoOL, users specify what task is performed for a point in a given search space, and what global information should be maintained during the search. The resulting optimization program is then efficiently executed in a BSP style on shared- or distributed-memory computers by a parallel runtime provided with the model. In the paper, we show details of our scalable runtime for distributed-memory clusters, including algorithms for work stealing and task rebalancing. To benchmark the platform, we implement solutions to several optimization problems and provide performance analysis for the Quadratic Assignment Problem, Parent Set Assignment, and Bayesian Network Structure Learning. Our solvers show strong scaling on a cluster with 1,280 cores, significantly outperforming the current state-of-the-art solvers in Bayesian network learning.
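The division of labor described in this abstract (a per-point task, shared global state, and BSP-style supersteps) can be illustrated with a minimal sketch. This is not SCoOL's API: `process_task`, `bsp_solve`, and the toy `FLOW`/`DIST` matrices are hypothetical names chosen for illustration, the workers are simulated sequentially, and re-splitting the frontier every superstep is only a crude stand-in for the work stealing and task rebalancing the paper describes.

```python
# Toy Quadratic Assignment Problem instance (hypothetical data, non-negative entries).
FLOW = [[0, 3, 1, 2], [3, 0, 4, 1], [1, 4, 0, 5], [2, 1, 5, 0]]
DIST = [[0, 2, 6, 3], [2, 0, 4, 7], [6, 4, 0, 1], [3, 7, 1, 0]]


def qap_cost(flow, dist, perm):
    """Cost of assigning facility i to location perm[i], over the assigned prefix."""
    k = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(k) for j in range(k))


def process_task(partial, flow, dist, n, best_cost):
    """The per-point task: expand one partial assignment into child tasks."""
    children, local_best = [], (best_cost, None)
    for loc in range(n):
        if loc in partial:
            continue
        child = partial + (loc,)
        cost = qap_cost(flow, dist, child)
        # With non-negative entries the prefix cost is a lower bound, so prune.
        if cost >= local_best[0]:
            continue
        if len(child) == n:
            local_best = (cost, child)
        else:
            children.append(child)
    return children, local_best


def bsp_solve(flow, dist, workers=4):
    n = len(flow)
    frontier = [()]                      # tasks for the next superstep
    best = (float("inf"), None)          # global state: best (cost, assignment)
    while frontier:
        # Superstep: split the frontier among (simulated) workers.
        chunks = [frontier[w::workers] for w in range(workers)]
        results = []
        for chunk in chunks:
            local_children, local_best = [], best
            for partial in chunk:
                kids, cand = process_task(partial, flow, dist, n, local_best[0])
                local_children.extend(kids)
                if cand[0] < local_best[0]:
                    local_best = cand
            results.append((local_children, local_best))
        # Barrier: reduce the global state and gather the next frontier.
        best = min((r[1] for r in results), key=lambda b: b[0])
        frontier = [c for r in results for c in r[0]]
    return best
```

Calling `bsp_solve(FLOW, DIST)` returns a `(cost, assignment)` pair. Because this sketch expands the search breadth-first, complete assignments only appear in the final superstep, so the global bound prunes far less than it would in a real solver.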
ISBN: 9798350386066; 9798350386059 (print)
The cloud-edge collaborative computing paradigm is a promising solution for high-resolution video analytics systems. The key lies in reducing redundant data and managing fluctuating inference workloads effectively. Previous work has focused on extracting regions of interest (RoIs) from videos and transmitting them to the cloud for processing. However, a naive Infrastructure as a Service (IaaS) resource configuration falls short in handling highly fluctuating workloads, leading to violations of Service Level Objectives (SLOs) and inefficient resource utilization. Moreover, these methods neglect the potential benefits of batching RoIs to leverage parallel processing. In this work, we introduce Tangram, an efficient serverless cloud-edge video analytics system fully optimized for both communication and computation. Tangram adaptively aligns the RoIs into patches and transmits them to the scheduler in the cloud. The system employs a unique "stitching" method to batch patches of various sizes from the edge cameras. Additionally, we develop an online SLO-aware batching algorithm that judiciously determines the optimal invocation time of the serverless function. Experiments on our prototype reveal that Tangram can reduce bandwidth consumption and computation cost by up to 74.30% and 66.35%, respectively, while keeping SLO violations within 5% and accuracy loss negligible.
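To make the idea of an SLO-aware invocation decision concrete, here is a minimal sketch. It is not Tangram's algorithm: the `Patch` type, the linear latency model in `estimated_latency`, the `max_batch` cap, and the rule "invoke when waiting for one more patch would miss the earliest deadline" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    arrival: float  # arrival time at the scheduler, in seconds
    slo: float      # latency budget for this patch, in seconds


def estimated_latency(batch_size, base=0.05, per_item=0.01):
    """Assumed linear model of serverless inference latency for a batch."""
    return base + per_item * batch_size


def should_invoke(queue, now, next_arrival_gap, max_batch=8):
    """Decide whether to invoke the function now or keep batching.

    Invokes when the batch is full, or when waiting for one more patch
    would push completion past the earliest deadline in the queue.
    """
    if not queue:
        return False
    if len(queue) >= max_batch:
        return True
    earliest_deadline = min(p.arrival + p.slo for p in queue)
    finish_if_wait = now + next_arrival_gap + estimated_latency(len(queue) + 1)
    return finish_if_wait > earliest_deadline
```

With a single queued patch that arrived at time 0.0 with a 0.2 s budget, the sketch keeps waiting at time 0.05 but invokes at time 0.12 (assuming the next patch is expected 0.02 s later), since waiting would then overrun the deadline.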
In recent years, in order to accelerate the construction of new power systems, power companies have carried out a series of information system construction, generating a massive amount of power related business data, ...
ISBN: 9798350363074; 9798350363081 (print)
As the world becomes more connected and new digital services emerge at a fast pace, the amount of network traffic increases rapidly. Consequently, processing requirements become more varied and drive the need for flexible packet-processing designs, especially as in-network computing gains traction. Traditional approaches deploy hardware accelerators in a pipeline, ordered in the sequence in which the associated tasks are supposed to be executed. Hence, they do not accommodate flows with different processing requirements and provide no way to remap flows to task sequences at runtime. To address these limitations, we propose FlexRoute, a fast, flexible, and priority-aware packet-processing design that can process network traffic at a rate of over 100 Gbit/s on FPGAs. Our design consists of a reconfigurable parser and several processing engines that are arranged in a pipeline. The processing engines are equipped with processing units that execute specific tasks, flexible forwarding logic, and priority-aware queuing/scheduling logic. We implement a prototype of FlexRoute in Verilog and evaluate it via cycle-accurate register-transfer-level simulations. We also synthesize and implement our design on the Alveo U55C High Performance Compute Card and show its resource usage. The evaluation results demonstrate that FlexRoute can process packets of arbitrary size with different processing requirements at a traffic rate of about 70 Gbit/s, significantly faster than two state-of-the-art flexible packet-processing designs.
Extreme climate change is the major ecological crisis that humankind currently faces. The core of coping with climate change is reducing greenhouse gas emissions, chiefly carbon dioxide emissions from fossil fuel combustion. Replacing fossil energy with large-scale new-energy power generation is an effective solution to this problem. Simulating and calculating such a large number of wind power generators for planning or stability analysis is a technical challenge, especially the detailed study of the interaction characteristics of hundreds of models with IGBT converters. Inspired by the decentralized and distributed ideas of Ecological Marxism theory, this work applies decoupling and grid-division calculation to the electromagnetic transient model of a large number of DFIGs. Through the design and development of a data-driven, large-scale parallel computing framework, multi-stage, high-speed, task-pipelined parallel calculation is realized. The framework effectively resolves the computational difficulty of the matrix dimension explosion caused by the growing number of wind turbines and theoretically breaks the ceiling on fine-grained simulation of large-scale wind power generation; waveform comparisons verify its feasibility and correctness. (C) 2022 The Author(s). Published by Elsevier Ltd.
Load imbalance often occurs in particle-in-cell simulations on parallel computing systems, which seriously affects the efficiency of applications. Due to the characteristics of multilevel parallelism and communication asymmet...
In modern society, the energy problem has become increasingly prominent. In order to achieve sustainable and efficient energy utilization, microgrid technology came into being. Microgrid is a small power system with a...
ISBN: 9798331541378 (print)
The increasing quality and availability of Quantum Processing Units (QPUs) are fueling a growing interest in quantum computing across many technological areas. The resulting increase in demand for QPU resources requires Quantum Computing as a Service (QCaaS) providers to support a high throughput of quantum workloads. A major runtime bottleneck in current QCaaS software stacks is the computationally intensive compilation step. To address this, Oxford Quantum Circuits has introduced distributed compilation, whereby quantum programs are compiled in parallel and stored until the QPU is available. This has replaced our previous serial compilation approach, in which each program was compiled immediately prior to execution. From experiments using our production compilers and a simulated backend representing the QPU, we show that distributed compilation has resulted in a 78% reduction in processing time compared with serial compilation. This demonstrates that sizeable gains in program throughput are attainable by introducing distributed compilation into a QCaaS architecture. We posit that the usefulness of this feature will only grow given the increasing complexity of quantum programs and the growing popularity of quantum-classical hybrid algorithms.
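The contrast between serial compilation and compiling ahead of time in parallel can be sketched as follows. This is a toy illustration under stated assumptions, not the architecture from the paper: `compile_program` and `execute_on_qpu` are hypothetical stand-ins, and a thread pool on one machine takes the place of a distributed compilation service.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_program(source: str) -> str:
    """Hypothetical stand-in for an expensive quantum compiler pass."""
    return f"compiled({source})"


def execute_on_qpu(binary: str) -> str:
    """Hypothetical stand-in for the single QPU, which runs one program at a time."""
    return f"result[{binary}]"


def run_serial(programs):
    """Serial approach: compile each program immediately before executing it."""
    return [execute_on_qpu(compile_program(p)) for p in programs]


def run_distributed(programs, workers=4):
    """Compile all programs in parallel, hold the results, execute in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each future holds a compiled program until the QPU is ready for it.
        futures = [pool.submit(compile_program, p) for p in programs]
        return [execute_on_qpu(f.result()) for f in futures]
```

Both functions return the same results in the same order; only the point at which compilation happens differs. A real CPU-bound compiler would need separate processes or machines rather than Python threads to see a speedup.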
ISBN: 9798400717932 (print)
The overset grid method is widely employed to solve moving boundary problems in numerical simulations. However, the heavy and inevitable communication resulting from boundary movements severely impedes the improvement of parallel efficiency. This paper proposes a Motion Trace Decomposition (MTD) method to alleviate this issue. The MTD method minimizes communication overhead between processors by decomposing sub-grids and distributing them according to the object motion trajectory, negating the need to reproduce communication areas when boundaries move. Various tests were conducted to evaluate the MTD method, incorporating diverse motion types, such as displacement and rotation. Results from experimental simulations with 1.9 × 10^6 grid cells indicate that the proposed method enhances the parallel efficiency of the assembly process by up to 20.35% using 72 processors. These findings showcase the significant potential of the MTD method in alleviating communication challenges associated with simulating moving boundary problems using overset grids.
The electricity system with penetration of a massive number of renewable generation sources needs to consider various demands, such as power generation efficiency, the service life of energy storage devices, and its i...