ISBN (Print): 9798400701184
Recent years have witnessed the emergence of a new class of cooperative edge systems in which a large number of edge nodes can collaborate through local peer-to-peer connectivity. In this paper, we propose CoEdge, a novel cooperative edge system that can support concurrent data/compute-intensive deep learning (DL) models for distributed real-time applications such as city-scale traffic monitoring and autonomous driving. First, CoEdge includes a hierarchical DL task scheduling framework that dispatches DL tasks to edge nodes based on their computational profiles, communication overhead, and real-time requirements. Second, CoEdge can dramatically increase the execution efficiency of DL models by batching sensor data and aggregating the inferences of the same model. Finally, we propose a new edge containerization approach that enables an edge node to execute concurrent DL tasks by partitioning the CPU and GPU workloads into different containers. We extensively evaluate CoEdge on a self-deployed smart lamppost testbed on a university campus. Our results show that CoEdge achieves up to an 82.32% reduction in deadline miss rate compared to baselines.
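As a rough illustration of the batching idea described above, the Python sketch below shows a deadline-ordered queue that groups inference requests targeting the same DL model, so that one batched forward pass can serve several requests. The names (BatchingScheduler, InferenceTask, max_batch) and the greedy grouping rule are hypothetical and not taken from CoEdge; this is a minimal sketch of the general pattern, not the paper's scheduler.

```python
import heapq
import time
from dataclasses import dataclass, field

@dataclass(order=True)
class InferenceTask:
    deadline: float                        # absolute deadline in seconds (only compared field)
    model_name: str = field(compare=False)
    payload: object = field(compare=False)

class BatchingScheduler:
    """Deadline-ordered queue that groups tasks targeting the same model."""

    def __init__(self, max_batch: int = 8):
        self.max_batch = max_batch
        self._queue: list[InferenceTask] = []

    def submit(self, task: InferenceTask) -> None:
        heapq.heappush(self._queue, task)

    def next_batch(self) -> tuple[str, list[InferenceTask]]:
        """Pop the most urgent task, then greedily pull further tasks for the
        same model (up to max_batch) so one batched inference serves them all."""
        head = heapq.heappop(self._queue)
        batch, remaining = [head], []
        while self._queue and len(batch) < self.max_batch:
            t = heapq.heappop(self._queue)
            (batch if t.model_name == head.model_name else remaining).append(t)
        for t in remaining:
            heapq.heappush(self._queue, t)
        return head.model_name, batch

# Usage: requests for the same model are served together, in deadline order.
sched = BatchingScheduler(max_batch=4)
now = time.time()
sched.submit(InferenceTask(now + 0.10, "traffic_detector", "frame_a"))
sched.submit(InferenceTask(now + 0.05, "traffic_detector", "frame_b"))
sched.submit(InferenceTask(now + 0.20, "plate_reader", "frame_c"))
model, batch = sched.next_batch()
print(model, [t.payload for t in batch])
```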
Distributed training of deep neural networks (DNNs) suffers from efficiency declines in dynamic heterogeneous environments due to the resource wastage caused by the straggler problem in data parallelism (DP) and by pipeline bubbles in model parallelism (MP). Additionally, limited resource availability requires a trade-off between training performance and long-term cost, particularly in online settings. To address these challenges, this article presents a novel online approach that maximizes long-term training efficiency in heterogeneous environments through uneven data assignment and communication-aware model partitioning. A group-based hierarchical architecture combining DP and MP is developed to balance disparate computation and communication capabilities and to offer a flexible parallel mechanism. To jointly optimize the performance and long-term cost of the online DL training process, we formulate the problem as a stochastic optimization with time-averaged constraints. Using Lyapunov stochastic network optimization theory, we decompose it into a series of instantaneous sub-problems and devise an effective online solution based on tentative searching and linear solving. We have implemented a prototype system and evaluated our solution in realistic experiments, reducing batch training time by up to 68.59% over state-of-the-art methods.
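To make the drift-plus-penalty pattern behind a Lyapunov-style decomposition concrete, here is a small Python sketch under simplifying assumptions: data is assigned unevenly in proportion to per-worker throughput, and a single time-averaged cost constraint is tracked by a virtual queue. The function names, the candidate configuration list, and the trade-off parameter V are hypothetical and are not taken from the article.

```python
def assign_batches(global_batch: int, throughputs: list[float]) -> list[int]:
    """Split a global batch unevenly, proportional to per-worker throughput
    (samples/s), so faster workers get more data and stragglers get less."""
    total = sum(throughputs)
    shares = [int(global_batch * t / total) for t in throughputs]
    shares[0] += global_batch - sum(shares)      # hand the rounding remainder to worker 0
    return shares

def drift_plus_penalty_step(q: float, options: list[tuple[float, float]],
                            budget: float, V: float = 10.0) -> tuple[int, float]:
    """Pick the configuration minimizing V*latency + q*cost for this slot,
    then update the virtual queue q that tracks the time-averaged cost budget."""
    best = min(range(len(options)), key=lambda i: V * options[i][0] + q * options[i][1])
    latency, cost = options[best]
    q_next = max(q + cost - budget, 0.0)
    return best, q_next

# Usage
print(assign_batches(256, [3.0, 2.0, 1.0]))      # -> [129, 85, 42]
choice, q = drift_plus_penalty_step(q=0.0,
                                    options=[(1.2, 5.0), (0.8, 9.0)],
                                    budget=6.0)
print(choice, q)                                  # picks the faster but costlier option; q grows
```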
Building a distributed deep learning (DDL) system on HPC clusters that guarantees convergence speed and scalability for DNN training is challenging. An HPC cluster consisting of multiple high-density multi-GPU servers connected by an InfiniBand network (HDGib) compresses the computing and communication time of distributed DNN training, but it also brings new challenges: convergence time is far from linearly scalable (with respect to the number of workers) for parallel DNN training. We therefore analyze the optimization process and identify three key issues that cause scalability degradation. First, the high-frequency parameter updates resulting from the compressed computing and communication times exacerbate the stale-gradient problem, which slows down convergence. Second, previous methods for constraining the gradient noise (stochastic error) of SGD are outdated, as HDGib clusters with InfiniBand connections can support stricter constraints that further bound the stochastic error. Third, using the same learning rate for all workers is inefficient because each worker is at a different training stage. We therefore propose a momentum-driven adaptive synchronization model that addresses these issues and accelerates the training procedure on HDGib clusters. Our adaptive k-synchronization algorithm uses the momentum term to absorb stale gradients and adaptively bounds the stochastic error to provide an approximately optimal descent direction for distributed SGD. Our model also includes an individual dynamic learning-rate search method for each worker to further improve training performance; compared with previous linear and exponential decay methods, it provides a more precise descent distance for distributed SGD at different training stages. Extensive experimental results indicate that the proposed model effectively improves the training performance of CNNs, which retains high accuracy with a speed-up o
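The following Python sketch illustrates, under simplifying assumptions, how a momentum buffer can absorb stale gradients in a k-synchronous update: gradients from the k fastest workers are down-weighted by their staleness and folded into the momentum term before the parameter step. The class name MomentumKSync and the 1/(1+staleness) discount are illustrative choices, not the paper's adaptive k-synchronization algorithm or its learning-rate search.

```python
import numpy as np

class MomentumKSync:
    """Hypothetical sketch of k-synchronous SGD with a momentum term that
    absorbs stale gradients (not the paper's exact algorithm)."""

    def __init__(self, dim: int, lr: float = 0.1, beta: float = 0.9, k: int = 2):
        self.w = np.zeros(dim)        # model parameters
        self.v = np.zeros(dim)        # momentum buffer
        self.lr, self.beta, self.k = lr, beta, k

    def step(self, grads: list[tuple[int, np.ndarray]], current_version: int) -> None:
        """grads: (model_version_used, gradient) pairs from the k fastest workers.
        Stale gradients are down-weighted before entering the momentum buffer."""
        agg = np.zeros_like(self.w)
        for version, g in grads[: self.k]:
            staleness = max(current_version - version, 0)
            agg += g / (1.0 + staleness)          # simple staleness discount
        agg /= min(self.k, len(grads))
        self.v = self.beta * self.v + agg          # momentum absorbs stale/noisy updates
        self.w -= self.lr * self.v

# Usage: two workers report gradients, one computed on an older model version.
opt = MomentumKSync(dim=3, k=2)
g_fresh = (5, np.array([0.2, -0.1, 0.05]))
g_stale = (3, np.array([0.4,  0.0, 0.10]))       # 2 versions stale -> weight 1/3
opt.step([g_fresh, g_stale], current_version=5)
print(opt.w)
```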