Pipeline parallelism is a distributed method for training deep neural networks and is well suited to tasks that consume large amounts of memory. However, it entails a large overhead because of the dependency between devices when forward and backward steps are performed across multiple accelerator devices. Although a method that removes the forward-step dependency through an all-to-all approach has been proposed for training compute-intensive models, it incurs a large overhead when training with many devices and is inefficient with respect to weight memory consumption. Alternatively, we propose a pipeline-parallelism method that reduces network communication using a self-generation concept and reduces overhead by minimizing the weight memory used for acceleration. In a DarkNet53 training-throughput experiment using six devices, the proposed method outperforms a baseline by approximately 63.7% in reduced overhead and communication costs and consumes approximately 17.0% less memory.
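The overhead that this abstract attributes to inter-device dependencies can be made concrete with a back-of-the-envelope model of a synchronous (GPipe-style) pipeline schedule. The sketch below is an illustration only: the six-stage setting mirrors the six-device experiment, but the unit step times, microbatch counts, and bubble formula are simplifying assumptions, not the paper's self-generation method.

```python
# Minimal sketch of the idle-time ("bubble") overhead in a GPipe-style
# synchronous pipeline. Assumes every forward and backward microbatch step
# takes one unit of time on every stage (an illustrative simplification).

def pipeline_bubble_ratio(num_stages: int, num_microbatches: int) -> float:
    """Fraction of device time lost to inter-stage dependencies.

    Each device is busy for M forward and M backward unit steps, but one
    iteration spans (M + S - 1) forward plus (M + S - 1) backward steps,
    because stage s must wait for stage s-1 (forward) and s+1 (backward).
    """
    busy = 2 * num_microbatches
    span = 2 * (num_microbatches + num_stages - 1)
    return 1.0 - busy / span

if __name__ == "__main__":
    # Six pipeline stages, matching the six-device experiment in the abstract.
    for m in (6, 12, 24, 48):
        print(f"microbatches={m:3d}  bubble={pipeline_bubble_ratio(6, m):.2%}")
```

Under these assumptions the bubble fraction is (S - 1) / (M + S - 1), which shrinks as more microbatches are used but never disappears; reducing this dependency cost is the overhead problem the paper targets.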
With the increasing proliferation of Internet-of-Things (IoT) devices, there is a growing trend toward training deep neural network (DNN) models with pipeline parallelism across resource-constrained IoT devices. To ensure model convergence and accuracy, synchronous pipeline parallelism is usually adopted. However, a synchronous pipeline can incur a long waiting time because gradients are aggregated over all microbatches. It is therefore urgent to design an adaptive partitioning and efficient scheduling scheme for DNN models in heterogeneous IoT environments. To address this problem, we propose a policy-gradient-based model partitioning and scheduling scheme (PG-MPSS) to minimize per-iteration training time. More specifically, we first design a double-network framework to divide and schedule a DNN model. Then, we adopt a policy gradient algorithm to update the double-network parameters, aiming to learn an optimal double-network model. We conduct extensive experiments comparing the DNN training time of PG-MPSS with five baseline algorithms, namely Dynamic Programming (DP), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Average&Greedy (AG), and Proximal Policy Optimization (PPO), under different experimental settings. The experimental results demonstrate that PG-MPSS can greatly expedite synchronous pipeline training of a DNN model.
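For intuition about the policy-gradient component of such a scheme, the following is a minimal REINFORCE-style sketch that learns a layer-to-device assignment minimizing a per-iteration time proxy. The layer costs, device speeds, single softmax policy, and slowest-device cost model are all illustrative assumptions and do not reproduce the paper's PG-MPSS double-network design.

```python
# REINFORCE-style sketch of learning a DNN partition across heterogeneous
# devices. All constants below are assumed for illustration only.
import math
import random

LAYER_COST = [4, 8, 8, 2, 6, 6, 3, 1]   # assumed per-layer compute cost
DEVICE_SPEED = [1.0, 0.5, 2.0]          # assumed relative device speeds

# Policy: an independent softmax over devices for every layer.
logits = [[0.0] * len(DEVICE_SPEED) for _ in LAYER_COST]

def iteration_time(assignment):
    # Per-iteration time proxy: the most heavily loaded device dominates
    # (a simplification; the paper's objective also covers scheduling order).
    load = [0.0] * len(DEVICE_SPEED)
    for cost, dev in zip(LAYER_COST, assignment):
        load[dev] += cost / DEVICE_SPEED[dev]
    return max(load)

def train(episodes=3000, lr=0.1):
    baseline = 0.0
    for episode in range(episodes):
        assignment, probs_per_layer = [], []
        for layer_logits in logits:
            exps = [math.exp(v) for v in layer_logits]
            z = sum(exps)
            probs = [e / z for e in exps]
            dev = random.choices(range(len(DEVICE_SPEED)), weights=probs)[0]
            assignment.append(dev)
            probs_per_layer.append(probs)
        reward = -iteration_time(assignment)   # shorter time => higher reward
        baseline = reward if episode == 0 else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # REINFORCE update: gradient of log-softmax is one_hot(action) - probs.
        for layer, (dev, probs) in enumerate(zip(assignment, probs_per_layer)):
            for k, p in enumerate(probs):
                grad = (1.0 if k == dev else 0.0) - p
                logits[layer][k] += lr * advantage * grad

if __name__ == "__main__":
    random.seed(0)
    train()
    greedy = [max(range(len(row)), key=row.__getitem__) for row in logits]
    print("learned assignment:", greedy, "time proxy:", iteration_time(greedy))
```

The sketch only captures the general idea of sampling a partition, scoring it, and nudging the policy toward lower-time assignments; the paper's scheme additionally couples a scheduling decision to the partitioning decision through its double-network model.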