Details
ISBN:
(Print) 9781665435741
With the growth of artificial intelligence (AI) applications, a large amount of data is generated by mobile and IoT devices at the edge of the network, and deep learning tasks are executed to extract useful information from these user data. However, edge nodes are heterogeneous and network bandwidth is limited, which makes conventional distributed deep learning inefficient. In this paper, we propose Group Synchronous Parallel (GSP), which uses a density-based algorithm to group edge nodes with similar training speeds. To eliminate stragglers, group parameter servers coordinate the communication of nodes within a group using Stale Synchronous Parallel and aggregate their gradients, while a global parameter server aggregates the gradients from the group parameter servers to update the global model. To save network bandwidth, we further propose Grouping Dynamic Sparsification (GDS), which dynamically adjusts the gradient sparsification rate of each node on top of GSP so as to differentiate the communication volume and make the training speeds of all nodes converge. We evaluate the performance of GSP and GDS on LeNet-5, ResNet, VGG, and Seq2Seq with Attention. The experimental results show that GSP speeds up training by 45% to 120% with 16 nodes, and GDS on top of GSP recovers some of the test accuracy loss, up to 0.82% for LeNet-5.
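The following is a minimal, hypothetical sketch of the two ideas named in the abstract: density-based grouping of edge nodes by training speed (GSP) and per-node adjustment of the gradient sparsification rate (GDS). The function names, the choice of DBSCAN as the density-based algorithm, and the specific rate formula are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the paper's actual GSP/GDS implementation.
import numpy as np
from sklearn.cluster import DBSCAN


def group_nodes_by_speed(iter_times, eps=0.05, min_samples=1):
    """Group edge nodes whose per-iteration training times are similar,
    using a density-based clustering (here DBSCAN, as an assumption)
    over the one-dimensional speed values."""
    X = np.asarray(iter_times, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    groups = {}
    for node_id, label in enumerate(labels):
        groups.setdefault(label, []).append(node_id)
    return list(groups.values())


def adjust_sparsification_rates(iter_times, base_rate=0.01):
    """Assign each node a gradient sparsification rate (fraction of
    gradients kept) proportional to the fastest node's speed, so slower
    nodes communicate less and per-iteration times tend to even out.
    The proportional rule is an assumption for illustration."""
    t = np.asarray(iter_times, dtype=float)
    return base_rate * (t.min() / t)


if __name__ == "__main__":
    # Example: 8 edge nodes with heterogeneous per-iteration times (seconds).
    times = [0.10, 0.11, 0.10, 0.25, 0.26, 0.50, 0.52, 0.51]
    print("groups:", group_nodes_by_speed(times))
    print("rates :", adjust_sparsification_rates(times))
```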