Author Affiliations: Zhejiang Univ, Hangzhou 310027, Peoples R China; Univ Surrey, 5GIC & 6G Inst Commun Syst (ICS), Guildford GU2 7XH, England; Khalifa Univ (KU), 6G Res Ctr, Dept Comp & Informat Engn, Abu Dhabi 127788, U Arab Emirates
Publication: IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING (IEEE Trans. Netw. Sci. Eng.)
Year/Volume/Issue: 2025, Vol. 12, No. 2
Pages: 1080-1095
Core Indexing:
Subject Classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0701 [Science - Mathematics]
Funding: National Key Research and Development Project [2022YFB2901600]; Zhejiang Provincial Natural Science Foundation of China [LZ22F010008]; China Scholarship Council program
Keywords: Tensors; Training; Processor scheduling; Parallel processing; Dynamic scheduling; Computational modeling; Artificial neural networks; Computer architecture; Mathematical models; Energy consumption; Distributed deep learning; data parallelism; communication scheduling; tensor partitioning; generative pre-trained transformer (GPT)
Abstract: Simultaneous tensor communication can effectively improve the scalability of distributed deep learning on large clusters. However, communicating a fixed number of tensor blocks concurrently violates the priority-based scheduling strategy and cannot minimize communication overheads. In this paper, we propose a novel simultaneous tensor communication framework, namely D-Credit, which transmits tensor blocks based on dynamic sliding windows to minimize per-iteration time in distributed DNN training. We build the mathematical model of D-Credit in two phases: (1) the overlap of gradient communication and backward propagation, and (2) the overlap of gradient communication and forward computation. We derive the optimal window sizes for the second phase analytically, and develop a greedy algorithm to efficiently determine the dynamic window sizes for the first phase of D-Credit. We implement the D-Credit architecture on the PyTorch framework. Experimental results on two different GPU clusters demonstrate that, in terms of training speed, D-Credit achieves up to 1.26x, 1.21x, 1.48x and 1.53x speedup over ByteScheduler, DeAR, PyTorch-DDP and WFBP, respectively; in terms of energy consumption, D-Credit saves up to 17.8% and 25.1% of the training energy compared to ByteScheduler and WFBP, respectively.
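Illustration (not from the paper): the abstract describes transmitting tensor blocks in priority order through a dynamic sliding window whose size differs between the backward-overlap and forward-overlap phases. The minimal Python sketch below only illustrates that general idea under simplified assumptions; the names TensorBlock, window_for_phase and schedule_blocks are hypothetical, the fixed per-phase window sizes stand in for the values the paper obtains analytically or via its greedy algorithm, and nothing here is taken from the D-Credit implementation.

# Toy priority-based block scheduler with a phase-dependent sliding window.
# Hypothetical sketch only; not the D-Credit code.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class TensorBlock:
    priority: int                      # smaller = needed earlier by the forward pass
    layer: int = field(compare=False)
    size_mb: float = field(compare=False)

def window_for_phase(phase: str) -> int:
    # Placeholder window sizes; the paper determines these per phase,
    # here they are fixed constants purely for illustration.
    return 2 if phase == "backward_overlap" else 4

def schedule_blocks(blocks, phase):
    """Dispatch blocks in priority order, with at most `window` in flight at once."""
    window = window_for_phase(phase)
    pending = list(blocks)
    heapq.heapify(pending)             # highest-priority (smallest key) block first
    in_flight, dispatch_order = [], []
    while pending or in_flight:
        # Fill the sliding window with the highest-priority pending blocks.
        while pending and len(in_flight) < window:
            blk = heapq.heappop(pending)
            in_flight.append(blk)
            dispatch_order.append(blk.layer)
        # Pretend the oldest in-flight transfer completes, freeing one window slot.
        in_flight.pop(0)
    return dispatch_order

if __name__ == "__main__":
    blocks = [TensorBlock(priority=l, layer=l, size_mb=4.0) for l in range(8)]
    print(schedule_blocks(blocks, "backward_overlap"))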