Author Affiliations: Zhejiang Univ, Hangzhou 310027, Peoples R China; Univ Surrey, 5GIC & 6G Inst Commun Syst (ICS), Guildford GU2 7XH, England; Khalifa Univ (KU), 6G Res Ctr, Dept Comp & Informat Engn, Abu Dhabi 127788, U Arab Emirates
Publication: IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING (IEEE Trans. Netw. Sci. Eng.)
Year/Volume/Issue: 2025, Vol. 12, No. 2
Pages: 1080-1095
Core Indexing:
Subject Classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0701 [Science - Mathematics]
Funding: National Key Research and Development Project [2022YFB2901600]; Zhejiang Provincial Natural Science Foundation of China [LZ22F010008]; China Scholarship Council program
Keywords: Tensors; Training; Processor scheduling; Parallel processing; Dynamic scheduling; Computational modeling; Artificial neural networks; Computer architecture; Mathematical models; Energy consumption; Distributed deep learning; data parallelism; communication scheduling; tensor partitioning; generative pre-trained transformer (GPT)
Abstract: Simultaneous tensor communication can effectively improve the scalability of distributed deep learning on large clusters. However, communicating a fixed number of tensor blocks concurrently violates the priority-based scheduling strategy and cannot minimize communication overheads. In this paper, we propose a novel simultaneous tensor communication framework, namely D-Credit, which transmits tensor blocks based on dynamic sliding windows to minimize per-iteration time in distributed DNN training. We build the mathematical model of D-Credit in two phases: (1) the overlap of gradient communication and backward propagation, and (2) the overlap of gradient communication and forward computation. We derive the optimal window sizes for the second phase analytically, and develop a greedy algorithm to efficiently determine the dynamic window sizes for the first phase of D-Credit. We implement the D-Credit architecture on the PyTorch framework. Experimental results on two different GPU clusters demonstrate that, in terms of training speed, D-Credit achieves up to 1.26x, 1.21x, 1.48x and 1.53x speedup over ByteScheduler, DeAR, PyTorch-DDP and WFBP, respectively; in terms of energy consumption, D-Credit saves up to 17.8% and 25.1% of the training energy compared to ByteScheduler and WFBP, respectively.
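Illustration (not from the paper): the abstract describes transmitting tensor blocks in priority order through a dynamic sliding window whose size differs between the backward-overlap and forward-overlap phases. The minimal Python sketch below only illustrates that general idea under simplified assumptions; the names TensorBlock, window_for_phase and schedule_blocks are hypothetical, the fixed per-phase window sizes stand in for the values the paper obtains analytically or via its greedy algorithm, and nothing here is taken from the D-Credit implementation.

# Toy priority-based block scheduler with a phase-dependent sliding window.
# Hypothetical sketch only; not the D-Credit code.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class TensorBlock:
    priority: int                      # smaller = needed earlier by the forward pass
    layer: int = field(compare=False)
    size_mb: float = field(compare=False)

def window_for_phase(phase: str) -> int:
    # Placeholder window sizes; the paper determines these per phase,
    # here they are fixed constants purely for illustration.
    return 2 if phase == "backward_overlap" else 4

def schedule_blocks(blocks, phase):
    """Dispatch blocks in priority order, with at most `window` in flight at once."""
    window = window_for_phase(phase)
    pending = list(blocks)
    heapq.heapify(pending)             # highest-priority (smallest key) block first
    in_flight, dispatch_order = [], []
    while pending or in_flight:
        # Fill the sliding window with the highest-priority pending blocks.
        while pending and len(in_flight) < window:
            blk = heapq.heappop(pending)
            in_flight.append(blk)
            dispatch_order.append(blk.layer)
        # Pretend the oldest in-flight transfer completes, freeing one window slot.
        in_flight.pop(0)
    return dispatch_order

if __name__ == "__main__":
    blocks = [TensorBlock(priority=l, layer=l, size_mb=4.0) for l in range(8)]
    print(schedule_blocks(blocks, "backward_overlap"))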