Pipeline parallelism is a distributed method for training deep neural networks and is well suited to tasks that consume large amounts of memory. However, it entails a large overhead because of the dependency between devices when forward and backward steps are performed across multiple accelerator devices. Although a method that removes the forward-step dependency through an all-to-all approach has been proposed for training compute-intensive models, it incurs a large overhead when training with many devices and is inefficient with respect to weight memory consumption. Alternatively, we propose a pipeline-parallelism method that reduces network communication using a self-generation concept and reduces overhead by minimizing the weight memory used for acceleration. In a DarkNet53 training-throughput experiment using six devices, the proposed method outperforms a baseline by approximately 63.7% in reduced overhead and communication costs and consumes approximately 17.0% less memory.
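The overhead that this abstract attributes to inter-device dependencies can be made concrete with a back-of-the-envelope model of a synchronous (GPipe-style) pipeline schedule. The sketch below is an illustration only: the six-stage setting mirrors the six-device experiment, but the unit step times, microbatch counts, and bubble formula are simplifying assumptions, not the paper's self-generation method.

```python
# Minimal sketch of the idle-time ("bubble") overhead in a GPipe-style
# synchronous pipeline. Assumes every forward and backward microbatch step
# takes one unit of time on every stage (an illustrative simplification).

def pipeline_bubble_ratio(num_stages: int, num_microbatches: int) -> float:
    """Fraction of device time lost to inter-stage dependencies.

    Each device is busy for M forward and M backward unit steps, but one
    iteration spans (M + S - 1) forward plus (M + S - 1) backward steps,
    because stage s must wait for stage s-1 (forward) and s+1 (backward).
    """
    busy = 2 * num_microbatches
    span = 2 * (num_microbatches + num_stages - 1)
    return 1.0 - busy / span

if __name__ == "__main__":
    # Six pipeline stages, matching the six-device experiment in the abstract.
    for m in (6, 12, 24, 48):
        print(f"microbatches={m:3d}  bubble={pipeline_bubble_ratio(6, m):.2%}")
```

Under these assumptions the bubble fraction is (S - 1) / (M + S - 1), which shrinks as more microbatches are used but never disappears; reducing this dependency cost is the overhead problem the paper targets.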
With the increasing proliferation of Internet-of-Things (IoT) devices, there is a growing trend toward training deep neural network (DNN) models with pipeline parallelism across resource-constrained IoT devices. To ensure model convergence and accuracy, synchronous pipeline parallelism is usually adopted. However, a synchronous pipeline can incur a long waiting time because gradients are aggregated over all microbatches. It is therefore urgent to design an adaptive partitioning and efficient scheduling scheme for DNN models in heterogeneous IoT environments. To address this problem, we propose a policy-gradient-based model partitioning and scheduling scheme (PG-MPSS) to minimize per-iteration training time. More specifically, we first design a double-network framework to divide and schedule a DNN model. Then, we adopt a policy gradient algorithm to update the double-network parameters, aiming to learn an optimal double-network model. We conduct extensive experiments comparing the DNN training time of PG-MPSS with five baseline algorithms, namely Dynamic Programming (DP), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Average&Greedy (AG), and Proximal Policy Optimization (PPO), under different experimental settings. The experimental results demonstrate that PG-MPSS can greatly expedite synchronous pipeline training of a DNN model.
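For intuition about the policy-gradient component of such a scheme, the following is a minimal REINFORCE-style sketch that learns a layer-to-device assignment minimizing a per-iteration time proxy. The layer costs, device speeds, single softmax policy, and slowest-device cost model are all illustrative assumptions and do not reproduce the paper's PG-MPSS double-network design.

```python
# REINFORCE-style sketch of learning a DNN partition across heterogeneous
# devices. All constants below are assumed for illustration only.
import math
import random

LAYER_COST = [4, 8, 8, 2, 6, 6, 3, 1]   # assumed per-layer compute cost
DEVICE_SPEED = [1.0, 0.5, 2.0]          # assumed relative device speeds

# Policy: an independent softmax over devices for every layer.
logits = [[0.0] * len(DEVICE_SPEED) for _ in LAYER_COST]

def iteration_time(assignment):
    # Per-iteration time proxy: the most heavily loaded device dominates
    # (a simplification; the paper's objective also covers scheduling order).
    load = [0.0] * len(DEVICE_SPEED)
    for cost, dev in zip(LAYER_COST, assignment):
        load[dev] += cost / DEVICE_SPEED[dev]
    return max(load)

def train(episodes=3000, lr=0.1):
    baseline = 0.0
    for episode in range(episodes):
        assignment, probs_per_layer = [], []
        for layer_logits in logits:
            exps = [math.exp(v) for v in layer_logits]
            z = sum(exps)
            probs = [e / z for e in exps]
            dev = random.choices(range(len(DEVICE_SPEED)), weights=probs)[0]
            assignment.append(dev)
            probs_per_layer.append(probs)
        reward = -iteration_time(assignment)   # shorter time => higher reward
        baseline = reward if episode == 0 else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # REINFORCE update: gradient of log-softmax is one_hot(action) - probs.
        for layer, (dev, probs) in enumerate(zip(assignment, probs_per_layer)):
            for k, p in enumerate(probs):
                grad = (1.0 if k == dev else 0.0) - p
                logits[layer][k] += lr * advantage * grad

if __name__ == "__main__":
    random.seed(0)
    train()
    greedy = [max(range(len(row)), key=row.__getitem__) for row in logits]
    print("learned assignment:", greedy, "time proxy:", iteration_time(greedy))
```

The sketch only captures the general idea of sampling a partition, scoring it, and nudging the policy toward lower-time assignments; the paper's scheme additionally couples a scheduling decision to the partitioning decision through its double-network model.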