With the growing volume of training data and the increasing size of deep neural network (DNN) models, efficiently scaling DNN training has become a significant challenge for server clusters with AI accelerators in terms of memory and computing efficiency. Existing parallelism schemes can be broadly classified into three categories: data parallelism (splitting data samples), model parallelism (splitting model parameters), and pipeline model parallelism (splitting model layers). Hybrid approaches split both data and models, offering a more comprehensive solution for parallel training. However, these methods struggle to scale larger models efficiently across more computing nodes, as substantial memory constraints degrade training efficiency and overall throughput. In this paper, we propose HIPPIE, a hybrid parallel training framework designed to enhance the memory efficiency and scalability of large DNN training. First, to evaluate the effect of optimization more soundly, we propose a Memory Efficiency (ME) metric that quantifies the tradeoff between throughput and memory overhead. Second, driven by this ME optimization objective, we automatically partition the pipeline to balance throughput and memory. Third, we optimize the training process via a novel hybrid parallel scheduler that improves throughput and scalability through informed pipeline scheduling and communication scheduling with gradient-hidden optimization. Experiments on various models show that HIPPIE achieves over 90% scaling efficiency on a 16-GPU platform. Moreover, HIPPIE increases throughput by up to 80%, while saving 57% of memory overhead and achieving a 4.18x improvement in memory efficiency.
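The abstract does not give the precise definition of the ME metric, only that it captures the tradeoff between throughput and memory overhead. The sketch below is a minimal illustration, assuming ME is measured as training throughput (samples/s) per GiB of peak accelerator memory; the function name measure_memory_efficiency and the measurement loop are illustrative, not taken from HIPPIE.

```python
# Minimal sketch of a Memory Efficiency (ME) style measurement, assuming
# ME = throughput (samples/s) / peak GPU memory (GiB). The exact formula
# used by HIPPIE is not given in the abstract; names here are illustrative.
import time
import torch

def measure_memory_efficiency(model, data_loader, loss_fn, optimizer, device="cuda"):
    """Run one pass of training steps and report throughput, peak memory, and their ratio."""
    model.to(device)
    torch.cuda.reset_peak_memory_stats(device)  # assumes a CUDA device is available
    samples, start = 0, time.time()

    for inputs, targets in data_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        samples += inputs.size(0)

    elapsed = time.time() - start
    throughput = samples / elapsed                                 # samples per second
    peak_gib = torch.cuda.max_memory_allocated(device) / 2**30     # peak memory in GiB
    return {"throughput": throughput,
            "peak_memory_gib": peak_gib,
            "memory_efficiency": throughput / peak_gib}            # assumed ME definition
```

The "gradient-hidden" communication scheduling mentioned in the abstract generally refers to overlapping gradient synchronization (e.g., all-reduce) with backward computation so that communication latency is hidden behind compute; the specific scheduling policy HIPPIE uses is not described here.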
Training large Deep Convolutional Neural Networks (DCNNs) with increasingly large datasets to improve model accuracy has become extremely time-consuming. Distributed training methods, such as data parallelism (DP) and...