Structured Support Vector Machines (structured SVMs) are a fundamental machine learning algorithm with a solid theoretical foundation and high effectiveness in applications such as natural language parsing and computer vision. However, training structured SVMs is very time-consuming due to the large number of constraints and slow convergence, especially for large training data sets. The high cost of training structured SVMs has hindered their adoption in new applications. In this article, we aim to improve the efficiency of structured SVMs by proposing a parallel and distributed solution (namely FastSSVM) for training structured SVMs built on top of MPI and OpenMP. FastSSVM exploits a series of optimizations (e.g., optimizations on data storage and synchronization) to efficiently use the resources of the nodes in a cluster and the cores of each node. Moreover, FastSSVM tackles the large constraint set problem by batch processing and addresses the slow convergence challenge by adapting stop conditions based on the improvement of each iteration. We theoretically prove that our solution is guaranteed to converge to a global optimum. A comprehensive experimental study shows that FastSSVM achieves at least four times speedup over existing solutions, and in some cases two to three orders of magnitude speedup.
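To make the hybrid pattern concrete, here is a minimal sketch of data-parallel structured-SVM training in the spirit of the abstract, written with mpi4py rather than the paper's MPI/OpenMP C++ code. The toy multiclass task, the feature map joint_feature, the oracle most_violated_constraint, and the plain subgradient update are all simplifying assumptions; FastSSVM's actual solver and optimizations are not reproduced here.

# Illustrative sketch (not the authors' code): each MPI rank batch-processes its shard of
# examples, finds the most violated constraint per example, and the ranks all-reduce the
# accumulated subgradient and loss. The stop rule mirrors the idea of adapting the stop
# condition to the per-iteration improvement.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
N_CLASSES, N_FEATS = 3, 4            # toy sizes; the real tasks are parsing/vision

def joint_feature(x, y):
    # Class-indexed feature map: x placed in the block belonging to class y.
    phi = np.zeros(N_CLASSES * N_FEATS)
    phi[y * N_FEATS:(y + 1) * N_FEATS] = x
    return phi

def most_violated_constraint(w, x, y):
    # Loss-augmented inference for 0/1 loss over the toy multiclass task.
    scores = [(y_hat != y) + w @ joint_feature(x, y_hat) for y_hat in range(N_CLASSES)]
    y_hat = int(np.argmax(scores))
    margin = scores[y_hat] - w @ joint_feature(x, y)
    return y_hat, max(0.0, margin)

def train(local_examples, C=1.0, tol=1e-3, max_iter=200):
    w = np.zeros(N_CLASSES * N_FEATS)
    prev_obj = np.inf
    for it in range(max_iter):
        # Batch-process the local shard (cf. batch processing of the constraint set);
        # inside a node this loop could additionally be threaded, OpenMP-style.
        grad, loss = np.zeros_like(w), 0.0
        for x, y in local_examples:
            y_hat, margin = most_violated_constraint(w, x, y)
            if margin > 0:
                grad += joint_feature(x, y_hat) - joint_feature(x, y)
                loss += margin
        # Aggregate subgradient and loss across all nodes.
        grad = comm.allreduce(grad, op=MPI.SUM)
        loss = comm.allreduce(loss, op=MPI.SUM)
        obj = 0.5 * w @ w + C * loss
        # Stop when the per-iteration improvement becomes negligible.
        if prev_obj - obj < tol * max(1.0, abs(prev_obj)):
            break
        prev_obj = obj
        w -= (1.0 / (it + 1)) * (w + C * grad)    # simple subgradient step
    return w

if __name__ == "__main__":           # run e.g. with: mpirun -np 4 python sketch.py
    rng = np.random.default_rng(comm.Get_rank())
    shard = [(rng.normal(size=N_FEATS), int(rng.integers(N_CLASSES))) for _ in range(100)]
    w = train(shard)
    if comm.Get_rank() == 0:
        print("trained weight norm:", np.linalg.norm(w))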
Deep learning (DL) has achieved great success in recent years, leading to state-of-the-art performance in research and industrial fields such as computer vision and natural language processing. One of the reasons for this success is the huge number of parameters adopted in DL models. However, it is impractical to train even a moderately large model with many parameters on a typical single device, so DL models must be trained on clusters with distributed training algorithms. Traditional distributed training algorithms, however, are usually sub-optimal and highly customized, which makes them ill-suited to training large-scale DL models on varying computing clusters. To handle this problem, researchers have proposed auto-parallelism, which promises to train large-scale DL models efficiently and practically on various computing clusters. In this survey, we perform a broad and thorough investigation of the challenges, foundations, and strategy-searching methods of auto-parallelism in DL training. First, we abstract the basic parallelism schemes together with their communication cost and memory consumption in DL training. We then analyze and compare a series of current auto-parallelism works and investigate the strategies and searching methods commonly used in practice. Finally, we discuss several trends in auto-parallelism that are promising for further research.
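As a rough illustration of the kind of per-scheme cost abstraction the survey covers, the sketch below estimates per-step communication volume and per-device parameter memory for data, tensor (model), and pipeline parallelism. The ring-allreduce volume term 2*(p-1)/p is a standard approximation; the byte counts in the example are arbitrary, and nothing here is taken from the survey itself.

# Back-of-the-envelope cost abstraction for the three basic parallelism schemes.

def data_parallel_cost(param_bytes, p):
    # Gradients are all-reduced every step; each device keeps a full model replica.
    comm = 2 * (p - 1) / p * param_bytes
    mem = param_bytes                 # optimizer state and activations omitted here
    return comm, mem

def tensor_parallel_cost(param_bytes, act_bytes, p):
    # Parameters are sharded; activations are all-reduced inside each sharded layer.
    comm = 2 * (p - 1) / p * act_bytes
    mem = param_bytes / p
    return comm, mem

def pipeline_parallel_cost(param_bytes, boundary_act_bytes, p):
    # Each stage holds 1/p of the layers and sends boundary activations point-to-point.
    comm = boundary_act_bytes
    mem = param_bytes / p
    return comm, mem

if __name__ == "__main__":
    GB = 1 << 30
    for name, fn, args in [
        ("data",     data_parallel_cost,     (10 * GB, 8)),
        ("tensor",   tensor_parallel_cost,   (10 * GB, 2 * GB, 8)),
        ("pipeline", pipeline_parallel_cost, (10 * GB, 0.5 * GB, 8)),
    ]:
        comm, mem = fn(*args)
        print(f"{name:8s} comm/step = {comm / GB:6.2f} GB, mem/device = {mem / GB:6.2f} GB")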
ISBN: (Print) 9798350383461; 9798350383454
Deep Neural Network (DNN) frameworks need parallelism plans to execute immense models. The computed plans often combine data, model, and pipeline parallelism. Unfortunately, due to the intractability of the problem, current parallelism planners often fail to derive plans for immense DNNs. They either rely on experts to generate plans manually or on profiling for their evaluation, making planners expensive and sub-optimal. We propose RAPID, an automatic parallelism planner for immense DNNs driven by a hierarchical abstract machine model. This model enables the design of a symbolic cost model that achieves robust prediction of parallelism cost through symbolic simplification. RAPID divides the parallelization problem hierarchically and symmetrically into linear-time sub-problems. We prove that the composition of the sub-problem solutions is optimal. Large-scale cluster experiments show that RAPID can reduce the planning time of immense DNNs (e.g., BERT) by up to 67x compared to state-of-the-art planners, while exhibiting high performance that matches expert-optimized plans.
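The abstract's idea of a symbolic cost model can be sketched in a few lines with sympy: machine and model parameters stay symbolic, so plan costs can be simplified and compared without profiling runs. This is only an assumed illustration of the idea, not RAPID's actual cost model or machine abstraction.

# Keep machine/model parameters symbolic: per-link bandwidth B, parameter bytes M,
# activation bytes A, device count p.
import sympy as sp

B, M, A, p = sp.symbols("B M A p", positive=True)

# Per-step communication time of two candidate plans (ring-allreduce style terms).
data_parallel   = 2 * (p - 1) / p * M / B
tensor_parallel = 2 * (p - 1) / p * A / B

# Symbolic simplification reduces the comparison to a single factored term whose sign
# depends only on M - A, independent of the concrete bandwidth or device count.
diff = sp.simplify(data_parallel - tensor_parallel)
print(diff)

# Concrete cluster numbers can be substituted late, once a plan has been chosen
# (about 0.175 s at 100 GB/s for a 10 GB model on 8 devices).
print(data_parallel.subs({B: 100e9, M: 10e9, p: 8}))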