Despite recent breakthroughs in distributed Graph Neural Network (GNN) training, large-scale graphs still generate significant network communication overhead, reducing time and resource efficiency. Although recently proposed partitioning and caching methods attempt to reduce communication overhead, they are not sufficiently effective because they are agnostic to the sampling pattern. This paper proposes Pacer, a Pipelined Partition-Aware Caching and Communication-Efficient Refinement System for communication-efficient distributed GNN training. First, Pacer estimates each partition's access frequency to each vertex by jointly considering the sampling method and the graph topology. It then uses the estimated access frequencies to refine partitions and to cache vertices in its two-level (CPU and GPU) cache, minimizing data-transfer latency. Furthermore, Pacer incorporates a pipeline-based minibatching method to mask the effect of network communication. Experimental results on real-world graphs show that Pacer outperforms state-of-the-art distributed GNN training systems in training time by 40% on average.
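The abstract describes placing vertices into a two-level (GPU and CPU) cache according to estimated access frequency. The following is a minimal sketch of that general idea, not Pacer's actual implementation; the function names, capacities, and toy frequency estimates are hypothetical.

```python
# Hypothetical sketch (not Pacer's code): rank vertices by estimated access
# frequency and place the hottest ones in a small GPU cache, the next tier
# in a larger CPU cache, and leave the rest on their remote partitions.

def plan_two_level_cache(access_freq, gpu_capacity, cpu_capacity):
    """access_freq: dict vertex_id -> estimated accesses per epoch.
    Returns (gpu_cached, cpu_cached) as sets of vertex ids."""
    ranked = sorted(access_freq, key=access_freq.get, reverse=True)
    gpu_cached = set(ranked[:gpu_capacity])
    cpu_cached = set(ranked[gpu_capacity:gpu_capacity + cpu_capacity])
    return gpu_cached, cpu_cached


def lookup(vertex, gpu_cached, cpu_cached):
    """Resolve where a requested vertex's features would be served from."""
    if vertex in gpu_cached:
        return "gpu"      # already on the device, no transfer
    if vertex in cpu_cached:
        return "cpu"      # host-to-device copy only
    return "remote"       # network fetch from the owning partition


if __name__ == "__main__":
    freq = {0: 120, 1: 95, 2: 40, 3: 7, 4: 3}   # toy per-epoch estimates
    gpu, cpu = plan_two_level_cache(freq, gpu_capacity=2, cpu_capacity=2)
    print(lookup(0, gpu, cpu), lookup(2, gpu, cpu), lookup(4, gpu, cpu))
```

In this sketch the estimated frequencies stand in for Pacer's joint sampling-and-topology estimate; the placement policy itself is just a greedy ranking under fixed capacities.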
Graph neural networks (GNNs) are a class of deep learning models trained on graphs and have been successfully applied in various domains. Despite their effectiveness, it remains challenging for GNNs to scale efficiently to large graphs. As a remedy, distributed computing is a promising approach to training large-scale GNNs, since it provides abundant computing resources. However, the dependencies imposed by the graph structure make high-efficiency distributed GNN training difficult, as it suffers from massive communication and workload imbalance. In recent years, many efforts have been devoted to distributed GNN training, and an array of training algorithms and systems have been proposed. Yet, there is a lack of systematic review of the optimization techniques for the distributed execution of GNN training. In this survey, we analyze three major challenges in distributed GNN training: massive feature communication, loss of model accuracy, and workload imbalance. We then introduce a new taxonomy of the optimization techniques that address these challenges, classifying existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol. We carefully discuss the techniques in each category. In the conclusion, we summarize existing distributed GNN systems for multiple graphics processing units (GPUs), GPU clusters, and central processing unit (CPU) clusters, respectively, and present a discussion of future directions for distributed GNN training.
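The survey's first challenge, massive feature communication, arises because a minibatch on one partition needs the features of neighbors owned by other partitions. The short sketch below illustrates that effect under a simple edge-cut partitioning; it is an illustrative assumption, not code from the survey, and the graph, ownership map, and 128-dimensional float32 features are made up.

```python
# Illustrative sketch: estimate the remote-feature traffic one partition
# incurs for a single minibatch when neighbor features live on other ranks.

def remote_feature_volume(batch_vertices, neighbors, owner, local_rank,
                          feature_bytes):
    """neighbors: dict vertex -> list of neighbor ids.
    owner: dict vertex -> rank of the partition storing its features."""
    remote = set()
    for v in batch_vertices:
        for u in neighbors[v]:
            if owner[u] != local_rank:
                remote.add(u)          # feature must be pulled over the network
    return len(remote) * feature_bytes  # bytes transferred for this minibatch


if __name__ == "__main__":
    neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    owner = {0: 0, 1: 0, 2: 1, 3: 1}          # two-partition edge cut
    print(remote_feature_volume([0, 1], neighbors, owner, local_rank=0,
                                feature_bytes=4 * 128))  # 128-dim float32
```

Techniques in the survey's four categories (data partition, batch generation, execution model, communication protocol) can all be read as different ways of shrinking or hiding this remote-feature volume.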