检索结果-内蒙古大学图书馆

Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN

引用

JOURNAL OF SUPERCOMPUTING 2025年第4期81卷 1-48页

作者： Pasini, Massimiliano Lupo Choi, Jong Youl Mehta, Kshitij Zhang, Pei Rogers, David Bae, Jonghyun Ibrahim, Khaled Z. Aji, Ashwin M. Schulz, Karl W. Polo, Jorda Balaprakash, Prasanna Oak Ridge Natl Lab Computat Sci & Engn Div 1 Bethel Valley Rd Oak Ridge TN 37831 USA Oak Ridge Natl Lab Comp Sci & Math Div 1 Bethel Valley Rd Oak Ridge TN 37831 USA Oak Ridge Natl Lab Natl Ctr Computat Sci 1 Bethel Valley Rd Oak Ridge TN 37831 USA Lawrence Berkeley Natl Lab Comp Sci Dept 1 Cyclotron Rd Berkeley CA 94720 USA AMD Res Adv Micro Devices Austin TX USA Oak Ridge Natl Lab Comp & Computat Sci Directorate 1 Bethel Valley Rd Oak Ridge TN 37831 USA

We present our work on developing and training scalable, trustworthy, and energy-efficient predictive graph foundation models (GFMs) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) computations in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define nearest-neighbor convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFMs training to tens of thousands of GPUs on datasets consisting of hundreds of millions of graphs. Our GFMs use multitask learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as energy and atomic forces. Using over 154 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two state-of-the-art US Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge Leadership Computing Facility. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2000 GPUs on Perlmutter and 16,000 GPUs on Frontier.

关键词： Machine learning Atomistic materials modeling distributed data parallelism Graph foundation models Graph neural networks Large-scale data processing for machine learning

来源：评论

学校读者我要写书评

暂无评论

Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules

引用

JOURNAL OF CHEMINFORMATICS 2022年第1期14卷 1-10页

作者： Choi, Jong Youl Zhang, Pei Mehta, Kshitij Blanchard, Andrew Pasini, Massimiliano Lupo Oak Ridge Natl Lab Comp Sci & Math Div 1 Bethel Valley Rd Oak Ridge TN 37831 USA Oak Ridge Natl Lab Computat Sci & Engn Div 1 Bethel Valley Rd Oak Ridge TN 37831 USA

Graph Convolutional Neural Network (GCNN) is a popular class of deep learning (DL) models in material science to predict material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to reduce the computational cost for GCNN training effectively. However, efficient utilization of high performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN showing (i) reduction of data loading time up to 4.2 times compared with a conventional method and (ii) linear scaling performance for training up to 1024 GPUs on both Summit and Perlmutter.

关键词： Graph neural networks distributed data parallelism Surrogate models Atomic modeling Molecular dynamics HOMO-LUMO gap

来源：评论

学校读者我要写书评

暂无评论

Instant Discovery of Moment Companion Vehicles from Big Streaming Traffic data

Instant Discovery of Moment Companion Vehicles from Big Stre...

引用

International Conference on Cloud Computing and Big data

作者： Zhu, Meiling Liu, Chen Wang, Jianwu Wang, Xiongbin Han, Yanbo North China Univ Technol Beijing Key Lab Integrat & Anal Large Scale Strea Beijing Peoples R China North China Univ Technol Cloud Comp Res Ctr Beijing Peoples R China Univ Maryland Baltimore Cty Dept Informat Syst Baltimore MD 21228 USA Tianjin Univ Sch Comp Sci & Technol Tianjin Peoples R China

ISBN: (纸本)9781467383509

With more and more traffic monitoring cameras installed in large cities, big streaming data are being generated continuously and provides a lot of new application opportunities. Among the new applications, companion vehicle discovery is to identify vehicle groups that move together. To quickly identify companion vehicles from a special type of streaming traffic data, called Automatic Number Plate Recognition (ANPR) data, this paper proposes a framework and two algorithms following distributed data-parallel programming model. The main challenge is how to handle the scale of ANPR data and detect companion vehicles as soon as possible. The proposed framework is designed to instantly output companion vehicles when they pass through monitoring cameras. Our framework can be used in many time-sensitive scenarios like surveillance on suspect trackers for specific vehicles. Experiments with real ANPR data in a distributed environment verify that our approach can process streaming ANPR data directly and discover the companion vehicles in nearly real time. We also analyze the performance affecting factors from the experiments.

关键词： moment companion streaming traffic data distributed data parallelism

来源：评论

学校读者我要写书评

暂无评论

Accelerating Real-Time Imaging for Radiotherapy: Leveraging Multi-GPU Training with PyTorch 22

Accelerating Real-Time Imaging for Radiotherapy: Leveraging ...

引用

22nd IEEE International Conference on Machine Learning and Applications, ICMLA 2023

作者： Obe, Ruth Kaufmann, Brandt Baird, Kaelen Kadel, Sam Soltani, Yasmin Cham, Mostafa Gobbert, Matthias K. Barajas, Carlos A. Jiang, Zhuoran Sharma, Vijay R. Ren, Lei Peterson, Stephen W. Poif, Jerimy C. U. of Houston-Clear Lake Dept. of Computer Science and of Software Engineering United States U. of San Francisco Dept. of Mathematics and Statistics United States Skidmore College Dept. of Computer Science and of Mathematics United States Mount Holyoke College Dept. of Computer Science and of Psychology United States U. of Houston Dept. of Biomedical Engineering United States U. of Maryland Dept. of Information Systems Baltimore County United States U. of Maryland Dept. of Mathematics and Statistics Baltimore County United States Medical Physics Graduate Program Duke University United States U. of Maryland School of Medicine Dept. of Radiation Oncology United States U. of Maryland School of Medicine Department of Radiation Oncology United States U. of Cape Town Dept. of Physics South Africa H3D Inc. United States

ISBN: (纸本)9798350345346

Proton beam therapy is an advanced form of cancer radiotherapy that uses high-energy proton beams to deliver precise and targeted radiation to tumors. This helps to mit-igate unnecessary radiation exposure in healthy tissues. Real-time imaging of prompt gamma rays with Compton cameras has been suggested to improve therapy efficacy. However, the camera's non-zero time resolution leads to incorrect interaction classifications and noisy images that are insufficient for accurately assessing proton delivery in patients. To address the challenges posed by the Compton camera's image quality, machine learning techniques are employed to classify and refine the generated data. These machine-learning techniques include recurrent and feedforward neural networks. A PyTorch model was designed to improve the data captured by the Compton camera. This decision was driven by PyTorch's flexibility, powerful capabilities in handling sequential data, and enhanced G PU usage. This accelerates the model's computations on large-scale radiotherapy data. Through hyperparameter tuning, the validation accuracy of our PyTorch model has been improved from an initial 7 % to over 60 %. Moreover, the PyTorch distributed data parallelism strategy was used to train the RNN models on multiple G PU s, which significantly reduced the training time with a minor impact on model accuracy. © 2023 IEEE.

关键词： Clas-sification Compton camera distributed data parallelism Proton beam therapy PyTorch Recurrent neural network

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：