Search Results - Inner Mongolia University Library

Multi-GPU accelerated cellular automaton model for simulating the solidification structure of continuous casting bloom

Journal of Supercomputing, 2023, vol. 79, no. 5, pp. 4870-4894

Authors: Wang, Jingjing; Meng, Hongji; Yang, Jian; Xie, Zhi (Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Peoples R China)

The continuous casting bloom is large and its process is long, which makes simulation computationally expensive. Simulating the solidification structure with a traditional sequential CPU algorithm takes too long to meet the industrial need for process guidance. This study developed a multi-GPU cellular automaton (CA) model to accelerate the calculation. First, a heterogeneous GPU-CA parallel algorithm was developed to improve parallelism by eliminating data dependencies and data races among cells; in the capture process, a random arbitration mechanism determines which neighbor obtains the final capture right. Then, a multi-stream communication scheme was developed to overlap the calculation of the inner region with the data transfer and calculation of the halo region, hiding the overhead of data exchange between GPUs. Finally, the model was validated against the analytical LGK model and applied to simulate the solidification structure of GCr15 at a steel plant. The simulation shows a clear solidification structure with distinct columnar, equiaxed, and columnar-to-equiaxed transition (CET) zones. The proportions of the crystal zones agree well with the low-power images from field experiments, with relative errors of 0.032%, 0.013%, and 0.025%. The multi-GPU application also calculates the temperature distribution during solidification with a maximum relative error of 0.013% compared to the field data. Furthermore, at almost the same calculation precision as a single-core CPU, the speedup of the present model is up to 700x, whereas the speedup of a 20-core CPU is only about 14.2x.

Keywords: Multi-GPU accelerate; CA model; heterogeneous parallel algorithm; Communication scheme; Solidification structure; Continuous casting bloom
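
The random arbitration idea mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' GPU implementation: the grid layout, the `LIQUID` marker, the four-neighbor stencil, and the function name `capture_step` are all assumptions made for illustration, and the sketch captures every liquid cell that touches a solid neighbor rather than modeling dendrite tip growth. It shows one common way to avoid write races: each liquid cell reads the previous state of its neighbors and decides for itself which neighbor captures it, so no two neighbors ever write to the same cell.

```python
import random

LIQUID = 0  # hypothetical marker: cell has not solidified yet


def capture_step(grain, rng):
    """One synchronous capture step on a 2D grid (gather formulation).

    Each liquid cell collects the grain IDs of its solid von Neumann
    neighbors from the previous state and lets a random draw pick the
    neighbor that captures it. Results go to a new grid, so every cell
    is written by exactly one owner and the update is race-free.
    """
    rows, cols = len(grain), len(grain[0])
    new_grain = [row[:] for row in grain]
    for i in range(rows):
        for j in range(cols):
            if grain[i][j] != LIQUID:
                continue  # solid cells keep their grain ID
            candidates = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                inside = 0 <= ni < rows and 0 <= nj < cols
                if inside and grain[ni][nj] != LIQUID:
                    candidates.append(grain[ni][nj])
            if candidates:
                # Random arbitration among competing solid neighbors.
                new_grain[i][j] = rng.choice(candidates)
    return new_grain


if __name__ == "__main__":
    grid = [[1, 0, 2],
            [0, 0, 0],
            [0, 0, 0]]
    for row in capture_step(grid, random.Random(0)):
        print(row)
```

In this toy grid, the cell between grains 1 and 2 is contested and ends up with whichever ID the random draw selects, while the cells directly below each nucleus have a single candidate. On a GPU the two loops would map naturally to one thread per cell, since each thread writes only its own output element.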

Large-Scale Heterogeneous Computing for 3D Deterministic Particle Transport on Tianhe-2A Supercomputer

Frontiers in Energy Research, 2021, vol. 9

Authors: Li, Biao; Liu, Jie; Zhu, Xiaoxiong; Ding, Shengjie (Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China; Natl Univ Def Technol, Lab Software Engn Complex Syst, Changsha, Peoples R China)

Scalable parallel algorithms for particle transport are one of the main applications of high-performance computing. The discrete ordinates method (S_N) is one of the most popular deterministic numerical methods for solving particle transport equations. In this paper, we introduce a new method for large-scale heterogeneous computing of one-energy-group, time-independent, deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on the Tianhe-2A supercomputer. For heterogeneous programming, we use the customized Basic Communication Library (BCL) and Accelerated Computing Library (ACL) to control and communicate between the CPU and the Matrix2000 accelerator. We use OpenMP directives to exploit thread-level parallelism on the Matrix2000. The test results show that applying OpenMP to the particle transport algorithm modified by our method yields a speedup of up to 11.3 times. On the Tianhe-2A supercomputer, the parallel efficiency of 1.01 million cores relative to 170 thousand cores is 52%.

Keywords: heterogeneous parallel algorithm; HPC; OpenMP; particle transport; SN method
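
To put the 52% parallel efficiency quoted in the abstract above into perspective, the short sketch below works through the arithmetic under two standard textbook definitions. The abstract does not say whether the measurement was strong or weak scaling, so both readings are assumptions, and the function names are hypothetical.

```python
def strong_scaling_speedup(efficiency, base_cores, scaled_cores):
    """Speedup over the baseline run implied by a relative parallel
    efficiency, assuming a fixed total problem size (strong scaling)."""
    return efficiency * (scaled_cores / base_cores)


def weak_scaling_time_ratio(efficiency):
    """Runtime of the scaled run divided by the baseline runtime,
    assuming the work per core is held constant (weak scaling)."""
    return 1.0 / efficiency


# Figures quoted in the abstract.
EFFICIENCY = 0.52
BASE_CORES = 170_000
SCALED_CORES = 1_010_000

if __name__ == "__main__":
    ratio = SCALED_CORES / BASE_CORES
    speedup = strong_scaling_speedup(EFFICIENCY, BASE_CORES, SCALED_CORES)
    slowdown = weak_scaling_time_ratio(EFFICIENCY)
    print(f"core ratio:                 {ratio:.2f}x")
    print(f"strong-scaling speedup:     {speedup:.2f}x")
    print(f"weak-scaling runtime ratio: {slowdown:.2f}x")
```

Under the strong-scaling reading, roughly 5.9 times as many cores would give about a 3.1x speedup over the 170-thousand-core run. Under the weak-scaling reading, the runtime would grow by about 1.9x while the total problem size grows with the core count.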
