Search Results - Inner Mongolia University Library

Acceleration of iterative refinement for singular value decomposition

Numerical Algorithms, 2024, vol. 95, no. 2, pp. 979-1009

Authors: Uchino, Yuki; Terao, Takeshi; Ozaki, Katsuhisa

Affiliations: Shibaura Inst Technol, Grad Sch Engn & Sci, 307 Fukasaku, Minuma Ku, Saitama, Saitama 3378570, Japan; RIKEN Ctr Computat Sci, 7-1-26 Minatojima Minami Machi, Chuo Ku, Kobe, Hyogo 6500047, Japan; Kyushu Univ, Res Inst Informat Technol, 744 Motooka, Nishi Ku, Fukuoka 8190395, Japan; Shibaura Inst Technol, Dept Math Sci, 307 Fukasaku, Minuma Ku, Saitama, Saitama 3378570, Japan

We propose fast numerical algorithms to improve the accuracy of singular vectors for a real matrix. Recently, Ogita and Aishima proposed an iterative refinement algorithm for singular value decomposition that requires six highly accurate matrix multiplications per iteration. That algorithm applies to problems with no multiple or clustered singular values. In this study, we show that the same algorithm can be run with five highly accurate matrix multiplications. We also propose four further algorithms built on highly accurate matrix multiplications: two that use four multiplications and two that use five. These algorithms adopt the idea of a mixed-precision iterative refinement method for linear systems. Numerical experiments demonstrate speed-up and quadratic convergence of the proposed algorithms. In these experiments, the fastest algorithm is 1.7 and 1.4 times faster per iteration than the Ogita-Aishima algorithm on a CPU and a GPU, respectively.
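The abstract refers to mixed-precision iterative refinement for linear systems without spelling it out. The following is a minimal sketch of that general idea only, not of the SVD algorithms in the paper: a linear system is solved in low precision (binary32, emulated here with `struct`), and the residual and the update are computed in binary64. The function names `to_f32`, `solve_f32`, and `refine`, the 3x3 test matrix, and the iteration count are all assumptions made for illustration.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_f32(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to binary32 (the low precision)."""
    n = len(b)
    # Augmented matrix [a | b], stored in binary32.
    m = [[to_f32(v) for v in row] + [to_f32(bi)] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = to_f32(m[i][k] / m[k][k])
            for j in range(k, n + 1):
                m[i][j] = to_f32(m[i][j] - to_f32(f * m[k][j]))
    # Back substitution, also in binary32.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n]
        for j in range(i + 1, n):
            s = to_f32(s - to_f32(m[i][j] * x[j]))
        x[i] = to_f32(s / m[i][i])
    return x


def refine(a, b, iterations=5):
    """Mixed-precision refinement: binary32 solves, binary64 residuals."""
    x = solve_f32(a, b)
    for _ in range(iterations):
        # Residual r = b - a x in binary64 (ordinary Python floats).
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        d = solve_f32(a, r)                     # correction in low precision
        x = [xi + di for xi, di in zip(x, d)]   # update in high precision
    return x


# Hypothetical well-conditioned test problem.
A = [[4.0, 1.0, 2.0], [1.0, 3.0, 1.0], [2.0, 1.0, 5.0]]
x_true = [0.1, 0.2, 0.3]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in A]

err_low = max(abs(u - v) for u, v in zip(solve_f32(A, b), x_true))
err_refined = max(abs(u - v) for u, v in zip(refine(A, b), x_true))
print(err_low, err_refined)
```

For simplicity the sketch repeats the elimination in every iteration; a practical solver would factor the matrix once in low precision and reuse the factors for each correction.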

Keywords: Singular value decomposition; Iterative refinement; Mixed-precision computation; Accurate numerical computation


Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators 24


2024 International Conference for High Performance Computing, Networking, Storage and Analysis

Authors: Abdulah, Sameh; Baker, Allison H.; Bosilca, George; Cao, Qinglei; Castruccio, Stefano; Genton, Marc G.; Keyes, David E.; Khalid, Zubair; Ltaief, Hatem; Song, Yan; Stenchikov, Georgiy L.; Sun, Ying

Affiliations: King Abdullah Univ Sci & Technol, Extreme Comp & Stat & Earth Sci, Thuwal, Saudi Arabia; NSF Natl Ctr Atmospher Res, Computat & Informat Sci Lab, Boulder, CO, USA; NVIDIA, Santa Clara, CA, USA; St Louis Univ, Dept Comp Sci, St Louis, MO, USA; Univ Notre Dame, Dept Appl & Computat Math & Stat, Notre Dame, IN, USA; Lahore Univ Management Sci, Dept Elect Engn, Lahore, Pakistan

ISBN (digital): 9798350352917

ISBN (print): 9798350352924; 9798350352917

We present the design and scalable implementation of an exascale climate emulator for addressing the escalating computational and storage requirements of high-resolution Earth System Model simulations. We utilize the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fidelity and granularity of climate emulation, achieving an ultra-high spatial resolution of 0.034 degrees (approximately 3.5 km). Our emulator, trained on 318 billion hourly temperature data points from a 35-year and 31 billion daily data points from an 83-year global simulation ensemble, generates statistically consistent climate emulations. We extend linear solver software to mixed-precision arithmetic GPUs, applying different precisions within a single solver to adapt to different correlation strengths. The PaRSEC runtime system supports efficient parallel matrix operations by optimizing the dynamic balance between computation, communication, and memory requirements. Our BLAS3-rich code is optimized for systems equipped with four different families and generations of GPUs, scaling well to achieve 0.976 EFlop/s on 9,025 nodes (36,100 AMD MI250X multi-chip module (MCM) GPUs) of Frontier (nearly full system), 0.739 EFlop/s on 1,936 nodes (7,744 Grace-Hopper Superchips (GH200)) of Alps, 0.243 EFlop/s on 1,024 nodes (4,096 A100 GPUs) of Leonardo, and 0.375 EFlop/s on 3,072 nodes (18,432 V100 GPUs) of Summit.

Keywords: Dynamic runtime systems; High-performance computing; Mixed-precision computation; Spatio-temporal climate emulation; Spherical harmonic transform; Task-based programming models


Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs 21


19th IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Authors: Li, Guangli; Xue, Jingling; Liu, Lei; Wang, Xueying; Ma, Xiu; Dong, Xiao; Li, Jiansong; Feng, Xiaobing

Affiliations: Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Guangzhou, Peoples R China; Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing, Peoples R China; Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia; Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China

ISBN (print): 9781728186139

Tensor-specialized hardware for supporting low-precision arithmetic has become an inevitable trend due to the ever-increasing demand for computational capability and energy efficiency in intelligent applications. The main challenge faced when accelerating a tensor program on tensor-specialized hardware is how to achieve the best possible performance in reduced precision by fully utilizing the hardware's computational resources while keeping the precision loss under control. In this paper, we address this challenge by proposing QUANTENSOR, a new approach for accelerating general-purpose tensor programs by replacing their tensor computations with low-precision quantized tensor computations on NVIDIA Tensor Cores. The key novelty is a new residual-based precision refinement technique for controlling the quantization errors, allowing tradeoffs between performance and precision to be made. Evaluation with GEMM, deep neural networks, and linear algebra applications shows that QUANTENSOR can achieve remarkable performance improvements while significantly reducing the precision loss incurred, at acceptable overheads.
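The abstract names a residual-based precision refinement technique but does not describe its details. As a rough illustration of the general idea only, and not of QUANTENSOR's actual scheme, the sketch below quantizes two vectors to 8-bit integers, then quantizes the leftover quantization residuals a second time and adds the resulting integer correction terms. A plain dot product stands in for GEMM; the helper names `quantize`, `int_dot`, and `quantized_dot`, and the sample data, are hypothetical.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization: values[i] is approximated by
    scale * q[i], where each q[i] is a signed integer of the given width."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def int_dot(p, q):
    """Integer dot product (the part low-precision hardware would run)."""
    return sum(a * b for a, b in zip(p, q))


def quantized_dot(x, y, refine=False):
    """Dot product from quantized operands, optionally refined with
    a second quantization of the residuals x - sx*qx and y - sy*qy."""
    qx, sx = quantize(x)
    qy, sy = quantize(y)
    result = sx * sy * int_dot(qx, qy)
    if refine:
        rx, srx = quantize([v - sx * q for v, q in zip(x, qx)])
        ry, sry = quantize([v - sy * q for v, q in zip(y, qy)])
        # (sx*qx + srx*rx) . (sy*qy + sry*ry), minus the term already added.
        result += sx * sry * int_dot(qx, ry)
        result += srx * sy * int_dot(rx, qy)
        result += srx * sry * int_dot(rx, ry)
    return result


# Hypothetical sample data.
x = [0.9, -0.35, 0.12, 0.5]
y = [0.2, 0.8, -0.6, 0.33]
exact = sum(a * b for a, b in zip(x, y))

err_coarse = abs(quantized_dot(x, y) - exact)
err_refined = abs(quantized_dot(x, y, refine=True) - exact)
print(err_coarse, err_refined)
```

The refined variant costs three extra integer dot products, which is the kind of performance-versus-precision tradeoff the abstract mentions; how the paper itself balances that tradeoff is not stated here.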

Keywords: Mixed-precision computation; Tensor Cores; Linear Quantization; Precision Refinement

