Search Results - Inner Mongolia University Library

3rd International Conference on High-Performance Computing and Applications (HPCA)

Authors: Liu, Zhixiang; Fang, Yong; Song, Anping; Xu, Lei; Wang, Xiaowei; Zhou, Liping; Zhang, Wu
Affiliations: Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China; Shanghai Univ, Ctr High Performance Comp, Shanghai 200444, Peoples R China; Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China

ISBN: (print) 9783319325576; 9783319325569

The lattice Boltzmann method (LBM), unlike the classical numerical methods of continuum mechanics, is derived from molecular dynamics. Its main advantages are a simple algorithm, a direct solver for pressure, easy treatment of complicated boundary conditions, and, in particular, suitability for parallelization. The most common models include the Single-Relaxation-Time (SRT) and Multiple-Relaxation-Time (MRT) collision models. In a conventional parallel computing model of LBM, communication and computation are performed separately: while communication is in progress, the MPI processes cannot compute, so that time is wasted. Therefore, a parallel model that overlaps communication and computation was proposed. Based on the architecture of the "Ziqiang 4000" supercomputer at Shanghai University, a hybrid MPI and OpenMP parallel model is proposed. The numerical results show that the presented model has better computational efficiency.

Keywords: Lattice Boltzmann Method; Single-Relaxation-Time; overlapping communication and computation; Hybrid model; Parallel model

Neighborhood communication paradigm to increase scalability in large-scale dynamic scientific applications

PARALLEL COMPUTING, 2012, Vol. 38, No. 3, pp. 140-156

Authors: Ovcharenko, Aleksandr; Ibanez, Daniel; Delalondre, Fabien; Sahni, Onkar; Jansen, Kenneth E.; Carothers, Christopher D.; Shephard, Mark S.
Affiliations: Rensselaer Polytech Inst, Sci Computat Res Ctr SCOREC, Troy, NY 12180, USA; Univ Colorado Boulder, Dept Aerosp Engn Sci, Boulder, CO 80309, USA; Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180, USA

This paper introduces a general-purpose communication package built on top of MPI which is aimed at improving inter-processor communications independently of the supercomputer architecture being considered. The package is developed to support parallel applications that rely on computation characterized by a large number of messages of various sizes, often small, that are focused within processor neighborhoods. In some cases, such as solvers having static mesh partitions, the number and size of messages are known a priori. However, in other cases, such as mesh adaptation, the messages evolve and vary in number and size and include the dynamic movement of partition objects. The current package provides a utility for dynamic applications based on two key attributes: (i) explicit consideration of the neighborhood communication pattern to avoid many-to-many calls and to reduce the number of collective calls to a minimum, and (ii) use of non-blocking MPI functions along with message packing to manage message flow control and reduce the number and time of communication calls. The test application demonstrated is parallel unstructured mesh adaptation. Results on IBM Blue Gene/P and Cray XE6 computers show that the use of neighborhood-based communication control leads to scalable results when executing generally imbalanced mesh adaptation runs. (C) 2011 Elsevier B.V. All rights reserved.
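The message-packing idea in attribute (ii) can be illustrated with a minimal sketch. It is not the package's API: the function names `pack_by_neighbor` and `unpack`, and the length-prefixed buffer layout, are assumptions made for illustration only. Many small messages bound for the same neighbor rank are concatenated into one buffer, so that a single send per neighbor could replace many small ones.

```python
import struct
from collections import defaultdict

def pack_by_neighbor(messages):
    """Group (dest_rank, payload) pairs into one packed buffer per neighbor.

    Each payload is stored as a 4-byte little-endian length followed by the
    payload bytes (a hypothetical layout chosen for this sketch).
    """
    buffers = defaultdict(bytearray)
    for dest, payload in messages:
        buffers[dest] += struct.pack("<I", len(payload)) + payload
    return {dest: bytes(buf) for dest, buf in buffers.items()}

def unpack(buffer):
    """Split a packed buffer back into the original payloads, in order."""
    payloads = []
    offset = 0
    while offset < len(buffer):
        (length,) = struct.unpack_from("<I", buffer, offset)
        offset += 4
        payloads.append(buffer[offset:offset + length])
        offset += length
    return payloads
```

In a real MPI code, each packed buffer would be handed to one non-blocking send per neighbor; that step is omitted here so the sketch runs without an MPI installation.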

Keywords: Asynchronous communication; MPI; Dynamic data migration; Parallel algorithms; overlapping communication and computation

3D DFT by block tensor-matrix multiplication via a modified Cannon's algorithm: Implementation and scaling on distributed-memory clusters with fat tree networks

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, Vol. 193

Authors: Malapally, Nitin; Bolnykh, Viacheslav; Suarez, Estela; Carloni, Paolo; Lippert, Thomas; Mandelli, Davide
Affiliations: Forschungszentrum Julich, Computat Biomed IAS 5 INM 9, Wilhelm Johnen Str, D-52428 Julich, Germany; Forschungszentrum Julich, Julich Supercomp Ctr JSC, Wilhelm Johnen Str, D-52428 Julich, Germany; Univ Bonn, Comp Sci Dept, Bonn, Germany; Forschungszentrum Julich, Mol Neurosci & Neuroimaging INM-11, Wilhelm Johnen Str, D-52428 Julich, Germany; Goethe Univ Frankfurt, Inst Adv Studies, Frankfurt, Germany

A known scalability bottleneck of the parallel 3D FFT is its use of all-to-all communications. Here, we present S3DFT, a library that circumvents this by using point-to-point communication, albeit at a higher arithmetic complexity. This approach exploits three variants of Cannon's algorithm with adaptations for block tensor-matrix multiplications. We demonstrate S3DFT's efficient use of hardware resources, and its scaling using up to 16,464 cores of the JUWELS Cluster. However, in a comparison with well-established 3D FFT libraries, its parallel efficiency and performance were found to fall behind. A detailed analysis identifies the cause in two of its component algorithms, which scale poorly owing to how their communication patterns are mapped in subsets of the fat tree topology. This result exposes a potential drawback of running block-wise parallel algorithms on systems with fat tree networks caused by increased communication latencies along specific directions of the mesh of processing elements.
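For orientation, the following is a minimal sketch of the classic Cannon's algorithm for ordinary square matrix multiplication, simulated serially on a q x q grid of "processes". It is not S3DFT's modified variant for block tensor-matrix products; the function name `cannon_matmul` and the list-of-lists data layout are assumptions made for illustration. The point it shows is that each step needs only nearest-neighbor (point-to-point) shifts, not all-to-all exchange.

```python
def cannon_matmul(A, B, q):
    """Multiply n x n matrices A and B with Cannon's algorithm on a
    simulated q x q process grid (n must be divisible by q)."""
    n = len(A)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    bs = n // q  # edge length of each square block

    def block(M, i, j):
        # Extract block (i, j) of matrix M as a list of rows.
        return [row[j * bs:(j + 1) * bs] for row in M[i * bs:(i + 1) * bs]]

    def multiply_add(C, X, Y):
        # C += X @ Y for bs x bs blocks.
        for r in range(bs):
            for c in range(bs):
                C[r][c] += sum(X[r][k] * Y[k][c] for k in range(bs))

    # Initial skew: process (i, j) starts with A block (i, i+j) and
    # B block (i+j, j), indices taken modulo q.
    a = [[block(A, i, (i + j) % q) for j in range(q)] for i in range(q)]
    b = [[block(B, (i + j) % q, j) for j in range(q)] for i in range(q)]
    c = [[[[0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Local computation on every process.
        for i in range(q):
            for j in range(q):
                multiply_add(c[i][j], a[i][j], b[i][j])
        # Shift A blocks one step left along each grid row.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        # Shift B blocks one step up along each grid column.
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Gather the result blocks into one n x n matrix.
    C = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            for r in range(bs):
                C[i * bs + r][j * bs: (j + 1) * bs] = c[i][j][r]
    return C
```

In a distributed implementation the two shift lines would become sends and receives between grid neighbors, which is where the mapping of the process grid onto the network topology starts to matter.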

Keywords: 3D Discrete Fourier Transform (3D DFT); Block tensor matrix multiplication; Volumetric decomposition; Cannon's algorithm; Shared-memory parallelism; Distributed-memory parallelism; overlapping communication and computation; MPI; Performance analysis; Roofline model

Efficient overlapped FFT algorithms for hypercube-connected multicomputers

Parallel Algorithms and Applications, 1994, Vol. 4, No. 1-2, pp. 91-110

Authors: Aykanat, Cevdet; Dervis, Argun
Affiliation: Department of Computer Engineering, Bilkent University, 06533 Bilkent, Ankara, Turkey

In this work, we propose parallel FFT algorithms for medium-to-coarse grain hypercube-connected multicomputers which are more elegant and efficient than the existing ones. The proposed algorithms achieve perfect load balance for the efficient simplified-butterfly scheme and minimize the communication overhead by decreasing both the number and the volume of concurrent communications. Communication and computation cannot be overlapped easily due to the strong data dependencies in the FFT algorithm. In this paper, we propose a restructuring of the FFT algorithm which enables overlapping each communication with one fifth of the local computations involved in a stage. Two of the proposed parallel FFT algorithms achieve overlapping by exploiting this restructuring while using the efficient table-lookup scheme for complex coefficients. The proposed algorithms are implemented on Intel's 32-node iPSC/2 hypercube multicomputer. High efficiency values are obtained even for small FFT problems. © 1994, Taylor & Francis Group, LLC. All rights reserved.

Keywords: FFT; Hypercube; Multicomputer; overlapping communication and computation; Parallel computing; Perfect load balance
