ISBN: (print) 9783319325576; 9783319325569
The lattice Boltzmann method (LBM), unlike classical numerical methods of continuum mechanics, is derived from molecular dynamics. Its main advantages are a simple algorithm, a direct solver for pressure, easy treatment of complicated boundary conditions, and, in particular, suitability for parallel computing. The most common collision models are the Single-Relaxation-Time (SRT) and Multiple-Relaxation-Time (MRT) models. In a conventional parallel computing model of the LBM, communication and computation are performed separately: while communication is in progress, computation in the MPI processes sits idle, and this waiting time is wasted. A parallel model that overlaps communication and computation was therefore proposed. Based on the architecture of the "Ziqiang 4000" supercomputer at Shanghai University, a hybrid MPI and OpenMP parallel model is proposed. The numerical results show that the presented model has better computational efficiency.
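The overlap idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a background thread stands in for a non-blocking MPI exchange, a three-point average stands in for an LBM update, and all function names (`exchange_halos`, `update_interior`, `update_edges`, `step_overlapped`) are hypothetical. The point is only the ordering: start the exchange, update interior cells that need no neighbour data, then wait and finish the edge cells.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(left, centre, right):
    # Toy three-point update; an assumption standing in for a real LBM step.
    return (left + centre + right) / 3.0


def exchange_halos(sub_a, sub_b):
    # Stand-in for a non-blocking MPI exchange on a periodic domain split
    # into two subdomains: each side receives (left ghost, right ghost).
    return {"a": (sub_b[-1], sub_b[0]), "b": (sub_a[-1], sub_a[0])}


def update_interior(sub):
    # Interior cells depend only on local data, so they need no halo.
    new = list(sub)
    for i in range(1, len(sub) - 1):
        new[i] = smooth(sub[i - 1], sub[i], sub[i + 1])
    return new


def update_edges(sub, new, halo):
    # Edge cells need the ghost values delivered by the exchange.
    left_ghost, right_ghost = halo
    new[0] = smooth(left_ghost, sub[0], sub[1])
    new[-1] = smooth(sub[-2], sub[-1], right_ghost)
    return new


def step_overlapped(sub_a, sub_b):
    # Each subdomain must hold at least two cells.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_halos, sub_a, sub_b)  # start communication
        new_a = update_interior(sub_a)  # compute while the exchange is in flight
        new_b = update_interior(sub_b)
        halos = pending.result()  # wait only when halo data is actually needed
    return (update_edges(sub_a, new_a, halos["a"]),
            update_edges(sub_b, new_b, halos["b"]))


def step_reference(cells):
    # Non-overlapped update of the whole periodic domain, for comparison.
    n = len(cells)
    return [smooth(cells[i - 1], cells[i], cells[(i + 1) % n]) for i in range(n)]
```

Splitting a periodic array into two halves and calling `step_overlapped` should reproduce `step_reference` on the full array, since only the order of work changes, not the arithmetic.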
This paper introduces a general-purpose communication package built on top of MPI which is aimed at improving inter-processor communications independently of the supercomputer architecture being considered. The package is developed to support parallel applications that rely on computation characterized by a large number of messages of various sizes, often small, that are focused within processor neighborhoods. In some cases, such as solvers having static mesh partitions, the number and size of messages are known a priori. However, in other cases, such as mesh adaptation, the messages evolve and vary in number and size and include the dynamic movement of partition objects. The current package provides a utility for dynamic applications based on two key attributes: (i) explicit consideration of the neighborhood communication pattern to avoid many-to-many calls and to reduce the number of collective calls to a minimum, and (ii) use of non-blocking MPI functions along with message packing to manage message flow control and reduce the number and time of communication calls. The test application demonstrated is parallel unstructured mesh adaptation. Results on IBM Blue Gene/P and Cray XE6 computers show that the use of neighborhood-based communication control leads to scalable results when executing generally imbalanced mesh adaptation runs. (C) 2011 Elsevier B.V. All rights reserved.
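The message-packing idea mentioned in attribute (ii) can be sketched as follows. This is an illustrative assumption, not the package's API: many small payloads are grouped by destination rank into one length-prefixed buffer per neighbour, so that each neighbour would receive a single message instead of many. The names `pack_by_neighbour` and `unpack` and the 4-byte length prefix are hypothetical choices.

```python
import struct
from collections import defaultdict


def pack_by_neighbour(messages):
    """Group (destination_rank, payload_bytes) pairs into one buffer per rank.

    Each payload is stored as a 4-byte little-endian length followed by the
    payload itself, so payloads of different sizes can share one buffer.
    """
    chunks = defaultdict(list)
    for dest, payload in messages:
        chunks[dest].append(struct.pack("<I", len(payload)) + payload)
    return {dest: b"".join(parts) for dest, parts in chunks.items()}


def unpack(buffer):
    """Recover the individual payloads from one packed buffer, in order."""
    payloads = []
    offset = 0
    while offset < len(buffer):
        (length,) = struct.unpack_from("<I", buffer, offset)
        offset += 4
        payloads.append(buffer[offset:offset + length])
        offset += length
    return payloads
```

In a real MPI code each packed buffer would be handed to one non-blocking send per neighbour; that step is omitted here so the sketch runs without an MPI installation.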
A known scalability bottleneck of the parallel 3D FFT is its use of all-to-all communications. Here, we present S3DFT, a library that circumvents this by using point-to-point communication, albeit at a higher arithmetic complexity. This approach exploits three variants of Cannon's algorithm with adaptations for block tensor-matrix multiplications. We demonstrate S3DFT's efficient use of hardware resources, and its scaling using up to 16,464 cores of the JUWELS Cluster. However, in a comparison with well-established 3D FFT libraries, its parallel efficiency and performance were found to fall behind. A detailed analysis identifies the cause in two of its component algorithms, which scale poorly owing to how their communication patterns are mapped in subsets of the fat tree topology. This result exposes a potential drawback of running block-wise parallel algorithms on systems with fat tree networks caused by increased communication latencies along specific directions of the mesh of processing elements.
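For readers unfamiliar with Cannon's algorithm, here is a minimal serial simulation of the textbook version for square matrix multiplication. It is not one of S3DFT's three variants and does not handle tensor-matrix products: each entry of a q-by-q matrix plays the role of the block held by one process, and list rotations stand in for the point-to-point shifts between neighbouring processes. The function name `cannon_matmul` is hypothetical.

```python
def cannon_matmul(a, b):
    """Multiply two q-by-q matrices using the data movement of Cannon's algorithm."""
    q = len(a)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a_loc = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every "process" (i, j) multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Point-to-point shifts: A blocks move one step left, B blocks one step up.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c
```

Each process only ever talks to its left and upper neighbours, which is why this pattern avoids all-to-all communication at the cost of q multiply-and-shift rounds.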
In this work, we propose parallel FFT algorithms, for medium-to-coarse grain hypercube-connected multicomputers, which are more elegant and efficient than the existing ones. The proposed algorithms achieve perfect loa...