The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly scalable local low-order solvers at the expense of harder-to-scale global very-high-order spectral methods. Indeed, FFT-based methods, which were very popular on shared-memory computers, have been largely replaced by finite-difference (FD) methods for the solution of many problems, including plasma simulations with electromagnetic Particle-In-Cell (PIC) methods. For some problems, such as the modeling of so-called "plasma mirrors" for the generation of high-energy particles and ultrashort radiation, we have shown that the inaccuracies of standard FD-based PIC methods prevent modeling at sufficient accuracy on present supercomputers. We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus, for the first time, the accurate modeling of plasma mirrors in 3D. (C) 2018 Elsevier B.V. All rights reserved.
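To make the low-order versus high-order trade-off concrete, the following minimal sketch compares how accurately centered finite-difference stencils of order 2, 4 and 6 differentiate a single Fourier mode. It is an illustration only and is not taken from the paper: the use of non-staggered centered stencils, the function names `modified_wavenumber` and `relative_error`, and the choice of 8 grid points per wavelength are all assumptions. A spectral (FFT-based) derivative would recover the exact wavenumber for every resolved mode, which is the limit these stencils approach as the order grows.

```python
import math

# Standard centered finite-difference coefficients for the first derivative.
# Entry j (0-based) multiplies (f[i + j + 1] - f[i - j - 1]) / h.
STENCILS = {
    2: [1 / 2],
    4: [2 / 3, -1 / 12],
    6: [3 / 4, -3 / 20, 1 / 60],
}


def modified_wavenumber(order, k, h):
    """Wavenumber effectively applied by the stencil to the mode exp(i*k*x)."""
    return (2.0 / h) * sum(
        c * math.sin((j + 1) * k * h) for j, c in enumerate(STENCILS[order])
    )


def relative_error(order, points_per_wavelength):
    """Relative error of the stencil's wavenumber against the exact value k."""
    h = 1.0  # grid spacing in arbitrary units (assumption)
    k = 2.0 * math.pi / (points_per_wavelength * h)
    return abs(modified_wavenumber(order, k, h) - k) / k


if __name__ == "__main__":
    for order in sorted(STENCILS):
        print(f"order {order}: relative error = {relative_error(order, 8):.2e}")
```

At a fixed resolution the error shrinks as the stencil order increases, but each extra order widens the stencil, which is one reason very high orders are harder to scale across distributed memory.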
ISBN: (print) 9783031637537; 9783031637513
Computational electromagnetics plays a crucial role across diverse domains, notably antenna design and radar signature prediction, owing to the omnipresence of electromagnetic phenomena. Numerical methods have replaced traditional experimental approaches, expediting design iterations and scenario characterization. The emergence of GPU accelerators offers an efficient way to implement numerical methods and can significantly enhance the computational capabilities of partial differential equation (PDE) solvers with specific boundary-value conditions. This paper explores parallelization strategies for implementing a finite-difference time-domain (FDTD) solver on GPUs, leveraging shared memory and optimizing memory access patterns to achieve performance gains. One notable innovation presented in this research is the use of strategies such as exploiting temporal locality and avoiding misaligned global memory accesses to enhance data processing efficiency. Additionally, we break the computation down into multiple kernels, each computing different electromagnetic (EM) field components, to enhance shared memory utilization and GPU cache efficiency. We implement crucial design optimizations to fully exploit the GPU's parallel processing capabilities. These include maintaining consistent block sizes, analyzing optimal configurations for field-updating kernels, and optimizing memory access patterns for CUDA threads within warps. Our experimental analysis verifies the effectiveness of these strategies, which both reduce execution time and increase the GPU's effective memory bandwidth. Throughput evaluation demonstrates performance gains, with our CUDA implementation achieving up to 17 times higher throughput than CPU-based methods. Speedup gains and throughput comparisons illustrate the scalability and efficiency of our approach, showcasing its potential for developing large-scale electromagnetic simulations on
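The idea of splitting the FDTD update into separate routines per field component can be sketched on the CPU as follows. This is a minimal 1D illustration in normalized units (c = 1, unit grid spacing), not the paper's CUDA implementation: the function names `update_h`, `update_e` and `run`, the Gaussian initial pulse, the fixed (perfectly conducting) end points, and the Courant number `s = 0.5` are all assumptions chosen for the example. On a GPU, each of the two update functions would correspond to its own kernel, with one thread per grid cell.

```python
import math


def update_h(e, h, s):
    """Advance the magnetic field (stored at half-integer grid points)."""
    for i in range(len(h)):
        h[i] += s * (e[i + 1] - e[i])


def update_e(e, h, s):
    """Advance the electric field at interior nodes; the end points stay fixed."""
    for i in range(1, len(e) - 1):
        e[i] += s * (h[i] - h[i - 1])


def run(n=200, steps=50, s=0.5, sigma=10.0):
    """Leapfrog a Gaussian pulse; s = c*dt/dx must be <= 1 for stability in 1D."""
    center = n // 2
    e = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]
    h = [0.0] * (n - 1)
    for _ in range(steps):
        update_h(e, h, s)  # one "kernel" per field component
        update_e(e, h, s)
    return e, h


if __name__ == "__main__":
    e, h = run()
    print("peak |E| after 50 steps:", max(abs(v) for v in e))
```

Because the initial pulse has no magnetic field, it splits into two counter-propagating pulses of roughly half the original amplitude. Each update reads only neighboring values of the other field, which is what makes the per-component kernels independent within a time step.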