A nonlinear partial differential equation (PDE) based compartmental model of COVID-19 provides a continuous trace of infection over space and time. Finer resolution in the spatial discretization, the inclusion of additional model compartments, and model stratification based on clinically relevant categories increase the number of unknowns to the order of millions. We adopt a parallel scalable solver that permits faster solutions for these high-fidelity models. The solver combines domain decomposition and algebraic multigrid preconditioners at multiple levels to achieve the desired strong and weak scalability. As a numerical illustration of this general methodology, a five-compartment susceptible-exposed-infected-recovered-deceased (SEIRD) model of COVID-19 is used to demonstrate the scalability and effectiveness of the proposed solver for a large geographical domain (Southern Ontario). It is possible to predict infections over a period of three months for a system of 186 million unknowns (using 3200 processes) within 12 hours, saving the months of computational effort that conventional solvers would need.
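For orientation, the five-compartment structure named in the abstract can be sketched in its simplest well-mixed form. This is only an illustrative assumption: it drops the spatial (PDE) terms entirely, uses forward Euler time stepping, and the rate parameters `beta`, `sigma`, `gamma`, and `mu` and the initial population are hypothetical values, not taken from the paper.

```python
def seird_step(state, beta, sigma, gamma, mu, dt):
    """Advance a well-mixed SEIRD model by one forward Euler step."""
    s, e, i, r, d = state
    living = s + e + i + r                 # deceased do not mix with others
    new_exposed = beta * s * i / living    # nonlinear infection term
    ds = -new_exposed
    de = new_exposed - sigma * e           # exposed become infected at rate sigma
    di = sigma * e - (gamma + mu) * i      # infected recover or die
    dr = gamma * i
    dd = mu * i
    return (s + dt * ds, e + dt * de, i + dt * di, r + dt * dr, d + dt * dd)


def simulate(state, days, dt=0.1, beta=0.3, sigma=0.2, gamma=0.1, mu=0.01):
    """Integrate for `days` days; all rates are hypothetical, in 1/day."""
    for _ in range(int(round(days / dt))):
        state = seird_step(state, beta, sigma, gamma, mu, dt)
    return state


# Hypothetical population of 10,000 with 10 initial infections, 90 days.
final = simulate((9990.0, 0.0, 10.0, 0.0, 0.0), days=90)
```

Because the five rates of change sum to zero, the total across compartments stays constant, which is a convenient sanity check on any implementation. The PDE model in the abstract has one such set of unknowns at every spatial mesh node, which is how the system size reaches the millions.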
Fast multipole methods (FMMs) based on the oscillatory Helmholtz kernel can reduce the cost of solving N-body problems arising from boundary integral equations (BIEs) in acoustics or electromagnetics. However, their cost increases sharply in the high-frequency regime. This paper introduces a new directional FMM for oscillatory kernels (defmm: directional equispaced interpolation-based fmm), whose precomputation and application are FFT-accelerated thanks to polynomial interpolations on equispaced grids. We demonstrate the consistency of our FFT approach and show how symmetries can be exploited in the Fourier domain. We also describe the algorithmic design of defmm, which is well suited to the nonuniform particle distributions of BIEs, and present performance optimizations on one CPU core. Finally, we show significant performance gains on all test cases for defmm over a state-of-the-art FMM library for oscillatory kernels.
ISBN: (print) 9780987214331
In this paper we investigate the high performance computing efficiency of the shallow water software package ANUGA. This package is developed as a collaborative project between the Australian National University (ANU) and Geoscience Australia (GA) and is available as Free and Open Source Software (FOSS). ANUGA uses a shallow water model and approximates the model using the finite volume method on unstructured meshes of triangles. The geometrical flexibility of unstructured meshes is convenient for tsunami inundation modeling, where the tsunami wave source generally consists of long-wavelength components and waves around the coast consist of short wavelengths; both can be modeled in the same simulation. ANUGA is written in the high-level programming language Python. We present an overview of the model and the numerical method in the early sections of the paper. We then present our work on parallelizing the ANUGA code, in particular our efforts to obtain efficient simulations using hundreds of CPU cores. Our results demonstrate that our Python-based software can obtain high efficiency on highly parallel computers, and that medium-resolution tsunami models (millions of triangles) can be simulated in better than real time. Our ultimate goal is to run high-resolution simulations (tens of millions of triangles) in better than real time.
ISBN: (print) 9783642293467; 9783642293474
Training feed-forward neural networks can take a long time when there is a large amount of data to be used, even when training with more efficient algorithms like Levenberg-Marquardt. Parallel architectures have been a common solution in the area of high performance computing, since the technology used in current processors is reaching the limits of speed. An architecture that has been gaining popularity is GPGPU (General-Purpose computing on Graphics Processing Units), which has received large investments from companies such as NVIDIA, which introduced the CUDA (Compute Unified Device Architecture) technology. This paper proposes a faster implementation of neural network training with the Levenberg-Marquardt algorithm using CUDA. The results obtained demonstrate that the whole training time can be almost 30 times shorter than that of code using the Intel Math Kernel Library (MKL). A case study on classifying electrical company customers is presented.
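As a reminder of what a Levenberg-Marquardt iteration involves, the sketch below applies the damped Gauss-Newton update to a toy two-parameter curve fit, y = a·exp(b·x). It is an assumption-laden illustration, not the paper's method: it runs on the CPU in plain Python rather than CUDA, fits a curve rather than a neural network, and the function name `lm_fit`, the damping schedule (multiply or divide by 10), and the data are all hypothetical.

```python
import math


def lm_fit(xs, ys, a, b, lam=1e-2, iters=100):
    """Fit y = a * exp(b * x) by Levenberg-Marquardt (2 parameters)."""

    def sse(a, b):
        return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

    err = sse(a, b)
    for _ in range(iters):
        # Accumulate J^T J (h11, h12, h22) and J^T r (g1, g2), where J holds
        # the partial derivatives of the model with respect to a and b.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            ex = math.exp(b * x)
            r = y - a * ex
            ja, jb = ex, a * x * ex
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve the damped system (J^T J + lam * I) delta = J^T r in closed form.
        a11, a22 = h11 + lam, h22 + lam
        det = a11 * a22 - h12 * h12
        da = (a22 * g1 - h12 * g2) / det
        db = (a11 * g2 - h12 * g1) / det
        new_err = sse(a + da, b + db)
        if new_err < err:
            a, b, err = a + da, b + db, new_err
            lam /= 10.0   # step accepted: move toward Gauss-Newton
        else:
            lam *= 10.0   # step rejected: move toward gradient descent
    return a, b


# Hypothetical noise-free data generated from a = 2, b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_fit, b_fit = lm_fit(xs, ys, a=1.0, b=0.1)
```

For a neural network the same update is applied to all weights at once, so the cost is dominated by building the Jacobian products and solving the damped linear system. That dense linear algebra is the kind of work that maps well onto a GPU.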