ISBN (Print): 9781479984480
Inverse problems arise in various areas of science and engineering. These problems are not only difficult to solve numerically, but they also require a large amount of computer resources, both in time and in memory. It is therefore not surprising that inverse problems are often solved using techniques from high-performance computing. We consider the parallelization of an inverse problem in the field of geothermal reservoir engineering. In this particular scientific application, the underlying software package is already parallelized using the shared-memory programming paradigm OpenMP. Here, we present an extension of this parallelization to distributed memory, enabling a hybrid OpenMP/MPI parallelization. The situation differs from the standard approach to hybrid parallel programming because the data structures of the OpenMP-parallelized code differ from those in the serial implementation. We exploit this transformation of the data structures in our distributed-memory strategy for parallelizing an ensemble Kalman filter, a particular method for the solution of inverse problems. We describe this novel parallelization strategy, introduce a performance model, and present timing results on a compute cluster using two-socket nodes, each socket equipped with a 6-core Intel Xeon X5675 (Westmere EP) processor. All timing results are obtained with a pure MPI parallelization without using any OpenMP threads.
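As a rough illustration of how ensemble members can be distributed across MPI ranks in an ensemble Kalman filter update (a minimal sketch, not the parallelization strategy of the paper above; the toy forward model, problem sizes, and mpi4py usage are assumptions):

# Hypothetical sketch: distribute EnKF ensemble members over MPI ranks.
# Assumes mpi4py and numpy; the forward model is a stand-in, not the reservoir simulator.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_state, n_obs, n_ens = 100, 10, 64          # toy problem sizes
per_rank = n_ens // size                     # assumes n_ens is divisible by size
lo, hi = rank * per_rank, (rank + 1) * per_rank

def forward_model(x):
    # stand-in for the expensive geothermal reservoir simulation
    return x[:n_obs] + 0.01 * np.sin(x[:n_obs])

rng = np.random.default_rng(rank)
X_local = rng.standard_normal((n_state, per_rank))                       # locally owned members
Y_local = np.column_stack([forward_model(X_local[:, j]) for j in range(per_rank)])

# Gather the full ensembles on every rank to form the ensemble covariances.
X = np.hstack(comm.allgather(X_local))        # (n_state, n_ens)
Y = np.hstack(comm.allgather(Y_local))        # (n_obs,  n_ens)

A = X - X.mean(axis=1, keepdims=True)
B = Y - Y.mean(axis=1, keepdims=True)
R = 0.01 * np.eye(n_obs)                                   # observation error covariance
K = (A @ B.T) @ np.linalg.inv(B @ B.T + (n_ens - 1) * R)   # Kalman gain

d = np.random.default_rng(0).standard_normal(n_obs)        # synthetic observation, identical on all ranks
# Each rank updates only the members it owns (perturbed-observation EnKF).
for j in range(lo, hi):
    X[:, j] += K @ (d + 0.1 * rng.standard_normal(n_obs) - Y[:, j])
print("rank", rank, "updated members", lo, "to", hi - 1)

Run with, e.g., mpirun -np 4 python enkf_sketch.py; the point is only that the expensive forward-model evaluations stay local to each rank while the analysis step uses collectively gathered ensemble statistics.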
Numerical investigation of compressible flows faces two main challenges. In order to accurately describe the flow characteristics, high-resolution nonlinear numerical schemes are needed to capture discontinuities and resolve wide convective, acoustic, and interfacial scale ranges. The simulation of realistic three-dimensional (3D) problems with a state-of-the-art finite-volume method (FVM) based on approximate Riemann solvers with weighted nonlinear reconstruction schemes requires the use of high-performance computing (HPC) architectures. Efficient compression algorithms reduce the computational and memory load. Fully adaptive multiresolution (MR) algorithms with local time stepping (LTS) have proven their potential for such applications. While modern central processing units (CPUs) require multiple levels of parallelism to achieve peak performance, the fine-grained MR mesh adaptivity results in challenging compute/communication patterns. Moreover, LTS incurs strong data dependencies which challenge a parallelization strategy. We address these challenges with a block-based MR algorithm, where arbitrary cuts in the underlying octree are possible. This allows for a parallelization on distributed-memory machines via the Message Passing Interface (MPI). We obtain neighbor relations by simple bit logic in a modified Morton order. The block-based concept allows for a modular setup of the source-code framework in which the building blocks of the algorithm, such as the choice of the Riemann solver or the reconstruction stencil, are interchangeable without loss of parallel performance. We present the capabilities of the modular framework with a range of test cases and scaling analyses with effective resolutions beyond one billion cells using O(10^4) cores.
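To give an idea of how neighbor relations can be obtained by bit logic in a Morton (Z-order) index, here is a small, simplified sketch; it uses plain Morton order for same-level octree blocks, not the modified Morton order of the framework above, and the function names and level handling are illustrative:

# Hypothetical sketch of Morton (Z-order) indexing for octree blocks.
# Same-level face neighbors only; not the paper's modified Morton order.

def encode_morton3(x, y, z):
    """Interleave the bits of three 21-bit block coordinates into one Morton code."""
    def spread(v):
        code = 0
        for i in range(21):
            code |= ((v >> i) & 1) << (3 * i)
        return code
    return spread(x) | (spread(y) << 1) | (spread(z) << 2)

def decode_morton3(code):
    """Recover the (x, y, z) block coordinates from a Morton code."""
    def compact(c):
        v = 0
        for i in range(21):
            v |= ((c >> (3 * i)) & 1) << i
        return v
    return compact(code), compact(code >> 1), compact(code >> 2)

def face_neighbor(code, axis, step, level):
    """Morton code of the same-level neighbor one block away along axis,
    or None if it falls outside the [0, 2**level) domain."""
    coords = list(decode_morton3(code))
    coords[axis] += step
    if not 0 <= coords[axis] < (1 << level):
        return None
    return encode_morton3(*coords)

# Example: +x neighbor of block (3, 5, 2) on refinement level 3.
c = encode_morton3(3, 5, 2)
print(decode_morton3(face_neighbor(c, axis=0, step=+1, level=3)))   # (4, 5, 2)

In a production octree code the same idea is typically implemented with branch-free bit tricks and extended to handle differing refinement levels across block faces.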
To date, there has been a lack of efficient and practical distributed- and shared-memory parallelizations of the data association problem for multitarget tracking. Filling this gap is one of the primary focuses of the present work. We begin by describing our data association algorithm in terms of an Interacting Multiple Model (IMM) state estimator embedded in an optimization framework, namely, a two-dimensional (2D) assignment problem (i.e., weighted bipartite matching). Contrary to conventional wisdom, we show that the data association (or optimization) problem is not the major computational bottleneck; instead, the interface to the optimization problem, namely, computing the rather numerous gating tests, IMM state estimates, covariance calculations, and likelihood function evaluations (used as cost coefficients in the 2D assignment problem), is the primary source of the workload. Hence, for both a general-purpose shared-memory MIMD (Multiple Instruction Multiple Data) multiprocessor system and a distributed-memory Intel Paragon high-performance computer, we developed parallelizations of the data association problem that focus on this interface problem. For the former, a coarse-grained dynamic parallelization was developed that realizes excellent performance (i.e., superlinear speedups) independent of numerous factors influencing problem size (e.g., many models in the IMM, dense/cluttered environments, contentious target-measurement data, etc.). For the latter, an SPMD (Single Program Multiple Data) parallelization was developed that realizes near-linear speedups using relatively simple dynamic task allocation algorithms. Using a real measurement database based on two FAA air traffic control radars, we show that the parallelizations developed in this work offer great promise in practice.
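The claim that building the cost matrix, not solving the assignment, dominates the workload can be illustrated with a small sketch; the Gaussian gating/likelihood costs, the process pool, and SciPy's assignment solver below are stand-ins for the IMM-based cost coefficients and the 2D assignment algorithm of the paper:

# Illustrative sketch: parallelize the gating/likelihood "interface" that
# fills the assignment cost matrix; the assignment step itself is cheap.
# Assumes numpy and scipy; the Gaussian gate stands in for IMM-based costs.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from scipy.optimize import linear_sum_assignment
from scipy.stats import multivariate_normal

GATE = 9.21          # chi-square gate, 2 dof, ~99% probability mass
BIG = 1e6            # cost for gated-out (infeasible) pairings

def track_costs(args):
    """Costs of assigning every measurement to one track (one row of the matrix)."""
    pred, cov, measurements = args
    inv_cov = np.linalg.inv(cov)
    row = np.full(len(measurements), BIG)
    for j, z in enumerate(measurements):
        d = z - pred
        if d @ inv_cov @ d <= GATE:                              # gating test
            row[j] = -multivariate_normal.logpdf(z, pred, cov)   # likelihood cost
    return row

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    preds = rng.uniform(0, 100, size=(50, 2))             # predicted track positions
    covs = [np.eye(2) * 4.0] * 50                         # innovation covariances
    meas = preds + rng.normal(0, 1.5, size=preds.shape)   # noisy measurements

    # Distribute the expensive cost computation over worker processes.
    with ProcessPoolExecutor() as pool:
        rows = list(pool.map(track_costs, [(p, c, meas) for p, c in zip(preds, covs)]))
    cost = np.vstack(rows)

    # The 2D assignment (weighted bipartite matching) on the finished matrix is comparatively cheap.
    track_idx, meas_idx = linear_sum_assignment(cost)
    print(int((cost[track_idx, meas_idx] < BIG).sum()), "tracks associated")

Each worker process fills one row of the cost matrix independently, mirroring a coarse-grained task decomposition over tracks, while the final call to linear_sum_assignment takes a negligible fraction of the runtime.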