The current evolution of computer architectures towards increasing parallelism requires a corresponding evolution towards more parallel data assimilation algorithms. In this article, we consider parallelization of weak-constraint four-dimensional variational data assimilation (4D-Var) in the time dimension. We categorize algorithms according to whether or not they admit such parallelization and introduce a new, highly parallel weak-constraint 4D-Var algorithm based on a saddle-point representation of the underlying optimization problem. The potential benefits of the new saddle-point formulation are illustrated with a simple two-level quasi-geostrophic model.
Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires use of vector extensions. Yet many actors in data flow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself) that imposes serial dependencies between actor invocations, making vectorization across actor invocations impossible. Ideally, issues of inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts that is separate from code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor, resulting in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response (IIR) filters and (2) finite state machines (FSMs). The correctness and performance of the resulting IIR filters and one class of FSMs are studied.
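To make the scan idea concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes a first-order IIR filter y[n] = a*y[n-1] + x[n], and the function names (`combine`, `inclusive_scan`, `iir_scan`, `iir_serial`) are hypothetical. Each input sample is treated as an affine map on the filter state. Composing such maps is associative, so the running state can be computed with a scan whose passes contain no serial dependency.

```python
def combine(f, g):
    """Compose two affine maps y -> a*y + b, applying f first and then g."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def inclusive_scan(items, op):
    """Hillis-Steele inclusive scan for an associative operator.

    Within one pass every element is computed independently of the others,
    which is the property that allows vectorization on real hardware.
    """
    out = list(items)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def iir_scan(x, a, y0=0.0):
    """First-order IIR filter y[n] = a*y[n-1] + x[n], computed via a scan."""
    maps = [(a, xn) for xn in x]             # one affine map per input sample
    prefix = inclusive_scan(maps, combine)   # composed maps for samples 0..n
    return [A * y0 + B for A, B in prefix]   # apply each to the initial state


def iir_serial(x, a, y0=0.0):
    """Reference implementation with the usual serial state dependency."""
    y, out = y0, []
    for xn in x:
        y = a * y + xn
        out.append(y)
    return out
```

The scan performs more arithmetic in total than the serial loop, so it pays off only when the independent operations within each pass actually run in parallel.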
Numerical approximations and computational modeling of problems from Biology and Materials Science often deal with partial differential equations with varying coefficients and domains with irregular geometry. The challenge here is to design an efficient and accurate numerical method that can resolve properties of solutions in different domains/subdomains, while handling the arbitrary geometries of the domains. In this work, we consider 2D elliptic models with material interfaces and develop efficient high-order accurate methods based on Difference Potentials for such problems. (C) 2016 IMACS. Published by Elsevier B.V. All rights reserved.
A numerical approach for solving gas dynamics on Cartesian grids is considered which employs an implicit time marching scheme with the matrix-free Lower-Upper Symmetric Gauss-Seidel (LU-SGS) method for solving discrete equations. Boundary conditions are treated with an embedded-boundary method. The method has two attractive features: (1) algorithmic uniformity of calculations and (2) structured memory accesses that fit massively parallel architectures with GPU accelerators well. We propose a novel CUDA+MPI computational algorithm scalable up to hundreds of GPUs and give an in-depth analysis of its implementation (interoperability issues, library tuning).
Content delivery networks have been providing content delivery services for the last two decades using their own infrastructure. Nowadays content delivery networks have the better option of using storage cloud sites...
The HEVC video coding standard requires nearly 70 % more time than H.264/AVC to encode a video sequence. Manycore architectures can considerably help to reduce the coding time. In this paper, we propose the use of GPUs to perform the intra-picture prediction without any R/D loss. We have evaluated our proposal and compared the results with the ones obtained when running on a CPU. The results show that a time reduction of up to 85 % can be obtained without any R/D loss.
The second generation of a parallel algorithm for generalized latent variable models, including MIRT models and extensions, on the basis of the general diagnostic model (GDM) is presented. This new development further...
In the present work, the initial-boundary problem with non-local contact condition for heat (diffusion) equation is considered. For the stated problem, the existence and uniqueness of the solution is proved. The const...
ISBN: (print) 9781618397881
The mechanisms which lead to high tree species diversity in forests are not yet fully understood. One of the leading theories is that the natural enemies' interaction can give rise to a survival advantage for rare tree species over more common species. One way of exploring such observations is through the use of individual-based modeling. An individual-based model (IBM) is a bottom-up simulation where the bulk dynamics emerge from the interaction of individual constituents. Due to their emergent nature, IBMs are population sensitive: achieving a high degree of accuracy is synonymous with matching system population sizes. Consequently, such models may run into the millions of individuals and become computationally intensive. Here the computing power of graphics processing units (GPUs) is used to overcome this computational limitation. The algorithms developed here for GPUs allow this model to be scaled into the millions of individuals and run on standard desktop computers. This effectively puts supercomputing power at the fingertips of researchers, students, and forest management services alike. The parallel implementation developed here was compared against a serial implementation running on the central processing unit. The results show a significant performance gain for the parallel implementation while maintaining statistical accuracy. This shows that realistically sized models can be efficiently executed on inexpensive mass-market desktop computer hardware.
The increasing computational load required by most applications and the limits in hardware performance affecting scientific computing have contributed in the last decades to the development of parallel software and architectures. In fluid-structure interaction (FSI) for haemodynamic applications, parallelization and scalability are key issues (see [L. Formaggia, A. Quarteroni, and A. Veneziani, eds., Cardiovascular Mathematics: Modeling and Simulation of the Circulatory System, Modeling, Simulation and Applications 1, Springer, Milan, 2009]). In this work we introduce a class of parallel preconditioners for the FSI problem obtained by exploiting the block structure of the linear system. We stress the possibility of extending the approach to a general linear system with a block structure, then we provide a bound on the condition number of the preconditioned system in terms of the conditioning of the preconditioned diagonal blocks, and finally we show that the construction and evaluation of the devised preconditioner is modular. The preconditioners are tested on a benchmark three-dimensional (3D) geometry discretized in both a coarse and a fine mesh, as well as on two physiological aorta geometries. The simulations that we have performed show an advantage in using the block preconditioners introduced and confirm our theoretical results.
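As a rough illustration of what a modular block preconditioner can look like, the following Python sketch applies a block lower-triangular (block Gauss-Seidel) preconditioner to a 2x2 block system. This is one common choice and an assumption here, not necessarily the construction used in the paper. The function name and the small matrices are hypothetical. Only the diagonal blocks are ever inverted, so each solve could be swapped for its own approximate solver without changing the overall structure.

```python
import numpy as np


def apply_block_lower_preconditioner(A11, A21, A22, r1, r2):
    """Solve [[A11, 0], [A21, A22]] z = r by block forward substitution.

    Only the diagonal blocks A11 and A22 are solved with; in practice each
    solve could be replaced by an approximate, block-specific solver.
    """
    z1 = np.linalg.solve(A11, r1)
    z2 = np.linalg.solve(A22, r2 - A21 @ z1)
    return z1, z2


# Small illustrative blocks (hypothetical values, chosen to be nonsingular).
A11 = np.array([[4.0, 1.0],
                [1.0, 3.0]])
A21 = np.array([[1.0, 2.0]])
A22 = np.array([[5.0]])

r1 = np.array([1.0, 2.0])
r2 = np.array([3.0])

z1, z2 = apply_block_lower_preconditioner(A11, A21, A22, r1, r2)
```

The result can be checked against a dense solve with the assembled block lower-triangular matrix, which is how the sketch is verified here.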