This paper presents a generic approach to highly efficient image registration in two and three dimensions. Both monomodal and multimodal registration problems are considered. We focus on the important class of affine-linear transformations in a derivative-based optimization framework. Our main contribution is an explicit formulation of the objective function gradient and Hessian approximation that allows for very efficient, parallel derivative calculation with virtually no memory requirements. The flexible parallelism of our concept allows for direct implementation on various hardware platforms. Derivative calculations are fully matrix-free and operate directly on the input data, thereby reducing the auxiliary space requirements from O(N) to O(1), where N is the number of image points. The proposed approach is implemented on multicore CPUs and GPUs. Our GPU code outperforms a conventional matrix-based CPU implementation by more than two orders of magnitude, thus enabling use in real-time scenarios. The computational properties of our approach are extensively evaluated, demonstrating the performance gain for a variety of real-life medical applications.
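The matrix-free derivative idea can be illustrated with a small sketch. It assumes a 2D sum-of-squared-differences (SSD) distance, an affine map y = Ax + b with parameters w = (a11, a12, b1, a21, a22, b2), and a hypothetical analytic Gaussian-blob template standing in for an interpolated image. The paper's distance measures, interpolation scheme and GPU kernels are not reproduced. Per-pixel contributions to the objective, the gradient and the Gauss-Newton Hessian approximation are accumulated directly, so no N × 6 Jacobian is ever stored.

```python
import numpy as np

def template(y1, y2):
    """Hypothetical smooth template T(y) (a Gaussian blob) and its gradient (dT/dy1, dT/dy2)."""
    t = np.exp(-((y1 - 8.0) ** 2 + (y2 - 8.0) ** 2) / 20.0)
    return t, -2.0 * (y1 - 8.0) / 20.0 * t, -2.0 * (y2 - 8.0) / 20.0 * t

def affine(w, x1, x2):
    """Affine map y = A x + b with w = (a11, a12, b1, a21, a22, b2)."""
    return w[0] * x1 + w[1] * x2 + w[2], w[3] * x1 + w[4] * x2 + w[5]

def ssd_derivatives(w, R):
    """Matrix-free SSD value, gradient and Gauss-Newton Hessian approximation.

    Each pixel adds its own 6-vector / 6x6 contribution to the accumulators,
    so the auxiliary memory does not grow with the number of pixels.
    """
    D, g, H = 0.0, np.zeros(6), np.zeros((6, 6))
    n1, n2 = R.shape
    for i in range(n1):
        for j in range(n2):
            x1, x2 = float(i), float(j)
            y1, y2 = affine(w, x1, x2)
            t, t1, t2 = template(y1, y2)
            r = t - R[i, j]  # residual at this pixel
            # Row of the (never stored) Jacobian: dT(y(x; w))/dw via the chain rule.
            J = np.array([t1 * x1, t1 * x2, t1, t2 * x1, t2 * x2, t2])
            D += 0.5 * r * r
            g += r * J
            H += np.outer(J, J)
    return D, g, H

def gauss_newton_step(w, R):
    """One Gauss-Newton step with simple step halving until the SSD decreases."""
    D, g, H = ssd_derivatives(w, R)
    step = np.linalg.solve(H, -g)
    t = 1.0
    for _ in range(20):
        if ssd_derivatives(w + t * step, R)[0] < D:
            return w + t * step
        t *= 0.5
    return w
```

Because every pixel's contribution is independent, the double loop is a sum reduction that could be split across CPU or GPU threads; only the 6-vector and the 6 × 6 accumulator need storage.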
Numerical approximations and computational modeling of problems from Biology and Materials Science often deal with partial differential equations with varying coefficients and domains with irregular geometry. The challenge here is to design an efficient and accurate numerical method that can resolve properties of solutions in different domains/subdomains, while handling the arbitrary geometries of the domains. In this work, we consider 2D elliptic models with material interfaces and develop efficient high-order accurate methods based on Difference Potentials for such problems. (C) 2016 IMACS. Published by Elsevier B.V. All rights reserved.
Through a simple proposal, the charge transfer obtained from the cornerstone theory of Parr and Pearson is partitioned, for each reactant, into two channels: an electrophilic channel, through which the species accepts electrons, and a nucleophilic channel, through which it donates electrons. It is shown that this global model allows us to determine unambiguously the charge-transfer mechanism prevailing in a given reaction. The partitioning is extended to include local effects through the Fukui functions of the reactants. This local model is applied to several emblematic reactions in organic and inorganic chemistry; we show that, besides improving the correlations obtained with the global model, it provides valuable information about which atoms in the reactants play the most important roles in the reaction, thus improving our understanding of the reaction under study.
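For reference, the global Parr and Pearson charge transfer that this work partitions can be evaluated from ionization energies I and electron affinities A. The sketch below assumes the common finite-difference approximations μ ≈ −(I + A)/2 for the chemical potential and η ≈ I − A for the hardness. It computes only the total ΔN, not the electrophilic and nucleophilic channels proposed in the paper, whose formulas are not given in the abstract.

```python
def parr_pearson_charge_transfer(I_A, A_A, I_B, A_B):
    """Electrons transferred to reactant A from reactant B (Parr-Pearson model).

    Assumes mu = -(I + A)/2 and eta = I - A, so dN = (mu_B - mu_A) / (eta_A + eta_B).
    A positive value means A gains electronic charge; energies in any consistent unit.
    """
    mu_A, mu_B = -(I_A + A_A) / 2.0, -(I_B + A_B) / 2.0
    eta_A, eta_B = I_A - A_A, I_B - A_B
    return (mu_B - mu_A) / (eta_A + eta_B)
```

With the other common convention, η = (I − A)/2, the formula carries an extra factor of 2 in the denominator and yields the same ΔN.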
Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires the use of vector extensions. Yet many actors in dataflow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself) that imposes serial dependencies between actor invocations, making vectorization across invocations impossible. Ideally, the inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts, separate from the code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that result in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response (IIR) filters and (2) finite state machines (FSMs). The correctness and performance of the resulting IIR filters and one class of FSMs are studied.
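The scan pattern applies because the state update of each invocation can be written as a function, and function composition is associative. A minimal sketch, assuming a first-order IIR filter y[n] = a·y[n−1] + b·x[n] and a table-driven FSM, is shown below. It uses a Hillis-Steele inclusive scan in plain Python, where each round's list comprehension stands in for one vectorized step; it is not the paper's code, and higher-order filters are not covered.

```python
def inclusive_scan(items, op):
    """Hillis-Steele inclusive scan: log2(n) rounds, each round data-parallel.

    `op(first, second)` must be associative; `first` covers earlier elements.
    """
    out = list(items)
    step = 1
    while step < len(out):
        # Every element reads only the previous round's values, so one round
        # corresponds to a single vectorized or parallel step.
        out = [out[i] if i < step else op(out[i - step], out[i]) for i in range(len(out))]
        step *= 2
    return out

def combine_affine(first, second):
    """Compose y -> a1*y + c1 (applied first) with y -> a2*y + c2."""
    a1, c1 = first
    a2, c2 = second
    return (a1 * a2, a2 * c1 + c2)

def iir_first_order_scan(x, a, b, y0=0.0):
    """y[n] = a*y[n-1] + b*x[n] for all n via a prefix scan of affine maps."""
    prefixes = inclusive_scan([(a, b * xn) for xn in x], combine_affine)
    return [A * y0 + C for A, C in prefixes]

def iir_first_order_serial(x, a, b, y0=0.0):
    """Reference serial implementation with the loop-carried dependency."""
    y, out = y0, []
    for xn in x:
        y = a * y + b * xn
        out.append(y)
    return out

def compose_tables(first, second):
    """Transition table of applying `first`, then `second` (index = current state)."""
    return tuple(second[q] for q in first)

def fsm_states_scan(symbols, tables, start):
    """State after each input symbol, computed by scanning transition tables."""
    prefixes = inclusive_scan([tables[s] for s in symbols], compose_tables)
    return [p[start] for p in prefixes]

# Parity-of-ones machine: symbol 0 keeps the state, symbol 1 flips it.
PARITY = {0: (0, 1), 1: (1, 0)}
```

For example, `fsm_states_scan([1, 0, 1, 1], PARITY, start=0)` returns `[1, 1, 0, 1]`. The scan trades the O(n) serial loop for O(n log n) total work spread over O(log n) parallel rounds, and floating-point results can differ from the serial filter in the last bits because the operations are reordered.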
A numerical approach for solving gas dynamics problems on Cartesian grids is considered; it employs an implicit time-marching scheme with the matrix-free Lower-Upper Symmetric Gauss-Seidel (LU-SGS) method for solving the discrete equations. Boundary conditions are treated with an embedded-boundary method. The method has two attractive features: (1) algorithmic uniformity of calculations and (2) structured memory accesses, both of which fit massively parallel architectures with GPU accelerators well. We propose a novel CUDA+MPI computational algorithm that scales up to hundreds of GPUs and give an in-depth analysis of its implementation (interoperability issues, library tuning).
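The LU-SGS idea can be illustrated on a much simpler problem. The sketch below assumes a 1D scalar diffusion equation discretized with implicit Euler and zero Dirichlet boundaries, not the paper's gas dynamics system or its embedded-boundary treatment. The operator A = L + D + U is never assembled: LU-SGS approximates it by (D + L) D⁻¹ (D + U) and inverts that product with one forward and one backward sweep, computing the stencil coefficients on the fly.

```python
import numpy as np

def apply_A(x, diag, off):
    """Matrix-free product with the tridiagonal operator diag*x_i + off*(x_{i-1} + x_{i+1})."""
    y = diag * x
    y[1:] += off * x[:-1]
    y[:-1] += off * x[1:]
    return y

def lusgs_correction(r, diag, off):
    """Solve (D + L) D^{-1} (D + U) dx = r with a forward and a backward sweep."""
    n = r.size
    z = np.empty(n)
    for i in range(n):  # forward sweep: (D + L) z = r
        z[i] = (r[i] - (off * z[i - 1] if i > 0 else 0.0)) / diag
    dx = np.empty(n)
    for i in range(n - 1, -1, -1):  # backward sweep: (D + U) dx = D z
        dx[i] = z[i] - (off * dx[i + 1] if i < n - 1 else 0.0) / diag
    return dx

def implicit_diffusion_step(u, lam, iterations=50):
    """Implicit Euler step of u_t = nu*u_xx with lam = nu*dt/h^2, solved by LU-SGS corrections."""
    diag, off = 1.0 + 2.0 * lam, -lam
    x = u.copy()
    for _ in range(iterations):
        x += lusgs_correction(u - apply_A(x, diag, off), diag, off)
    return x
```

Here the correction is iterated as a defect correction only so that the result can be checked against a direct solve. The sweeps in this sketch are sequential; how the paper arranges the work for GPUs is not shown.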
Content delivery networks have been providing content delivery services for the last two decades using their own infrastructure. Nowadays, content delivery networks have the better option of using storage cloud sites...
The second generation of a parallel algorithm for generalized latent variable models, including MIRT models and extensions, on the basis of the general diagnostic model (GDM) is presented. This new development further...
The HEVC video coding standard requires nearly 70% more time than H.264/AVC to encode a video sequence. Manycore architectures can considerably help to reduce the coding time. In this paper, we propose the use of GPUs to perform intra-picture prediction without any rate-distortion (R/D) loss. We have evaluated our proposal and compared the results with those obtained on a CPU. The results show that a time reduction of up to 85% can be obtained without any R/D loss.
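Intra prediction suits GPUs because each predicted sample is a fixed function of already reconstructed neighbouring samples. The sketch below shows the HEVC planar and DC modes for an N × N block with N a power of two, written as per-sample loops to make that independence visible. It omits reference-sample smoothing, the DC boundary filter, the angular modes and the mode decision, and it is not the authors' GPU implementation.

```python
import numpy as np

def planar_prediction(top, left):
    """HEVC planar intra prediction for an N x N block.

    top[0..N]: samples above the block (top[N] is the top-right neighbour).
    left[0..N]: samples to the left (left[N] is the bottom-left neighbour).
    Each output sample depends only on these arrays, so each could be
    computed by its own GPU thread.
    """
    n = len(top) - 1
    shift = n.bit_length()  # equals log2(N) + 1 for power-of-two N
    pred = np.empty((n, n), dtype=np.int32)
    for y in range(n):
        for x in range(n):
            pred[y, x] = ((n - 1 - x) * left[y] + (x + 1) * top[n]
                          + (n - 1 - y) * top[x] + (y + 1) * left[n] + n) >> shift
    return pred

def dc_prediction(top, left):
    """HEVC DC prediction value (boundary smoothing for small luma blocks omitted)."""
    n = len(top) - 1
    shift = n.bit_length()
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> shift
    return np.full((n, n), dc, dtype=np.int32)
```

For a 4 × 4 block with every top neighbour equal to 10 and every left neighbour equal to 20, `dc_prediction` fills the block with (40 + 80 + 4) >> 3 = 15.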
In the present work, an initial-boundary value problem with a non-local contact condition for the heat (diffusion) equation is considered. For the stated problem, the existence and uniqueness of the solution are proved. The const...
ISBN (print): 9781618397881
The mechanisms that lead to high tree species diversity in forests are not yet fully understood. One of the leading theories is that interaction with natural enemies can give rise to a survival advantage for rare tree species over more common species. One way of exploring such observations is through individual-based modeling. An individual-based model (IBM) is a bottom-up simulation in which the bulk dynamics emerge from the interactions of individual constituents. Because of this emergent nature, IBMs are population-sensitive: achieving a high degree of accuracy requires matching the population sizes of the real system. Consequently, such models may involve millions of individuals and become computationally intensive. Here, the computing power of graphics processing units (GPUs) is used to overcome this computational limitation. The algorithms developed here for GPUs allow the model to be scaled to millions of individuals and run on standard desktop computers. This effectively puts supercomputing power at the fingertips of researchers, students, and forest management services alike. The parallel implementation developed here was compared against a serial implementation running on the central processing unit (CPU). The results show a significant performance gain for the parallel implementation while maintaining statistical accuracy. This shows that realistically sized models can be executed efficiently on inexpensive mass-market desktop hardware.
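The per-individual structure that makes IBMs a good fit for GPUs can be shown with a toy mortality step. The sketch assumes a hypothetical rule in which each tree's survival probability decays exponentially with the number of same-species neighbours within a radius, a simple stand-in for enemy-mediated negative density dependence. The exponential form and the parameters `radius`, `base_survival` and `alpha` are illustrative assumptions, not the paper's model. NumPy vectorization plays the role of "one thread per tree".

```python
import numpy as np

def conspecific_neighbours(xy, species, radius):
    """Count same-species trees within `radius` of each tree.

    Each row of the pairwise computation is independent, which is the kind of
    per-individual work that maps onto one GPU thread per tree.
    """
    diff = xy[:, None, :] - xy[None, :, :]
    close = np.sum(diff ** 2, axis=-1) <= radius ** 2
    np.fill_diagonal(close, False)  # a tree is not its own neighbour
    same = species[:, None] == species[None, :]
    return np.sum(close & same, axis=1)

def survival_step(xy, species, radius, base_survival, alpha, rng):
    """One hypothetical mortality step: survival falls with local conspecific density."""
    p = base_survival * np.exp(-alpha * conspecific_neighbours(xy, species, radius))
    alive = rng.random(len(species)) < p
    return xy[alive], species[alive]
```

The all-pairs distance matrix costs O(n²) memory and would not reach millions of individuals; a realistic implementation would bin trees into spatial cells so that each individual examines only nearby ones.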