Multi-scale spin dynamics of systems of nanomagnets is investigated by numerical simulation using parallel algorithms. A FORTRAN program was developed using the OpenMP application programming interface. The parallel code supports the following areas of research: study of the possibility of regulating the switching time of the magnetization of a nanostructure; study of the role of nanocrystal geometry in the coherent relaxation of 1-, 2- and 3-dimensional objects; study of the magnetodynamics of a spin system coupled with a passive resonator (radiation damping (RD)); application of RD to ultra-fast relaxation in an assembly of single-domain ferromagnetic particles; study of the role of long-distance dipole-dipole fields as the origin of the extremely random behavior in a hyperpolarized NMR maser, etc. Estimates of the speedup and efficiency of the implemented algorithms in comparison with sequential algorithms have been obtained. It is shown that the use of supercomputing technology for the study of spin dynamics provides simulation power for spin systems that include thousands of magnetic voxels. (C) 2015 Elsevier B.V. All rights reserved.
Tomographic image reconstruction has a wide variety of applications, ranging from engineering to medicine. Algebraic reconstruction methods, used to solve tomographic image reconstruction problems, are inherently very slow. This performance bottleneck is discussed in detail in the present work. This paper presents a parallel (multi-processor based and multi-processor multi-GPU based) single-view coded multiplicative algebraic reconstruction technique. It has been found that a parallel implementation of this algorithm helps remove the performance bottleneck without compromising the quality of reconstruction. It has also been found that if one uses four processors to reconstruct an image of 512 x 512 x 512 volume size, then the multi-processor based algorithm takes 1997 s to perform one swap of 200 projections taken over a span of 360 degrees. The use of four processors gives a speedup of 2.39 over a single processor. Further, the proposed multi-processor multi-GPU based algorithm takes 186 s to perform the same reconstruction using four GPUs, a speedup of 25.7 over a single processor. We are able to process 42 projections per minute using the multi-processor multi-GPU based algorithm. The algorithm is applicable to online laminographic applications.
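The speedup figures quoted above can be related to parallel efficiency with a little arithmetic. The following is a minimal sketch, assuming the usual definitions (speedup = serial time / parallel time, efficiency = speedup / number of workers); the function names are hypothetical and are not taken from the paper.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Efficiency = speedup / number of workers (1.0 would be ideal scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


def implied_serial_time(parallel_time_s: float, speedup: float) -> float:
    """Serial run time implied by speedup = serial_time / parallel_time."""
    return parallel_time_s * speedup


# Figures quoted in the abstract: 4 processors, 1997 s, speedup 2.39.
cpu_efficiency = parallel_efficiency(2.39, 4)
cpu_serial_estimate = implied_serial_time(1997.0, 2.39)

# Ratio of the four-processor run time to the four-GPU run time (186 s).
gpu_over_cpu = 1997.0 / 186.0

print(f"four-processor efficiency: {cpu_efficiency:.4f}")
print(f"implied single-processor time: {cpu_serial_estimate:.1f} s")
print(f"four-GPU run is {gpu_over_cpu:.1f}x faster than the four-processor run")
```

Under these definitions, the four-processor run uses roughly 60 percent of the ideal linear speedup.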
Numerical approximations and modeling of many physical, biological, and biomedical problems often deal with equations with highly varying coefficients, heterogeneous models (described by different types of partial differential equations (PDEs) in different domains), and/or have to take into consideration the complex structure of the computational subdomains. The major challenge here is to design an efficient numerical method that can capture certain properties of analytical solutions in different domains/subdomains (such as positivity, different regularity/smoothness of the solutions, etc.), while handling the arbitrary geometries and complex structures of the domains. In this work, we employ one-dimensional elliptic type models as the starting point to develop and numerically test high-order accurate Difference Potentials Method (DPM) for variable coefficient elliptic problems in heterogeneous media. While the method and analysis are simple in the one-dimensional settings, they illustrate and test several important ideas and capabilities of the developed approach. (C) 2014 IMACS. Published by Elsevier B.V. All rights reserved.
When significant communication costs arise in the solution of multidimensional problems on parallel computers, optimal performance cannot always be achieved by perfectly balancing the computational load across cores. Modest sacrifices in the computational load balance may facilitate substantial overall performance improvements by achieving large savings in the costs associated with communications. This general approach is illustrated by application to GS2, an initial value gyrokinetic simulation code developed to study low-frequency turbulence in magnetized plasma. GS2 is parallelised using MPI with the simulation domain decomposed across tasks. The optimal domain decomposition is non-trivial, and is complicated by the fact that several domain decompositions are needed and that these do not all optimise at the chosen task count. Applying the novel approach outlined in this paper to GS2 has improved performance by up to 17 percent for a representative simulation. Similar strategies may be beneficial in a broader class of problems.
ISBN: 9781509012756 (print)
This paper exploits the parallelism potential of a Clonal Selection Algorithm (CSA) as a parallel metaheuristic algorithm, motivated by the lack of detailed explanation of the stages of designing parallel algorithms. To parallelise population-based algorithms, we need to exploit and define their granularity for each stage, choose between data and functional partitioning, and choose the communication model. Using a library for a message-passing model, such as MPJExpress, we define appropriate methods to implement process communication. This research produces pseudo-code for the two message-passing communication models, using MPJExpress. We implemented this pseudo-code in Java with a dataset from the Travelling Salesman Problem (TSP). The experiments showed that the multicommunication model using the alltogether method performed better than the master-slave model using the send-and-receive method.
Many problems of interest in plasma modeling are subject to the tyranny of scales: they encompass physical processes that operate on timescales separated by many orders of magnitude. Investigating such problems therefore requires the use of implicit time-integration schemes, which advance problem solutions on the timescale of interest while incorporating the physics of the fast timescales. One promising route to develop these implicit solvers is the combination of Jacobian-free Newton-Krylov (JFNK) methods, but adapting these methods to work in ultrascale computing environments is a formidable challenge. Here, we describe research on new approaches to adapt algebraic multigrid-based solvers (that can be used for providing efficient preconditioners for JFNK methods) to ultrascale computing environments, the development and testing of JFNK solvers for coupled plasma electromagnetics within the USIM framework, and the application of these methods to modeling H- ion sources for the Spallation Neutron Source at ORNL.
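The "Jacobian-free" part of JFNK refers to the standard trick of approximating the Jacobian-vector product needed by the Krylov solver with a finite difference of the nonlinear residual, so the Jacobian matrix is never formed. The sketch below illustrates only that one step on a toy residual; `jacobian_vector_product`, `toy_residual`, and the step size `eps` are illustrative assumptions and have no connection to the USIM framework or to the solvers described in the abstract.

```python
from typing import Callable, List

Vector = List[float]


def jacobian_vector_product(
    residual: Callable[[Vector], Vector],
    u: Vector,
    v: Vector,
    eps: float = 1e-7,  # assumed perturbation size; real codes scale it to u and v
) -> Vector:
    """Approximate J(u) @ v by the forward difference (F(u + eps*v) - F(u)) / eps."""
    base = residual(u)
    shifted = residual([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]


def toy_residual(u: Vector) -> Vector:
    """Hypothetical two-equation nonlinear residual used only for illustration."""
    return [u[0] ** 2 + u[1], u[0] * u[1]]


# Exact Jacobian at u = (1, 2) is [[2, 1], [2, 1]], so J @ (1, 0) = (2, 2).
approx = jacobian_vector_product(toy_residual, [1.0, 2.0], [1.0, 0.0])
print(approx)
```

A Krylov method such as GMRES would call a routine like this once per iteration in place of an explicit matrix-vector multiply.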
ISBN: 9781509036837 (print)
A recently proposed scenario decomposition algorithm for stochastic 0-1 programs finds an optimal solution by evaluating and removing individual solutions that are discovered by solving scenario subproblems. In this work, we develop an asynchronous, distributed implementation of the algorithm which has computational advantages over existing synchronous implementations of the algorithm. Improvements to both the synchronous and asynchronous algorithms are proposed. We test the results on well-known stochastic 0-1 programs from the SIPLIB test library and are able to solve one previously unsolved instance from the test set.
ISBN: 9781509021413 (print)
We present scalable parallel algorithms with sublinear per-processor communication volume and low latency for several fundamental problems related to finding the most relevant elements in a set, for various notions of relevance: We begin with the classical selection problem with unsorted input. We present generalizations with sorted inputs, dynamic content (bulk-parallel priority queues), and multiple criteria. Then we move on to finding frequent objects and top-k sum aggregation.
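For readers unfamiliar with the classical selection problem mentioned above (find the k-th smallest element of an unsorted input), here is a minimal sequential sketch using randomized quickselect. It is a textbook single-machine method, not the communication-efficient parallel algorithm of the paper; the name `select_kth` is hypothetical.

```python
import random
from typing import List, Sequence


def select_kth(items: Sequence[float], k: int) -> float:
    """Return the k-th smallest element (k is 0-based) of an unsorted sequence."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    pool: List[float] = list(items)
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k < len(smaller):
            # The answer lies strictly below the pivot.
            pool = smaller
        elif k < len(smaller) + equal_count:
            # The answer is the pivot itself.
            return pivot
        else:
            # Discard everything up to and including the pivot values.
            k -= len(smaller) + equal_count
            pool = [x for x in pool if x > pivot]


print(select_kth([7, 1, 5, 3, 9], 2))
```

Each round discards part of the input without sorting it, which gives expected linear time on a single machine.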
Various sequential algorithms for the shortest path problem on time-dependent graphs have appeared in the literature. However, these algorithms mostly suffer from long running times and huge memory requirements, which make them unsuitable for navigation applications that need to run on real-time data with fast response times. For the shortest path problem with a time-dependent flow speed model, we propose parallel algorithms based on the Modified Dijkstra algorithm in order to reduce the running time of the sequential algorithm without requiring much more memory. We develop three parallel implementations using CUDA and OpenMP: (i) a CUDA-based version, (ii) an OpenMP-based version and (iii) a hybrid CUDA and OpenMP version. We obtain up to a 10-fold speedup in the OpenMP version and a 17-fold speedup in the other two versions.
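To make the sequential baseline concrete, the sketch below shows a Dijkstra-style earliest-arrival search on a graph whose edge travel times depend on the departure time. It assumes the FIFO (non-overtaking) property, under which this label-setting approach is known to be correct. The per-edge travel-time callables are a stand-in for the paper's flow speed model, and all names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# graph[u] is a list of (v, travel_time) pairs, where travel_time(t) is the
# time needed to traverse edge u -> v when departing from u at time t.
Graph = Dict[str, List[Tuple[str, Callable[[float], float]]]]


def earliest_arrival(graph: Graph, source: str, start_time: float = 0.0) -> Dict[str, float]:
    """Earliest arrival time at every reachable node (assumes FIFO edges)."""
    arrival: Dict[str, float] = {source: start_time}
    heap: List[Tuple[float, str]] = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale queue entry
        for v, travel_time in graph.get(u, []):
            candidate = t + travel_time(t)
            if candidate < arrival.get(v, float("inf")):
                arrival[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return arrival


# Toy network: congestion grows with departure time on two of the edges.
toy_graph: Graph = {
    "A": [("B", lambda t: 5.0 + 0.1 * t), ("C", lambda t: 1.0)],
    "C": [("B", lambda t: 3.0 + 0.5 * t)],
}
print(earliest_arrival(toy_graph, "A", 0.0))
```

The only change from static Dijkstra is that each edge cost is evaluated at the time the search reaches the edge's tail node.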
ISBN: 9781509026562 (print)
The main contribution of this paper is to show an implementation of the parallel convex hull algorithm on the Parallella architecture. Parallella is a single-board computer with 16 mesh-connected cores. We have considered the memory architecture and mesh-connected network of the Parallella architecture. We evaluated the computing time and the energy efficiency in comparison with various computing platforms such as a Raspberry Pi, a desktop PC, and a multicore server. The experimental results show that for 16384 points, although the computing time of Parallella is 17.50 times longer than that of a 24-core multicore server, its energy efficiency is 7.12 times higher.
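As background on the underlying problem, the following is a minimal sequential sketch of a 2D convex hull computation using Andrew's monotone chain method. This is a standard single-core algorithm chosen for illustration; the abstract does not say which convex hull algorithm was parallelised, and nothing here reflects the Parallella implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order, starting from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_interior = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(convex_hull(square_with_interior))
```

The sort dominates the cost, so the method runs in O(n log n) time for n input points.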