Search results - Inner Mongolia University Library

Use of supercomputer for modeling coherent processes in magnetic nano-structures

COMPUTATIONAL MATERIALS SCIENCE, 2015, vol. 102, pp. 228-233

Authors: Belozerova, T. S.; Demenev, A. G.; Henner, V. K.; Kharebov, P. V.; Khenner, E. K.; Sumanasekera, G. U. Affiliations: Perm State Univ, Dept Mech & Math, Perm 614990, Russia; Perm State Univ, Dept Phys, Perm 614990, Russia; Univ Louisville, Dept Phys, Louisville, KY 40292, USA

Multi-scale spin dynamics of systems of nanomagnets is investigated by numerical simulation using parallel algorithms. A FORTRAN program was developed using the OpenMP application programming interface. The parallel code supports the following areas of research: study of the possibility of regulating the switching time of the magnetization of the nanostructure; study of the role of nanocrystal geometry in the coherent relaxation of 1-, 2- and 3-dimensional objects; study of the magnetodynamics of a spin system coupled with a passive resonator (radiation damping (RD)); application of RD to ultra-fast relaxation in an assembly of single-domain ferromagnetic particles; study of the role of long-distance dipole-dipole fields as the origin of the extremely random behavior in the hyperpolarized NMR maser, etc. Estimates of the speedup and efficiency of the implemented algorithms in comparison with sequential algorithms have been obtained. It is shown that the use of supercomputing technology for the study of spin dynamics provides simulation power for spin systems that include thousands of magnetic voxels. (C) 2015 Elsevier B.V. All rights reserved.

Keywords: High-performance computing parallel algorithms Mathematical modeling Coherent effects in spin systems Magnetic nano-structures

Fast multi-processor multi-GPU based algorithm of tomographic inversion for 3D image reconstruction

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2015, vol. 29, no. 1, pp. 64-72

Authors: Bajpai, Manish; Gupta, Phalguni; Munshi, Prabhat. Affiliations: Indian Inst Technol Kanpur, Nucl Engn & Technol Programme, Kanpur, Uttar Pradesh, India; Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India

Tomographic image reconstruction has a wide variety of applications, ranging from engineering to medicine. Algebraic reconstruction methods, used to solve tomographic image reconstruction problems, are inherently very slow. This performance bottleneck is discussed in detail in the present work. This paper presents a parallel (multi-processor based and multi-processor multi-GPU based) single-view coded multiplicative algebraic reconstruction technique. It has been found that a parallel implementation of this algorithm helps remove the performance bottleneck without compromising the quality of reconstruction. It has also been found that if one uses four processors to reconstruct an image of 512 x 512 x 512 volume size, then the multi-processor based algorithm takes 1997 s to perform one swap of 200 projections taken over a span of 360 degrees. The use of four processors leads to a speedup of 2.39 in comparison with a single processor. Further, the proposed multi-processor multi-GPU based algorithm takes 186 s to perform the same reconstruction using four GPUs, resulting in a speedup of 25.7 in comparison with a single processor. We are able to process 42 projections per minute by using the multi-processor multi-GPU based algorithm. The algorithm is applicable to online laminographic applications.

Keywords: Multi-processor architecture multi-GPU parallel algorithms computed tomography 3D reconstruction
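
The timing figures quoted in this abstract can be cross-checked with the usual definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of workers). The sketch below is only an arithmetic check: the single-processor time is not stated in the abstract and is derived here from the reported four-processor time and speedup, and all function and variable names are illustrative.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(s: float, workers: int) -> float:
    """Parallel efficiency E = S / p."""
    return s / workers


# Figures quoted in the abstract.
T_CPU4 = 1997.0  # seconds with four processors
S_CPU4 = 2.39    # reported speedup with four processors
T_GPU4 = 186.0   # seconds with four GPUs

# Assumption: the single-processor time is inferred, not reported.
t_single = S_CPU4 * T_CPU4            # about 4773 s
s_gpu = speedup(t_single, T_GPU4)     # rounds to the reported 25.7
eff_cpu = efficiency(S_CPU4, 4)       # about 0.60 on four processors

print(f"implied single-processor time: {t_single:.0f} s")
print(f"GPU speedup: {s_gpu:.1f}, CPU efficiency: {eff_cpu:.2f}")
```

Under this reading, the two reported speedups are mutually consistent, and the four-processor run reaches roughly 60 percent parallel efficiency.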

High-order difference potentials methods for 1D elliptic type models

APPLIED NUMERICAL MATHEMATICS, 2015, vol. 93, pp. 69-86

Authors: Epshteyn, Yekaterina; Phippen, Spencer. Affiliation: Univ Utah, Dept Math, Salt Lake City, UT 84112, USA

Numerical approximations and modeling of many physical, biological, and biomedical problems often deal with equations with highly varying coefficients, heterogeneous models (described by different types of partial differential equations (PDEs) in different domains), and/or have to take into consideration the complex structure of the computational subdomains. The major challenge here is to design an efficient numerical method that can capture certain properties of analytical solutions in different domains/subdomains (such as positivity, different regularity/smoothness of the solutions, etc.), while handling the arbitrary geometries and complex structures of the domains. In this work, we employ one-dimensional elliptic type models as the starting point to develop and numerically test high-order accurate Difference Potentials Method (DPM) for variable coefficient elliptic problems in heterogeneous media. While the method and analysis are simple in the one-dimensional settings, they illustrate and test several important ideas and capabilities of the developed approach. (C) 2014 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Difference potentials Boundary projections Cauchy's type integral Boundary value problems Variable coefficients Heterogeneous media High-order finite difference schemes Difference Potentials Method Immersed Interface Method Interface problems parallel algorithms

Optimising Performance through Unbalanced Decompositions

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, vol. 26, no. 10, pp. 2863-2873

Authors: Jackson, Adrian; Hein, Joachim; Roach, Colin. Affiliations: Univ Edinburgh, EPCC, Edinburgh EH9 3JZ, Midlothian, Scotland; Lund Univ, S-22100 Lund, Sweden; EURATOM CCFE Fus Assoc, Culham Sci Ctr, Abingdon OX14 3DB, Oxon, England

When significant communication costs arise in the solution of multidimensional problems on parallel computers, optimal performance cannot always be achieved by perfectly balancing the computational load across cores. Modest sacrifices in the computational load balance may facilitate substantial overall performance improvements by achieving large savings in the costs associated with communications. This general approach is illustrated by application to GS2, an initial value gyrokinetic simulation code developed to study low-frequency turbulence in magnetized plasma. GS2 is parallelised using MPI with the simulation domain decomposed across tasks. The optimal domain decomposition is non-trivial, and is complicated by the fact that several domain decompositions are needed and that these do not all optimise at the chosen task count. Application to GS2, of the novel approach outlined in this paper, has improved performance by up to 17 percent for a representative simulation. Similar strategies may be beneficial in a broader class of problems.

Keywords: Distributed parallel algorithms applications nonlinear programming linear programming physics

Clonal Selection Algorithm Parallelization with MPJExpress

Computer Science and Electronic Engineering Conference (CEEC)

Author: Ayi Purbasari. Affiliation: Informatics Department, Universitas Pasundan, Bandung, Indonesia

ISBN: (print) 9781509012756

This paper exploits the parallelism potential of a Clonal Selection Algorithm (CSA) as a parallel metaheuristic algorithm, motivated by the lack of detailed explanation of the stages of designing parallel algorithms. To parallelise population-based algorithms, we need to define their granularity for each stage, choose between data and functional partitioning, and choose the communication model. Using a library for the message-passing model, such as MPJExpress, we define appropriate methods to implement process communication. This research yields pseudo-code for two message-passing communication models using MPJExpress. We implemented this pseudo-code in Java with a dataset from the Travelling Salesman Problem (TSP). The experiments showed that the multicommunication model using the alltogether method performed better than the master-slave model using the send-and-receive method.

Keywords: Sociology Statistics Cloning Algorithm design and analysis Partitioning algorithms parallel algorithms

Fully Implicit Ultrascale Physics Solvers and Application to Ion Source Modeling

IEEE TRANSACTIONS ON PLASMA SCIENCE, 2015, vol. 43, no. 4, pp. 957-964

Authors: Beckwith, Kris; Veitzer, Seth A.; McCormick, Stephen; Ruge, John; Olson, Luke N.; Cahoun, Jon C. Affiliations: Tech X Corp, Boulder, CO 80303, USA; Front Range Sci Computat Inc, Boulder, CO 80305, USA; Univ Illinois, Dept Comp Sci, Champaign, IL 61820, USA

Many problems of interest in plasma modeling are subject to the tyranny of scales, specifically, problems that encompass physical processes that operate on timescales that are separated by many orders of magnitude. Investigating such problems, therefore, requires the use of implicit time-integration schemes, which advance problem solutions on the timescale of interest, while incorporating the physics of the fast timescales. One promising route to develop these implicit solvers is the combination of Jacobian-free Newton-Krylov (JFNK) methods, but adapting these methods to work in ultrascale computing environments is a formidable challenge. Here, we describe research on new approaches to adapt algebraic multigrid-based solvers (that can be used for providing efficient preconditioners for JFNK methods) to ultrascale computing environments, the development and testing of JFNK solvers for coupled plasma electromagnetics within the USIM framework, and the application of these methods to modeling H- ion sources for the spallation neutron source at ORNL.

Keywords: Fault tolerance ion sources magnetohydrodynamics parallel algorithms

Scenario Decomposition for 0-1 Stochastic Programs: Improvements and Asynchronous Implementation

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Kevin Ryan; Deepak Rajan; Shabbir Ahmed. Affiliations: Georgia Institute of Technology, Atlanta, Georgia; Lawrence Livermore National Laboratory, Livermore, California

ISBN: (print) 9781509036837

A recently proposed scenario decomposition algorithm for stochastic 0-1 programs finds an optimal solution by evaluating and removing individual solutions that are discovered by solving scenario subproblems. In this work, we develop an asynchronous, distributed implementation of the algorithm which has computational advantages over existing synchronous implementations of the algorithm. Improvements to both the synchronous and asynchronous algorithms are proposed. We test the results on well-known stochastic 0-1 programs from the SIPLIB test library and are able to solve one previously unsolved instance from the test set.

Keywords: Upper bound Electronic mail parallel algorithms Algorithm design and analysis Convergence Synchronization Distributed processing

Communication Efficient Algorithms for Top-k Selection Problems

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Lorenz Hübschle-Schneider; Peter Sanders. Affiliation: Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Germany

ISBN: (print) 9781509021413

We present scalable parallel algorithms with sublinear per-processor communication volume and low latency for several fundamental problems related to finding the most relevant elements in a set, for various notions of relevance: We begin with the classical selection problem with unsorted input. We present generalizations with sorted inputs, dynamic content (bulk-parallel priority queues), and multiple criteria. Then we move on to finding frequent objects and top-k sum aggregation.

Keywords: Approximation algorithms Data structures Sorting Memory management parallel algorithms Resource management Bandwidth
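
For orientation, the top-k selection problem with unsorted, partitioned input can be illustrated with a naive baseline. This is not the algorithm from the paper: each simulated processor here forwards up to k candidates to a single merge step, which is the kind of per-processor communication volume the abstract says the authors reduce to sublinear. The partitions and function names are hypothetical, and the "processors" are simulated as plain lists in one process.

```python
import heapq
from typing import Iterable, List, Sequence


def local_top_k(part: Iterable[int], k: int) -> List[int]:
    """Each simulated processor keeps only its k largest elements."""
    return heapq.nlargest(k, part)


def merge_top_k(candidates: Sequence[Sequence[int]], k: int) -> List[int]:
    """A coordinator merges the per-processor candidate lists."""
    return heapq.nlargest(k, (x for c in candidates for x in c))


def distributed_top_k(parts: Sequence[Sequence[int]], k: int) -> List[int]:
    """Naive baseline: up to k values are communicated per processor."""
    return merge_top_k([local_top_k(p, k) for p in parts], k)


# Hypothetical data spread over three simulated processors.
parts = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
result = distributed_top_k(parts, 3)
print(result)  # [9, 8, 7]
```

The baseline is correct because any element in the global top k must also be in the top k of its own partition, so no candidate is lost in the local step.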

Parallelizing shortest path algorithm for time dependent graphs with flow speed model

International Conference on Application of Information and Communication Technologies (AICT)

Authors: Mehmet Akif Ersoy; Can Özturan. Affiliations: Bogazici Universitesi, Istanbul, TR; Dept. of Computer Engineering, Bogazici University, Istanbul, Turkey

Various sequential algorithms for the shortest path problem on time dependent graphs have appeared in the literature. However, these algorithms mostly suffer from long running times and huge memory requirements, which makes them unsuitable for navigation applications that need to run on real-time data with fast response times. For the shortest path problem with the time dependent flow speed model, we propose parallel algorithms based on the Modified Dijkstra algorithm in order to reduce the running time of the sequential algorithm without requiring much more memory. We develop three different parallel implementations using CUDA and OpenMP: (i) a CUDA based version, (ii) an OpenMP based version and (iii) a hybrid CUDA and OpenMP based version. We get up to a 10-fold speedup in the OpenMP version, and a 17-fold speedup in the other two versions.

Keywords: Shortest path problem Algorithm design and analysis parallel algorithms Heuristic algorithms US Department of Transportation Graphics processing units Memory management
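
To make the underlying problem concrete, the following is a minimal sequential sketch of a Dijkstra-style earliest-arrival search on a time dependent graph. It is not the authors' modified algorithm and does not implement the flow speed model: each edge simply carries a hypothetical travel-time function of the departure time, and the sketch assumes these functions are FIFO (departing later never yields an earlier arrival), which is what keeps the label-setting search correct. The graph, the `congested` helper and all names are illustrative.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Assumption: an edge cost is a function of the departure time (hours).
TravelTime = Callable[[float], float]
Graph = Dict[str, List[Tuple[str, TravelTime]]]


def earliest_arrival(graph: Graph, source: str, start: float) -> Dict[str, float]:
    """Earliest arrival time at every reachable node when leaving at `start`."""
    arrival = {source: start}
    heap = [(start, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale queue entry, a better label was already settled
        for v, travel in graph.get(u, []):
            t_v = t + travel(t)  # cost depends on when we leave u
            if t_v < arrival.get(v, float("inf")):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return arrival


def congested(base: float) -> TravelTime:
    """Toy profile: travel time peaks at 2x base at 8.5 h and is FIFO for base <= 1."""
    return lambda depart: base * (1.0 + max(0.0, 1.0 - abs(depart - 8.5)))


# Hypothetical three-node network.
graph: Graph = {
    "A": [("B", congested(0.5)), ("C", lambda depart: 1.5)],
    "B": [("C", congested(0.5))],
    "C": [],
}

off_peak = earliest_arrival(graph, "A", 6.0)  # route via B is faster
peak = earliest_arrival(graph, "A", 8.0)      # direct edge is faster
print(off_peak["C"], peak["C"])  # 7.0 9.5
```

The two queries show why the departure time matters: the best route from A to C changes between the off-peak and peak departures, so a static shortest path computed once would be wrong for one of them.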

An Evaluation of the Parallella Architecture for the Convex Hull Computation

International Symposium on Computing and Networking (CANDAR)

Authors: Keisuke Nakata; Yasuaki Ito. Affiliation: Department of Information Engineering, Hiroshima University, Hiroshima, Japan

ISBN: (print) 9781509026562

The main contribution of this paper is an implementation of the parallel convex hull algorithm on the Parallella architecture. Parallella is a single-board computer with 16 mesh-connected cores. Our implementation takes into account the memory architecture and the mesh-connected network of the Parallella architecture. We evaluated the computing time and the energy efficiency by comparison with various computing platforms such as a Raspberry Pi, a desktop PC, and a multicore server. The experimental results show that for 16384 points, although the computing time of Parallella is 17.50 times longer than that of a 24-core multicore server, its energy efficiency is 7.12 times higher.

Keywords: Multicore processing Servers Program processors parallel algorithms Computers Throughput
