Most particle methods share the problem of high computational cost, and to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs can be structured into a multi-node framework, allowing massive data exchanges through a high-speed network; in this case, each node usually comprises several cores available to perform multithreaded computations. On the other hand, GPUs, derived from graphics computing technologies, are able to perform highly multithreaded calculations with hundreds of independent threads connected together through a common shared memory. This paper is primarily dedicated to the distributed-memory parallelization of particle methods, targeting several thousand CPU cores. The experience gained clearly shows that parallelizing a particle-based code on moderate numbers of cores can easily lead to acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole, in a massive HPC context, by making use of the MPI library. We focus on one particular particle method, Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering. (C) 2015 Published by Elsevier B.V.
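As a concrete illustration of the distributed-memory approach this abstract describes, the sketch below shows a 1-D domain decomposition halo exchange for an SPH-like particle code using MPI_Sendrecv. It is a minimal sketch under assumed names (the Particle struct and exchange_halo helper are hypothetical), not the paper's implementation.

```cpp
// Minimal sketch: 1-D domain decomposition halo exchange for an
// SPH-like particle code (illustrative only; names are hypothetical).
#include <mpi.h>
#include <vector>

struct Particle { double x, y, z, h; };  // position and smoothing length

// Exchange boundary particles with the left and right neighbor ranks so
// that each rank can evaluate SPH sums near its subdomain edges.
std::vector<Particle> exchange_halo(const std::vector<Particle>& send_left,
                                    const std::vector<Particle>& send_right,
                                    MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    // First exchange counts, then the particle payloads as raw bytes.
    int nl = (int)send_left.size(), nr = (int)send_right.size();
    int recv_from_right = 0, recv_from_left = 0;
    MPI_Sendrecv(&nl, 1, MPI_INT, left, 0, &recv_from_right, 1, MPI_INT,
                 right, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&nr, 1, MPI_INT, right, 1, &recv_from_left, 1, MPI_INT,
                 left, 1, comm, MPI_STATUS_IGNORE);

    std::vector<Particle> halo(recv_from_left + recv_from_right);
    MPI_Sendrecv(send_left.data(), nl * (int)sizeof(Particle), MPI_BYTE,
                 left, 2, halo.data() + recv_from_left,
                 recv_from_right * (int)sizeof(Particle), MPI_BYTE,
                 right, 2, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_right.data(), nr * (int)sizeof(Particle), MPI_BYTE,
                 right, 3, halo.data(),
                 recv_from_left * (int)sizeof(Particle), MPI_BYTE,
                 left, 3, comm, MPI_STATUS_IGNORE);
    return halo;
}
```

MPI_PROC_NULL at the domain boundaries turns the edge exchanges into no-ops, which keeps the code free of special cases for the first and last rank.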
We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and Tasks library maps the chunks and tasks to physical resources. In this way we seek to combine user friendliness with high performance. An application programmer can express a parallel algorithm using a few simple building blocks, defining data and work objects and their relationships. No explicit communication calls are needed; the distribution of both work and data is handled by the Chunks and Tasks library. This makes efficient implementation of complex applications that require dynamic distribution of work and data easier. At the same time, Chunks and Tasks imposes restrictions on data access and task dependencies that facilitate the development of high-performance parallel back ends. We discuss the fundamental abstractions underlying the programming model, as well as performance, determinism, and fault resilience considerations. We also present a pilot C++ library implementation for clusters of multicore machines and demonstrate its performance for irregular block-sparse matrix-matrix multiplication. (C) 2013 Elsevier B.V. All rights reserved.
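The split-into-smaller-pieces idea behind chunks and tasks can be conveyed with a shared-memory analogy. The sketch below is not the Chunks and Tasks API; it only illustrates recursive splitting of data and work using std::async, and the reduce_chunk function and cutoff constant are assumptions of this example.

```cpp
// Illustration (not the Chunks and Tasks API): work expressed as tasks
// that recursively split until the pieces are small enough to compute.
#include <future>
#include <numeric>
#include <vector>

// A "task" that reduces a range of data; below a cutoff it computes
// directly, otherwise it splits the range and spawns a child task.
double reduce_chunk(const std::vector<double>& data, size_t lo, size_t hi) {
    const size_t cutoff = 1 << 14;  // leaf size, tuned per machine
    if (hi - lo <= cutoff)
        return std::accumulate(data.begin() + lo, data.begin() + hi, 0.0);
    size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async, reduce_chunk,
                           std::cref(data), lo, mid);
    double right = reduce_chunk(data, mid, hi);  // reuse current thread
    return left.get() + right;
}
```

In the actual model, the library rather than the programmer decides where each piece runs, and the same recursive structure extends across distributed memory because no explicit communication appears in the user code.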
This paper presents a massively parallel approach to the multilevel fast multipole algorithm (PMLFMA) on China's homegrown many-core SW26010 cluster, denoted SW-PMLFMA, for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among the management processing elements (MPEs) of the SW26010 processors following the ternary partitioning scheme, using the message passing interface (MPI). Then, the computationally intensive parts of the PMLFMA on each MPI process (matrix filling, aggregation, and disaggregation) are accelerated using all 64 computing processing elements (CPEs) in the same core group as the MPE via the Athread parallel programming model. Different parallelization strategies are designed for the many-core accelerators to ensure high computational throughput. To match the special characteristics of local Lagrange interpolation, the compressed sparse row (CSR) and compressed sparse column (CSC) sparse matrix storage formats are used for storing the interpolation and anterpolation matrices, respectively, together with a specially designed cache mechanism of hybrid dynamic and static buffers in the scratchpad memory (SPM) to improve data access efficiency. Numerical results are included to demonstrate the efficiency and versatility of the proposed method. The proposed parallel scheme is shown to have excellent speedup.
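For readers unfamiliar with the storage format mentioned, the following is a plain CSR sparse matrix-vector product, the kind of kernel the interpolation matrices feed. It is a generic sketch in standard C++, not the SW26010 CPE code; the CsrMatrix type and spmv function are hypothetical names.

```cpp
// Standard CSR (compressed sparse row) matrix-vector product.
#include <vector>

struct CsrMatrix {
    int rows;
    std::vector<int>    row_ptr;  // size rows + 1; row i spans
                                  // [row_ptr[i], row_ptr[i+1])
    std::vector<int>    col_idx;  // size nnz: column of each nonzero
    std::vector<double> values;   // size nnz: value of each nonzero
};

// y = A * x; y must already be sized to A.rows.
void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
    for (int i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.values[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}
```

CSR makes rows contiguous (good for products against the matrix), while CSC makes columns contiguous (good for products against its transpose), which is why the two formats suit the interpolation and anterpolation matrices respectively.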
In nuclear astrophysics, quantum simulations of large inhomogeneous dense systems, as they appear in the crusts of neutron stars, present major challenges. The number of particles in a simulation with periodic boundary conditions is strongly limited by the immense computational cost of the quantum methods. In this paper, we describe techniques for an efficient and scalable parallel implementation of Sky3D, a nuclear density functional theory solver that operates on an equidistant grid. The presented techniques allow Sky3D to achieve good scaling and high performance on a large number of cores, as demonstrated through detailed performance analysis on a Cray XC40 supercomputer. (C) 2017 Elsevier B.V. All rights reserved.
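One simple way to distribute a solver that operates on an equidistant grid is a 1-D slab decomposition across MPI ranks. The sketch below computes each rank's slab of z-planes; it is a generic illustration under that assumption, not Sky3D's actual scheme, and the Slab type and my_slab helper are hypothetical.

```cpp
// Sketch: 1-D slab decomposition of an equidistant 3-D grid across
// MPI ranks (hypothetical helper, not Sky3D's actual decomposition).
#include <mpi.h>

struct Slab { int z_begin, z_end; };  // local range of z-planes [begin, end)

Slab my_slab(int nz, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int base = nz / size, rem = nz % size;  // spread the remainder so
                                            // slab sizes differ by at most 1
    int begin = rank * base + (rank < rem ? rank : rem);
    int end   = begin + base + (rank < rem ? 1 : 0);
    return {begin, end};
}
```

Balancing the remainder this way matters at scale: if one rank carried all leftover planes, that rank would bound the time step for every other core.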
The Ubiquity Generator (UG) is a general framework for the external parallelization of mixed integer programming (MIP) solvers. In this paper, we present ParaXpress, a distributed-memory parallelization of the powerful commercial MIP solver FICO Xpress. Besides sheer performance, an important feature of Xpress is that it provides internal parallelization for shared memory systems. When aiming for the best possible performance of ParaXpress on a supercomputer, the question arises of how to balance the internal Xpress parallelization and the external parallelization by UG against each other. We provide computational experiments to address this question, and we show computational results for running ParaXpress on a Top500 supercomputer, using up to 43,344 cores in parallel.
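The balancing question can be made concrete with a toy enumeration: for a fixed core budget, each choice of internal Xpress thread count implies a number of external UG ranks. The snippet below is only that arithmetic, with the 43,344-core budget taken from the abstract; it has no connection to UG's actual configuration logic.

```cpp
// Toy enumeration of hybrid (external ranks) x (internal threads)
// splits of a fixed core budget; illustrative arithmetic only.
#include <cstdio>

int main() {
    const int total_cores = 43344;  // core count cited in the abstract
    for (int threads = 1; threads <= 64; threads *= 2) {
        if (total_cores % threads == 0)
            std::printf("%6d ranks x %2d solver threads = %d cores\n",
                        total_cores / threads, threads, total_cores);
    }
    return 0;
}
```

More internal threads strengthen each subtree solve but shrink the number of subproblems explored concurrently; the paper's experiments probe exactly this trade-off.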
ISBN (print): 9783319424323; 9783319424316
The Ubiquity Generator (UG) is a general framework for the external parallelization of mixed integer programming (MIP) solvers. It has been used to develop ParaSCIP, a distributed-memory, massively parallel version of the open source solver SCIP, running on up to 80,000 cores. In this paper, we present a first implementation of ParaXpress, a distributed-memory parallelization of the powerful commercial MIP solver FICO Xpress. Besides sheer performance, an important difference between SCIP and Xpress is that Xpress provides internal parallelization for shared memory systems. When aiming for the best possible performance of ParaXpress on a supercomputer, the question arises of how to balance the internal Xpress parallelization and the external parallelization by UG against each other. We provide computational experiments to address this question and show preliminary computational results for running a first version of ParaXpress on 6,144 cores in parallel.
ISBN (print): 9781665404761
This paper compares the serial, shared memory, and distributed memory parallelizations of the dynamic programming algorithm for the Knapsack Problem. The Knapsack Problem is one of the most popular optimization problems. It is a decision-making problem used in real-world situations such as business projects, the airline cargo business, cryptography, and decision-making in industrial processes. The algorithm under consideration is the table-based dynamic programming algorithm based on Bellman's optimality principle. We used the C-HF programming language. To solve this problem on shared memory systems, we used OpenMP. For the distributed memory parallelization, we employed MPI. The structure of the algorithm, the data distribution, synchronization, and communication schemes are explained in detail. Extensive experiments with the developed algorithms were carried out, and the obtained results enabled a comparative analysis of the developed algorithms.
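As a sketch of the shared memory variant described, the 0/1 knapsack DP below parallelizes the capacity loop with OpenMP, exploiting the fact that each new table row depends only on the previous row. This is an illustrative rolling-array implementation, not the authors' code; the knapsack function name is hypothetical.

```cpp
// Sketch: table-based 0/1 knapsack DP (Bellman's optimality principle)
// with the capacity loop parallelized via OpenMP. Compile with -fopenmp.
#include <algorithm>
#include <vector>

int knapsack(const std::vector<int>& weight, const std::vector<int>& value,
             int capacity) {
    std::vector<int> prev(capacity + 1, 0), cur(capacity + 1, 0);
    for (size_t i = 0; i < weight.size(); ++i) {
        // Each entry of 'cur' reads only 'prev', so the capacity loop
        // has no cross-iteration dependence and is safely parallel.
        #pragma omp parallel for
        for (int w = 0; w <= capacity; ++w) {
            cur[w] = (weight[i] <= w)
                   ? std::max(prev[w], prev[w - weight[i]] + value[i])
                   : prev[w];
        }
        std::swap(prev, cur);  // new row becomes the previous row
    }
    return prev[capacity];
}
```

The item loop itself stays sequential because row i+1 needs row i; a distributed memory version instead partitions each row's capacity range across ranks and communicates the boundary entries, which matches the data distribution and communication schemes the paper examines.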