Search Results - Inner Mongolia University Library

Efficient and Scalable Initialization of Partitioned Coupled Simulations with preCICE

Algorithms, 2021, vol. 14, no. 6, pp. 166-166

Authors: Totounferoush, Amin; Simonis, Frederic; Uekermann, Benjamin; Schulte, Miriam. Univ Stuttgart, Inst Parallel & Distributed Syst IPVS, D-70569 Stuttgart, Germany; Tech Univ Munich TUM, Sci Comp Comp Sci, D-85748 Garching, Germany

preCICE is an open-source library that provides comprehensive functionality to couple independent parallelized solver codes to establish a partitioned multi-physics multi-code simulation environment. For data communication between the respective executables at runtime, it implements a peer-to-peer concept, which renders the computational cost of the coupling per time step negligible compared to the typical run time of the coupled codes. To initialize the peer-to-peer coupling, the mesh partitions of the respective solvers need to be compared to determine the point-to-point communication channels between the processes of both codes. This initialization effort can become a limiting factor if we either reach memory limits or if we have to re-initialize communication relations in every time step. In this contribution, we remove two remaining bottlenecks: (i) we base the neighborhood search between mesh entities of two solvers on a tree data structure to avoid quadratic complexity, and (ii) we replace the sequential gather-scatter comparison of both mesh partitions by a two-level approach that first compares bounding boxes around mesh partitions in a sequential manner, subsequently establishes pairwise communication between processes of the two solvers, and finally compares mesh partitions between connected processes in parallel. We show that the two-level initialization method is five times faster than the old one-level scheme on 24,567 CPU-cores using a mesh with 628,898 vertices. In addition, the two-level scheme is able to handle much larger computational meshes, since the central mesh communication of the one-level scheme is replaced with a fully point-to-point mesh communication scheme.

Keywords: parallel programming; high performance computing; multi-physics simulation
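
The first level of the two-level scheme described in the abstract above (comparing bounding boxes around mesh partitions before any vertex-level comparison) can be illustrated with a minimal sketch. This is not preCICE code: the function names, the 2D point lists, and the `margin` parameter are assumptions made for illustration only.

```python
from itertools import product


def bounding_box(points):
    """Axis-aligned bounding box of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))


def boxes_overlap(box_a, box_b, margin=0.0):
    """True if two boxes overlap in every dimension (optionally padded)."""
    (a_lo, a_hi), (b_lo, b_hi) = box_a, box_b
    return all(
        a_lo[d] - margin <= b_hi[d] and b_lo[d] - margin <= a_hi[d]
        for d in range(len(a_lo))
    )


def candidate_pairs(parts_a, parts_b, margin=0.0):
    """Level 1: keep only partition pairs whose bounding boxes overlap.

    Only these pairs would need a direct channel and a detailed,
    vertex-level mesh comparison in a second level (not shown here).
    """
    boxes_a = [bounding_box(p) for p in parts_a]
    boxes_b = [bounding_box(p) for p in parts_b]
    return [
        (i, j)
        for (i, ba), (j, bb) in product(enumerate(boxes_a), enumerate(boxes_b))
        if boxes_overlap(ba, bb, margin)
    ]
```

With two partitions per solver, `candidate_pairs` returns the index pairs that would be connected; all other pairs are never compared at the vertex level.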


Deterministic Atomic Buffering

53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Chou, Yuan Hsi; Ng, Christopher; Cattell, Shaylin; Intan, Jeremy; Sinclair, Matthew D.; Devietti, Joseph; Rogers, Timothy G.; Aamodt, Tor M. Univ British Columbia, Vancouver, BC, Canada; Univ Wisconsin, Madison, WI 53706, USA; Univ Penn, Philadelphia, PA 19104, USA; Purdue Univ, W Lafayette, IN 47907, USA; AMD Res, Santa Clara, CA, USA

ISBN: (print) 9781728173832

Deterministic execution for GPUs is a desirable property as it helps with debuggability and reproducibility. It is also important for safety regulations, as safety critical workloads are starting to be deployed onto GPUs. Prior deterministic architectures, such as GPUDet, attempt to provide strong determinism for all types of workloads, incurring significant performance overheads due to the many restrictions that are required to satisfy determinism. We observe that a class of reduction workloads, such as graph applications and neural architecture search for machine learning, do not require such severe restrictions to preserve determinism. This motivates the design of our system, Deterministic Atomic Buffering (DAB), which provides deterministic execution with low area and performance overheads by focusing solely on ordering atomic instructions instead of all memory instructions. By scheduling atomic instructions deterministically with atomic buffering, the results of atomic operations are isolated initially and made visible in the future in a deterministic order. This allows the GPU to execute deterministically in parallel without having to serialize its threads for atomic operations as opposed to GPUDet. Our simulation results show that, for atomic-intensive applications, DAB performs 4x better than GPUDet and incurs only a 23% slowdown on average compared to a non-deterministic GPU architecture. We also characterize the bottlenecks and provide insights for future optimizations.

Keywords: GPU architecture; determinism; performance; parallel programming


1.5GBIT/S 4.9W HYPERSPECTRAL IMAGE ENCODERS ON A LOW-POWER PARALLEL HETEROGENEOUS PROCESSING PLATFORM

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Ferraz, Oscar; Silva, Vitor; Falcao, Gabriel. Univ Coimbra, Dept Elect & Comp Engn, Inst Telecomunicacoes, Coimbra, Portugal

ISBN: (print) 9781509066315

This work explores the use of low-power heterogeneous devices for parallelizing the compute-intensive CCSDS-123 entropy encoders for hyperspectral and multispectral image compression. Multithreaded processing allows the system's bandwidth to be exploited near-optimally, increasing overall system performance. The experimental platform is a low-power Jetson TX2 GPU equipped with ARM Cortex-A57 and Denver 2 host processors; it reports more than 1552 Mb/s and, more importantly, 315 Mb/s/W, all running under a global 5 W power budget, which makes it a good candidate for onboard image compression.

Keywords: Low Power Graphics Processing Units; parallel programming; Multispectral Image Compression; Hyperspectral Image Compression; Lossless Compression


Application of Hybrid MPI+TBB Parallel Programming Model for Traveling Salesman Problem

IEEE/ACM Int'l Conference on Green Computing and Communications (GreenCom) & Int'l Conference on Cyber, Physical and Social Computing (CPSCom)

Authors: Jinke Zhu; Qing Li. School of Computer Engineering and Science, Shanghai University, Shanghai, China

A parallel algorithm for solving the TSP (traveling salesman problem) is presented in this paper. The main idea of the algorithm is to combine 2-opt local search optimization with a genetic algorithm. An MPI+TBB hybrid parallel programming model is employed in the implementation. Numerical results indicate that it is possible to arrive at high-quality solutions in reasonable time. The speedup of the parallel algorithm improves as the problem scale increases. Moreover, the speedup grows nearly linearly with the number of cores.

Keywords: Genetic algorithms; Message systems; parallel programming; Traveling salesman problems; parallel algorithms; Computational modeling; Educational institutions
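
The 2-opt local search mentioned in the abstract above can be sketched as follows. This is a plain sequential illustration, not the authors' MPI+TBB implementation, and it leaves out the genetic algorithm entirely; the function names and the 2D Euclidean city representation are assumptions.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the order given by tour."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(tour, pts):
    """Reverse tour segments until no reversal shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] swaps two edges of the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, pts) < tour_length(best, pts) - 1e-12:
                    best = candidate
                    improved = True
    return best
```

In a hybrid scheme of the kind the abstract describes, a routine like `two_opt` would typically be applied to individuals produced by the genetic algorithm; how the paper combines the two is not specified here.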


Addressing Logical Deadlocks through Task-Parallel Language Design

Author: Voss, Caleb A. Georgia Institute of Technology

Degree level: Doctoral

Task-parallel programming languages offer a variety of high-level mechanisms for synchronization that trade off between flexibility and deadlock safety. Some approaches are deadlock-free by construction but support limited synchronization patterns, while other approaches are trivial to deadlock. In high-level task-parallel programming, it is imperative that language features offer both flexibility to avoid over-synchronization and also sufficient protection against logical deadlocks. Lack of flexibility leads to code that does not take full advantage of the available parallelism in the computation. Lack of deadlock protection leads to error-prone code in which a single bug can involve arbitrarily many tasks, making it difficult to reason about. We make advances in both flexibility and deadlock protection for existing synchronization mechanisms by carefully designing dynamically verifiable usage policies and language constructs. We first define a deadlock-freedom policy for futures. The rules of the policy follow naturally from the semantics of asynchronous task closures and correspond to a preorder traversal of the task tree. The policy admits an additional class of deadlock-free programs compared to past work. Each blocking wait for a future can be verified by a stateless, lock-free algorithm, resulting in low time and memory overheads at runtime. In order to define and identify deadlocks for promises, we introduce a mechanism for promises to be owned by tasks. Simple annotations make it possible to ensure that each promise is eventually fulfilled by the responsible task or handed off to another task. Ownership semantics allows us to formally define two kinds of promise bugs: omitted sets and deadlock cycles. We present novel detection algorithms for both bugs. We further introduce an approximate deadlock-freedom policy for promises that, instead of precisely detecting cycles, raises an alarm when synchronization dependences occurring between trees of tasks are a

Keywords: parallel programming; Synchronization; Deadlock detection; Language design; Runtime verification


Development of parallel software code for calculating the problem of radiation magnetic gas dynamics and the study of plasma dynamics in the channel of plasma accelerator

21st Conference on Scientific Services and Internet, SSI 2019

Authors: Bakhtin, Vladimir; Zakharov, Dmitry; Kozlov, Andrey; Konovalov, Venyamin. Keldysh Institute of Applied Mathematics, Miusskaya sq. 4, Moscow 125047, Russia; Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 11999, Russia; Bauman Moscow State Technical University, ul. Baumanskaya 2-ya 5/1, Moscow 105005, Russia

DVM-system is designed for the development of parallel programs of scientific and technical calculations in C-DVMH and Fortran-DVMH languages. These languages use a single parallel programming model (DVMH model) and are extensions of the standard C and Fortran languages with parallelism specifications, written in the form of directives to the compiler. The DVMH model makes it possible to create efficient parallel programs for heterogeneous computing clusters, in the nodes of which accelerators (graphic processors or Intel Xeon Phi coprocessors) can be used as computing devices along with universal multi-core processors. The article describes the experience of successful use of DVM-system for the development of parallel software code for calculating the problem of radiation magnetic hydrodynamics and the study of plasma dynamics in the channel of plasma accelerator. Copyright © 2020 for this paper by its authors.

Keywords: parallel programming


ParlayLib - A Toolkit for Parallel Algorithms on Shared-Memory Multicore Machines

32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2020

Authors: Blelloch, Guy E.; Anderson, Daniel; Dhulipala, Laxman. Carnegie Mellon University, Pittsburgh, PA, United States

ISBN: (print) 9781450369350

ParlayLib is a C++ library for developing efficient parallel algorithms and software on shared-memory multicore machines. It provides additional tools and primitives that go beyond what is available in the C++ standard library, and simplifies the task of programming provably efficient and scalable parallel algorithms. It consists of a sequence data type (analogous to std::vector), many parallel routines and algorithms, a work-stealing scheduler to support nested parallelism, and a scalable memory allocator. It has been developed over a period of seven years and used in a variety of software including the PBBS benchmark suite, the Ligra, Julienne, and Aspen graph processing frameworks, the Graph Based Benchmark Suite, and the PAM library for parallel balanced binary search trees, and an implementation of the TPC-H benchmark suite. © 2020 Owner/Author.

Keywords: parallel programming


An Execution Time Comparison of Parallel Computing Algorithms for Solving Heat Equation

3rd International Conference on Smart Applications and Data Analysis for Smart Cyber Physical Systems

Authors: Belhaous, Safa; Hidila, Zineb; Baroud, Sohaib; Chokri, Soumia; Mestari, Mohammed. Hassan II Univ, ENSET, SSDIA Lab, Mohammadia, Morocco

ISBN: (print) 9783030451820; 9783030451837

Parallel computing contributes significantly to most disciplines for solving scientific problems such as partial differential equations (PDEs), load balancing, and deep learning. The primary characteristic of parallelism is its ability to improve performance on many different sets of computers. Consequently, many researchers continue to devote effort to producing efficient parallel solutions for problems such as the heat equation. The heat equation models a natural phenomenon and is used in many fields, such as mathematics and physics. Usually, its associated model is defined by a set of partial differential equations (PDEs). This paper presents two parallel programs for solving the heat equation, which has been discretized using the finite difference method (FDM). These programs have been implemented on different parallel platforms such as SkelGIS and Compute Unified Device Architecture (CUDA).

Keywords: parallel computing; parallel programming; Heat equation; CUDA; SkelGIS library; GPU; Finite difference method
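
To make the finite difference discretization mentioned in the abstract above concrete, here is a minimal sequential sketch of one explicit (forward-time, centered-space) update for the 1D heat equation with fixed boundary values. The specific scheme, the 1D setting, and all names are assumptions for illustration; the abstract does not say which FDM variant the SkelGIS and CUDA programs use.

```python
def heat_step(u, alpha, dx, dt):
    """One explicit finite-difference step of u_t = alpha * u_xx in 1D.

    u     : list of temperatures on a uniform grid; endpoints stay fixed
    alpha : thermal diffusivity
    dx    : grid spacing
    dt    : time step
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        # The explicit scheme is only stable for r <= 1/2.
        raise ValueError("unstable time step: alpha*dt/dx^2 must be <= 0.5")
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new
```

Each interior point depends only on its neighbours from the previous step, which is why this kind of stencil update maps naturally onto one GPU thread per grid point.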


Eventify: Event-Based Task Parallelism for Strong Scaling

7th Annual Platform for Advanced Scientific Computing Conference, PASC 2020

Authors: Haensel, David; Morgenstern, Laura; Beckmann, Andreas; Kabadshow, Ivo; Dachsel, Holger. Jülich Supercomputing Centre, Jülich, Germany; Chemnitz University of Technology, Chemnitz, Germany

ISBN: (print) 9781450379939

Today's processors become fatter, not faster. However, the exploitation of these massively parallel compute resources remains a challenge for many traditional HPC applications regarding scalability, portability and programmability. To tackle this challenge, several parallel programming approaches such as loop parallelism and task parallelism are researched in the form of languages, libraries and frameworks. Task parallelism as provided by OpenMP, HPX, StarPU, Charm++ and Kokkos is the most promising approach to overcome the challenges of ever increasing parallelism. The aforementioned parallel programming technologies enable scalability for a broad range of algorithms with coarse-grained tasks, e.g., in linear algebra and classical N-body simulation. However, they do not fully address the performance bottlenecks of algorithms with fine-grained tasks and the resultant large task graphs. Additionally, we experienced the description of large task graphs to be cumbersome with the common approach of providing in-, out- and inout-dependencies. We introduce event-based task parallelism to solve the performance and programmability issues for algorithms that exhibit fine-grained task parallelism and contain repetitive task patterns. With user-defined event lists, the approach provides a more convenient and compact way to describe large task graphs. Furthermore, we show how these event lists are processed by a task engine that reuses user-defined, algorithmic data structures. As a use case, we describe the implementation of a fast multipole method for molecular dynamics with event-based task parallelism. The performance analysis reveals that the event-based implementation is 52 % faster than a classical loop-parallel implementation with OpenMP. © 2020 ACM.

Keywords: parallel programming


Time Improvement of Smith-Waterman Algorithm Using OpenMP and SIMD

2nd International Conference on Futuristic Trends in Networks and Computing Technologies, FTNCT 2019

Authors: Malik, Mehak; Malhotra, Srijan; Prasanth, Narayanan. School of CSE, Vellore Institute of Technology, Vellore, Tamil Nadu, India

ISBN: (print) 9789811544507

Sequence alignment is a problem in bioinformatics that involves arranging sequences of proteins, RNA or DNA so that similar regions between two or more sequences may be determined. The Smith-Waterman algorithm is a key algorithm for aligning sequences. This paper uses the OpenMP application programming interface along with Single-Instruction Multiple-Data (SIMD) instructions. Advanced Vector Extensions 2 (AVX2) is used to implement the SIMD paradigm. It utilizes both fine-level and coarse-level parallelism to improve resource utilization without requiring support from multiple nodes in a distributed memory system. The algorithm shows a multifold decrease in execution time in comparison to an implementation that is sequentially executed. © 2020, Springer Nature Singapore Pte Ltd.

Keywords: parallel programming
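
For reference, the sequential baseline that the abstract above compares against can be sketched as a plain Smith-Waterman scoring routine with a linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions for illustration; the paper's own scoring scheme and its OpenMP/AVX2 parallelization are not reproduced here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, since each
    cell depends on its left, upper, and upper-left neighbours.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

Cells on the same anti-diagonal of the matrix are independent of one another, which is the property that vectorized and multithreaded variants usually exploit.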
