Search results - Inner Mongolia University Library

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream

13th IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

Authors: Hammond, Jeff R.; Deakin, Tom; Cownie, James; McIntosh-Smith, Simon

Affiliations: NVIDIA Helsinki Oy, Helsinki, Finland; Univ Bristol, Dept Comp Sci, HPC Res Grp, Bristol, England

ISBN: (print) 9781665451857

Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of loops on CPUs and GPUs. This paper studies the performance portability of this construct on a range of processors and compares it with the incumbent models: OpenMP, OpenACC and CUDA. To do this study fairly, we implemented the BabelStream memory bandwidth benchmark from scratch, entirely in modern Fortran, for all of the models considered, which include Fortran DO CONCURRENT, as well as two variants of OpenACC, four variants of OpenMP (2 CPU and 2 GPU), CUDA Fortran, and both loop- and array-based references. BabelStream Fortran matches the C++ implementation as closely as possible, and can be used to make language-based comparisons. This paper represents one of the first detailed studies of the performance of Fortran support on heterogeneous architectures; we include results for AArch64 and x86_64 CPUs as well as AMD, Intel and NVIDIA GPU platforms.

Keywords: Fortran; parallel programming; GPUs; multi-core; memory bandwidth
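For readers unfamiliar with the benchmark named in this record, the following is a minimal sketch of the kind of streaming kernels a BabelStream-style memory bandwidth test measures, together with the usual bandwidth arithmetic. It is written in plain Python for illustration only and is not the paper's Fortran code. The function names, the initial values (0.1, 0.2, 0.0), the scalar 0.4 and the 8-byte word size are assumptions chosen for this sketch.

```python
def run_stream_kernels(n=1000, scalar=0.4):
    """Run one pass of five streaming kernels on plain Python lists.

    The array names and initial values are assumptions for illustration.
    """
    a = [0.1] * n
    b = [0.2] * n
    c = [0.0] * n

    c = [a[i] for i in range(n)]                  # copy:  c = a
    b = [scalar * c[i] for i in range(n)]         # mul:   b = scalar * c
    c = [a[i] + b[i] for i in range(n)]           # add:   c = a + b
    a = [b[i] + scalar * c[i] for i in range(n)]  # triad: a = b + scalar * c
    dot = sum(a[i] * b[i] for i in range(n))      # dot:   sum of a[i] * b[i]
    return a, b, c, dot


def bytes_moved(kernel, n, word_size=8):
    """Bytes read plus written by one kernel pass over arrays of length n."""
    arrays_touched = {"copy": 2, "mul": 2, "add": 3, "triad": 3, "dot": 2}
    return arrays_touched[kernel] * n * word_size


def bandwidth_gb_per_s(kernel, n, seconds, word_size=8):
    """Achieved bandwidth in GB/s (10^9 bytes per second) for a timed pass."""
    return bytes_moved(kernel, n, word_size) / seconds / 1e9
```

As a usage note, a triad pass over arrays of 1000 eight-byte elements touches three arrays, so `bytes_moved("triad", 1000)` gives 24000 bytes; dividing by the measured kernel time yields the reported bandwidth.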

Parallelism Detection Using Graph Labelling

LOBACHEVSKII JOURNAL OF MATHEMATICS, 2022, vol. 43, no. 10, pp. 2893-2900

Authors: Telegin, P. N.; Baranov, A. V.; Shabanov, B. M.; Tikhomirov, A. I.

Affiliations: Russian Acad Sci Joint Supercomp Ctr Branch Fed State Inst Sci Res Inst Syst Anal, Moscow 119334, Russia

Usage of multiprocessor and multicore computers implies parallel programming. Tools for preparing parallel programs include parallel languages and libraries as well as parallelizing compilers and convertors that can perform automatic parallelization. The basic approach for parallelism detection is analysis of data dependencies and properties of program components, including data use and predicates. In this article a suite of used data and predicates sets for program components is proposed and an algorithm for computing these sets is suggested. The algorithm is based on wave propagation on graphs with cycles and labelling. This method allows analysing complex program components, improving data localization and thus providing enhanced data parallelism detection.

Keywords: parallel programming; program parallelization; graph wave algorithm; graph labelling
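The abstract above describes parallelism detection as an analysis of data dependencies over the data each program component uses. A minimal sketch of that general idea is the classic Bernstein's conditions check, shown here in Python. This is a standard textbook test swapped in for illustration, not the wave-propagation algorithm of the article, and the function name and statement encoding are hypothetical.

```python
def can_run_in_parallel(s1, s2):
    """Bernstein's conditions for two program components.

    Each component is a pair (reads, writes) of variable-name sets.
    The components are independent when there is no flow dependence
    (writes1 & reads2), no anti dependence (reads1 & writes2) and no
    output dependence (writes1 & writes2).
    """
    reads1, writes1 = s1
    reads2, writes2 = s2
    return (
        not (writes1 & reads2)
        and not (reads1 & writes2)
        and not (writes1 & writes2)
    )


# Hypothetical statements encoded as (variables read, variables written).
stmt_a = ({"x", "y"}, {"z"})  # z = x + y
stmt_b = ({"x"}, {"w"})       # w = 2 * x
stmt_c = ({"z"}, {"x"})       # x = z - 1
```

Under this encoding, `stmt_a` and `stmt_b` only share a variable they both read, so they may run in parallel, whereas `stmt_c` reads what `stmt_a` writes and overwrites what both others read, so it depends on each of them.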

Abstractions for C++ code optimizations in parallel high-performance applications

PARALLEL COMPUTING, 2024, vol. 121

Authors: Klepl, Jiri; Smelko, Adam; Rozsypal, Lukas; Krulis, Martin

Affiliations: Charles Univ Prague, Dept Distributed & Dependable Syst, Malostranske Nam 25, Prague 11800, Czech Republic

Many computational problems consider memory throughput a performance bottleneck, especially in the domain of parallel computing. Software needs to be attuned to hardware features like cache architectures or concurrent memory banks to reach a decent level of performance efficiency. This can be achieved by selecting the right memory layouts for data structures or changing the order of data structure traversal. In this work, we present an abstraction for traversing a set of regular data structures (e.g., multidimensional arrays) that allows the design of traversal-agnostic algorithms. Such algorithms can easily optimize for memory performance and employ semi-automated parallelization or autotuning without altering their internal code. We also add an abstraction for autotuning that allows defining tuning parameters in one place and removes boilerplate code. The proposed solution was implemented as an extension of the Noarr library that simplifies a layout-agnostic design of regular data structures. It is implemented entirely using C++ template meta-programming without any nonstandard dependencies, so it is fully compatible with existing compilers, including CUDA NVCC or Intel DPC++. We evaluate the performance and expressiveness of our approach on the Polybench-C benchmarks.

Keywords: Regular data structure; Traversal; Plain C++; parallel programming; Code optimization; Autotuning
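The abstract above centres on separating an algorithm from both the memory layout and the traversal order of a regular data structure. The idea can be sketched in a few lines of Python, under the assumption that a layout is simply a function from logical indices to a flat offset. None of the names below come from the Noarr library, which is a C++ template library; they are hypothetical and serve only to illustrate the concept.

```python
from itertools import product


def row_major(rows, cols):
    """Layout mapping logical (i, j) to a row-major flat offset."""
    return lambda i, j: i * cols + j


def col_major(rows, cols):
    """Layout mapping logical (i, j) to a column-major flat offset."""
    return lambda i, j: j * rows + i


def traverse(rows, cols, order="ij"):
    """Yield logical (i, j) pairs with i outermost ("ij") or j outermost ("ji")."""
    if order == "ij":
        return product(range(rows), range(cols))
    return ((i, j) for j, i in product(range(cols), range(rows)))


def store(matrix, layout):
    """Copy a nested-list matrix into a flat list using the given layout."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            flat[layout(i, j)] = matrix[i][j]
    return flat


def row_sums(flat, layout, rows, cols, order="ij"):
    """Layout- and traversal-agnostic algorithm: it only uses logical indices."""
    sums = [0] * rows
    for i, j in traverse(rows, cols, order):
        sums[i] += flat[layout(i, j)]
    return sums
```

Because `row_sums` never computes an offset itself, the layout and the loop order can be swapped (for example, to tune for cache behaviour) without touching the algorithm body, and the result stays the same.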

Performant Portable OpenMP

31st ACM SIGPLAN International Conference on Compiler Construction (CC), 2022

Authors: Ozen, Guray; Wolfe, Michael

Affiliations: NVIDIA Corp, Berlin, Germany; NVIDIA Corp, Hillsboro, OR, USA

ISBN: (print) 9781450391832

Accelerated computing has increased the need to specialize how a program is parallelized depending on the target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism and sometimes more levels of parallelism than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits of target-specific parallelization with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. It yields up to 56X speedup and an average of 1.91x-1.79x speedup compared to the baseline performance (depending on the host system) on GPUs, and preserves CPU performance. In addition, our proposal requires 60% fewer parallelism directives.

Keywords: Compilers; parallel programming; GPUs; OpenMP

Developing performance portable plasma edge simulations: A survey

COMPUTER PHYSICS COMMUNICATIONS, 2024, vol. 298

Authors: Wright, Steven A.; Ridgers, Christopher P.; Mudalige, Gihan R.; Lantra, Zaman; Williams, Josh; Sunderland, Andrew; Thorne, H. Sue; Arter, Wayne

Affiliations: Univ York, Dept Comp Sci, York YO10 5GH, North Yorkshire, England; Univ York, York Plasma Inst, York YO10 5DQ, North Yorkshire, England; Univ Warwick, Dept Comp Sci, Coventry CV4 7AL, Warwickshire, England; Scitech Daresbury, Hartree Ctr, STFC Daresbury Lab, Warrington WA4 4AD, England; UK Atom Energy Author, Culham Sci Ctr, Abingdon OX14 3DB, Oxfordshire, England

Heterogeneous architectures are increasingly common in modern High-Performance Computing (HPC) systems. Achieving high performance on such heterogeneous systems requires new approaches to application development that are able to achieve the three Ps: Performance, Portability, and Productivity. In this paper, we provide an overview of the state-of-the-art for developing high-performance, portable and productive multi-physics applications with particular focus on the simulation of a plasma fusion reactor. Simulating such a complex system relies on both fluid- and particle-based simulations, and coupling interfaces between these two domains. We also review the current state-of-the-art in reasoning about the performance, portability and productivity of HPC applications.

Keywords: High-performance; parallel programming; Portability; Coupling; Plasma simulation; Reactor design

Efficient heterogeneous programming with FPGAs using the Controller model

JOURNAL OF SUPERCOMPUTING, 2021, vol. 77, no. 12, pp. 13995-14010

Authors: Rodriguez-Canal, Gabriel; Torres, Yuri; Andujar, Francisco J.; Gonzalez-Escribano, Arturo

Affiliations: Univ Edinburgh, Bayes Ctr, 47 Potterrow, Edinburgh EH8 9BT, Midlothian, Scotland; Univ Valladolid, Dept Informat, Escuela Ingn Informat, Campus Miguel Delibes S-N, Valladolid 47011, Spain

The Controller model is a heterogeneous parallel programming model implemented as a library. It transparently manages the coordination, communication and kernel launching details on different heterogeneous computing devices. It exploits native or vendor specific programming models and compilers, such as OpenMP, CUDA or OpenCL, thus enabling the potential performance obtained by using them. This work discusses the integration of FPGAs in the Controller model, using high-level synthesis tools and OpenCL. A new Controller backend for FPGAs is presented based on a previous OpenCL backend for GPUs. We discuss new configuration parameters for FPGA kernels and key ideas to adapt the original OpenCL backend while maintaining the portability of the original model. We present an experimental study to compare performance and development effort metrics obtained with the Controller model, Intel oneAPI and reference codes directly programmed with OpenCL. The results show that using the Controller library has advantages and drawbacks compared with Intel oneAPI, while compared with OpenCL it highly reduces the programming effort with negligible performance overhead.

Keywords: parallel programming; FPGA; OpenCL; Heterogeneous computing

High Performance Computing for Power Flow Analysis: A Way Forward for Indian Power Sector

22nd National Power Systems Conference (NPSC)

Authors: Jain, Jainendra; Sudhakar, Palem Benny; Mohan, Katta Jagan; Kumar, Senthil R.; Bapu, Bindhumadhava S.

Affiliations: Real Time Syst Grp, Ctr Dev Adv Comp, Bengaluru, India

ISBN: (print) 9781665462020

With the advent of renewable energy, smart grids, and cutting-edge measurement technologies, modern power systems are becoming more complex. As a result, analyzing modern power systems requires more computational power. High Performance Computing (HPC) is the most viable option for meeting this demand. In India's power sector, the use of HPC is minimal. Hence, we introduce HPC-based power flow analysis, a widely used application for analyzing the system. The paper demonstrates the importance of HPC for power flow analysis. It also discusses a modified Gaussian Elimination method that exploits the sparse nature of the Jacobian matrix to speed up the computation. Open Multi-Processing (OpenMP) is used to implement parallel computing. Parallel power flow analysis is simulated on the Central Processing Unit (CPU) and Graphics Processing Unit (GPU) nodes of C-DAC's PARAM Utkarsh supercomputer for various power system networks. The speedup obtained with HPC for the Polish 9241 bus network is 216.14 times the sequential computation.

Keywords: Power systems; power flow analysis; high performance computing; parallel programming; openmp; newton raphson method

Real-time Low Vision Simulation in Mixed Reality

16th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS)

Authors: Acevedo, Valeria; Colantoni, Philippe; Dinet, Eric; Tremeau, Alain

Affiliations: Univ Jean Monnet, Univ Lyon, Lab Hubert Curien, UMR 5516, St Etienne, France

ISBN: (print) 9781665464956

Visual impairments are a global health issue with profound socioeconomic ramifications in both the developing and the developed world. There are ongoing research projects that aim to investigate the influence of light on the perception of low vision individuals. But as of today, there is neither clear knowledge nor extensive data regarding the influence of light in low vision situations. This research addresses these issues by introducing a methodology and a system to simulate visual impairments. A pipeline based on eye anatomy coupled with real-time image processing algorithms makes it possible to dynamically simulate low vision specific characteristics of selected impairments in mixed reality. An original approach based on massively parallelized processing combined with an efficient modeling of eye refractive errors aims to improve the accuracy of the low vision simulation.

Keywords: mixed reality; low-vision simulation; parallel programming; image processing

Establishing the Integro-Differential Scheme as an Unsteady Navier-Stokes Solver

Author: Feng, Dehua

Affiliation: North Carolina Agricultural and Technical State University

Degree level: Ph.D., Doctor of Philosophy

Computational fluid dynamics (CFD) has emerged as a very important scientific and engineering research tool of the 21st century. At its core is the use of numerical methods and data structures to represent and predict the 'real-world' physics of a given fluid flow field. This is accomplished by applying the numerical form of the Navier-Stokes equations on modern computer platforms. The use and significance of CFD as a research and development tool has gained momentum in the field of Aerospace Engineering. CFD has been and can be used to understand fluid behavior over a wide range of flow conditions, ranging from simple to extreme. While it may be feasible to conduct simple fluid flow experiments to understand the fluid flow fields, experiments designed to understand complex fluid fields are less feasible, difficult to set up and often very costly. Computational Fluid Dynamics has empowered today's scientists and engineers with the ability to represent the complex air conditions at high altitude, which is difficult to achieve with a physical experiment. Although there have been significant developments with CFD methods, there still remain several challenges. Among these is the fact that with current CFD methods it is difficult to predict transition to turbulence. The purpose of this research effort is to improve both the efficiency and accuracy of CFD tools. This will be accomplished by focusing on developing a robust and accurate numerical scheme that is capable of solving the Navier-Stokes Equations under a wide variety of fluid flow fields. A well-established scheme, which was initially described and referred to as the Integro-Differential Scheme (IDS), is developed based on a unique combination of differential and integral forms of the complete Navier-Stokes Equations. In the IDS, the integral form of the Navier-Stokes Equations will be applied based on assumptions and used for explicit time marching. The IDS procedure confirms its predictive capability and supports its potential

Keywords: Computational fluid dynamics; Integro-differential scheme; MEDTA; parallel programming; Time stepping

Configuration of Parallel Real-Time Applications on Multi-Core Processors

20th IEEE International Conference on Industrial Informatics (INDIN)

Authors: Gharajeh, Mohammad Samadi; Carvalho, Tiago; Pinho, Luis Miguel

Affiliations: Polytech Inst Porto, Sch Engn, Porto, Portugal

ISBN: (digital) 9781728175683

ISBN: (print) 9781728175683

Parallel programming models (e.g., OpenMP) are increasingly used to improve the performance of real-time applications on modern processors. Nevertheless, these processors have complex architectures, which makes their timing behavior very difficult to understand. The main challenge with most existing works is that they apply static timing analysis for simpler models or measurement-based analysis using traditional platforms (e.g., single core) or considering only sequential algorithms. How to provide an efficient configuration for the allocation of the parallel program in the computing units of the processor is still an open challenge. This paper studies the problem of performing timing analysis on complex multi-core platforms, pointing out a methodology to understand the applications' timing behavior and guide the configuration of the platform. As an example, the paper uses an OpenMP-based program of the Heat benchmark on an NVIDIA Jetson AGX Xavier. The main objectives are to analyze the execution time of OpenMP tasks, specify the best configuration of OpenMP directives, identify critical tasks, and discuss the predictability of the system/application. A Linux perf based measurement tool, which has been extended by our team, is applied to measure each task across multiple executions in terms of total CPU cycles, the number of cache accesses, and the number of cache misses at different cache levels, including L1, L2 and L3. The evaluation process is performed using the measurement of the performance metrics by our tool to study the predictability of the system/application.

Keywords: real-time systems; multi-core processors; timing analysis; parallel programming; OpenMP
