Search Results - Inner Mongolia University Library

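Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)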
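The query syntax above is regular enough to assemble programmatically. The following is a minimal sketch that only builds the expression string; it assumes nothing about how the catalog accepts queries, and the helper names `term`, `group`, `combine`, and `example_one` are hypothetical, chosen for illustration.

```python
# Field codes copied from the help text; everything else here is illustrative.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(clause: str) -> str:
    """Wrap a clause in parentheses so it is evaluated as one unit."""
    return f"({clause})"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(clauses)


def example_one() -> str:
    """Rebuild the first search example from the help text."""
    subjects = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
    return combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
```

Calling `example_one()` returns `(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016`. Keeping the operator check in `combine` guards against the lowercase operators that the rules above disallow.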

Search condition: "Any field = 9th Workshop on Accelerator Programming Using Directives, WACCPD 2022"

14 records in total; showing 1-10.


Proceedings of WACCPD 2022: 9th Workshop on Accelerator Programming Using Directives, Held in Conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis

9th Workshop on Accelerator Programming Using Directives, WACCPD 2022

ISBN: (print) 9781665490191

The proceedings contain 6 papers. The topics discussed include: analysis of validating and verifying OpenACC compilers 3.0 and above; OmpSs-2 and OpenACC interoperation; extending MAGMA portability with OneAPI; KokkACC: enhancing Kokkos with OpenACC; SPEL: software tool for porting E3SM land model with OpenACC in a function unit test framework; and GPU-accelerated sparse matrix vector product based on element-by-element method for unstructured FEM using OpenACC.


8th Workshop on Accelerator Programming Using Directives, WACCPD 2021

ISBN: (print) 9783030977580

The proceedings contain 7 papers presented at a virtual meeting. The special focus in this conference is on accelerator programming using directives. The topics include: GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP; Accelerating Quantum Many-Body Configuration Interaction with Directives; Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading; GPU Porting of Scalable Implicit Solver with Green’s Function-Based Neural Networks by OpenACC; Extending OpenMP for Machine Learning-Driven Adaptation.


Analysis of Validating and Verifying OpenACC Compilers 3.0 and Above

9th Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Jarmusch, Aaron; Liu, Aaron; Munley, Christian; Horta, Daniel; Ravichandran, Vaidhyanathan; Denny, Joel; Friedline, Kyle; Chandrasekaran, Sunita

Affiliations: Univ Delaware Newark DE 19716 USA; Oak Ridge Natl Lab Oak Ridge TN USA

ISBN: (print) 9781665490191

OpenACC is a high-level directive-based parallel programming model that can manage the sophistication of heterogeneity in architectures and abstract it from the users. The portability of the model across CPUs and accelerators has gained the model a wide variety of users. This means it is also crucial to analyze the reliability of the compilers' implementations. To address this challenge, the OpenACC Validation and Verification team has proposed a validation testsuite to verify the OpenACC implementations across various compilers with an infrastructure for a more streamlined execution. This paper will cover the following aspects: (a) the new developments since the last publication on the testsuite, (b) outline the use of the infrastructure, (c) discuss tests that highlight our workflow process, (d) analyze the results from executing the testsuite on various systems, and (e) outline future developments.

Keywords: Performance Programming Model Testsuite Validation Conformance

OmpSs-2 and OpenACC Interoperation

9th Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Korakitis, Orestis; de Gonzalo, Simon Garcia; Guidotti, Nicolas; Barreto, Joao; Monteiro, Jose; Pena, Antonio J.

Affiliations: Huawei Zurich Res Ctr Comp Syst Lab Zurich Switzerland; Sandia Natl Labs Albuquerque NM USA; Univ Lisbon INESC ID Lisbon Portugal; Barcelona Supercomp Ctr BSC Barcelona Spain

ISBN: (print) 9781665490191

We propose an interoperation mechanism to enable novel composability across pragma-based programming models. We study and propose a clear separation of duties and implement our approach by augmenting the OmpSs-2 programming model, compiler and runtime system to support OmpSs-2 + OpenACC programming. To validate our proposal we port ZPIC, a kinetic plasma simulator, to leverage our hybrid OmpSs-2 + OpenACC implementation. We compare our approach against OpenACC versions of ZPIC on a multi-GPU HPC system. We show that our approach manages to provide automatic asynchronous and multi-GPU execution, removing significant burden from the application's developer, while also being able to outperform manually programmed versions, thanks to a better utilization of the hardware.

Keywords: Code Transformation Data Flow Paradigm GPU Parallelism Programming Productivity Runtime Scheduling Task based

KokkACC: Enhancing Kokkos with OpenACC

9th Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Valero-Lara, Pedro; Lee, Seyong; Gonzalez-Tallada, Marc; Denny, Joel; Vetter, Jeffrey S.

Affiliation: Oak Ridge Natl Lab ORNL Oak Ridge TN 37830 USA

ISBN: (print) 9781665490191

Template metaprogramming is gaining popularity as a high-level solution for achieving performance portability on heterogeneous computing resources. Kokkos is a representative approach that offers programmers high-level abstractions for generic programming while most of the device-specific code generation and optimizations are delegated to the compiler through template specializations. For this, Kokkos provides a set of device-specific code specializations in multiple back ends, such as CUDA and HIP. Unlike CUDA or HIP, OpenACC is a high-level and directive-based programming model. This descriptive model allows developers to insert hints (pragmas) into their code that help the compiler to parallelize the code. The compiler is responsible for the transformation of the code, which is completely transparent to the programmer. This paper presents an OpenACC back end for Kokkos: KokkACC. As an alternative to Kokkos's existing device-specific back ends, KokkACC is a multi-architecture back end providing a high-productivity programming environment enabled by OpenACC's high-level and descriptive programming model. Moreover, we have observed competitive performance; in some cases, KokkACC is faster (up to 9x) than NVIDIA's CUDA back end and much faster than OpenMP's GPU offloading back end. This work also includes implementation details and a detailed performance study conducted with a set of mini-benchmarks (AXPY and DOT product) and three mini-apps (LULESH, miniFE and SNAP, a LAMMPS proxy mini-app).

Keywords: OpenACC C++ Metaprogramming Kokkos CUDA OpenMP Target Parallel Programming Models

Message from the WACCPD22 Workshop Chairs

Proceedings of WACCPD 2022: 9th Workshop on Accelerator Programming Using Directives, Held in Conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2022, pp. V-VI

Authors: Daley, Christopher; Díaz, José M. Monsalve; Vergara, Verónica G. Melesse

Affiliations: Lawrence Berkeley National Laboratory, United States; Argonne National Laboratory, United States; Oak Ridge National Laboratory, United States


Achieving Near-Native Runtime Performance and Cross-Platform Performance Portability for Random Number Generation through SYCL Interoperability

8th International Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Pascuzzi, Vincent R.; Goli, Mehdi

Affiliations: Brookhaven Natl Lab Upton NY 11973 USA; Codeplay Software Ltd Edinburgh EH3 9DR Midlothian Scotland

ISBN: (print) 9783030977597; 9783030977580

High-performance computing (HPC) is a major driver accelerating scientific research and discovery, from quantum simulations to medical therapeutics. While the increasing availability of HPC resources is in many cases pivotal to successful science, even the largest collaborations lack the computational expertise required for maximal exploitation of current hardware capabilities. The need to maintain multiple platform-specific codebases further complicates matters, potentially adding constraints on machines that can be utilized. Fortunately, numerous programming models are under development that aim to facilitate portable codes for heterogeneous computing. One in particular is SYCL, an open standard, C++-based single-source programming paradigm. Among the new features available in the most recent specification, SYCL 2020, is interoperability, a mechanism through which applications and third-party libraries coordinate sharing data and execute collaboratively. In this paper, we leverage the SYCL programming model to demonstrate cross-platform performance portability across heterogeneous resources. We detail our NVIDIA and AMD random number generator extensions to the oneMKL open-source interfaces library. Performance portability is measured relative to platform-specific baseline applications executed on four major hardware platforms using two different compilers supporting SYCL. The utility of our extensions are exemplified in a real-world setting via a high-energy physics simulation application. We show the performance of implementations that capitalize on SYCL interoperability are at par with native implementations, attesting to the cross-platform performance portability of a SYCL-based approach to scientific codes.

Keywords: performance portability HPC SYCL random number generators high energy physics simulation

Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

7th International Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Davis, Joshua Hoke; Daley, Christopher; Pophale, Swaroop; Huber, Thomas; Chandrasekaran, Sunita; Wright, Nicholas J.

Affiliations: Univ Delaware Newark DE 19716 USA; Lawrence Berkeley Natl Lab Natl Energy Res Sci Comp Ctr Berkeley CA 94720 USA; Oak Ridge Natl Lab Oak Ridge TN 37830 USA

ISBN: (print) 9783030742232; 9783030742249

Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today's systems to tomorrow's. Over the past decade and more, directives have been established as one of the promising paths to tackle programmatic challenges on emerging systems. This work focuses on applying and demonstrating OpenMP offloading directives on five proxy applications. We observe that the performance varies widely from one compiler to the other; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, there are other issues that must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC's Cori system when using the Clang compiler, and a 15.7x speedup by switching max reductions to add reductions in the laplace mini-app when using the Cray-llvm compiler on Cori.

Keywords: Directive-based programming Performance portability Heterogeneous systems OpenMP GPU NVIDIA V100

KokkACC: Enhancing Kokkos with OpenACC

Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Pedro Valero-Lara; Seyong Lee; Marc Gonzalez-Tallada; Joel Denny; Jeffrey S. Vetter

Affiliation: Oak Ridge National Laboratory (ORNL)

ISBN: (print) 9781665490207

Template metaprogramming is gaining popularity as a high-level solution for achieving performance portability on heterogeneous computing resources. Kokkos is a representative approach that offers programmers high-level abstractions for generic programming while most of the device-specific code generation and optimizations are delegated to the compiler through template specializations. For this, Kokkos provides a set of device-specific code specializations in multiple back ends, such as CUDA and HIP. Unlike CUDA or HIP, OpenACC is a high-level and directive-based programming model. This descriptive model allows developers to insert hints (pragmas) into their code that help the compiler to parallelize the code. The compiler is responsible for the transformation of the code, which is completely transparent to the programmer. This paper presents an OpenACC back end for Kokkos: KokkACC. As an alternative to Kokkos’s existing device-specific back ends, KokkACC is a multi-architecture back end providing a high-productivity programming environment enabled by OpenACC’s high-level and descriptive programming model. Moreover, we have observed competitive performance; in some cases, KokkACC is faster (up to 9×) than NVIDIA’s CUDA back end and much faster than OpenMP’s GPU offloading back end. This work also includes implementation details and a detailed performance study conducted with a set of mini-benchmarks (AXPY and DOT product) and three mini-apps (LULESH, miniFE and SNAP, a LAMMPS proxy mini-app).

Keywords: Performance evaluation Codes Parallel programming Conferences Graphics processing units US Department of Transportation Heterogeneous networks

Performance and Portability of a Linear Solver Across Emerging Architectures

7th International Workshop on Accelerator Programming Using Directives (WACCPD)

Authors: Walden, Aaron C.; Zubair, Mohammad; Nielsen, Eric J.

Affiliations: NASA Langley Res Ctr Hampton VA 23665 USA; Old Dominion Univ Norfolk VA USA

ISBN: (print) 9783030742232; 9783030742249

A linear solver algorithm used by a large-scale unstructured-grid computational fluid dynamics application is examined for a broad range of familiar and emerging architectures. Efficient implementation of a linear solver is challenging on recent CPUs offering vector architectures. Vector loads and stores are essential to effectively utilize available memory bandwidth on CPUs, and maintaining performance across different CPUs can be difficult in the face of varying vector lengths offered by each. A similar challenge occurs on GPU architectures, where it is essential to have coalesced memory accesses to utilize memory bandwidth effectively. In this work, we demonstrate that restructuring a computation, and possibly data layout, with regard to architecture is essential to achieve optimal performance by establishing a performance benchmark for each target architecture in a low level language such as vector intrinsics or CUDA. In doing so, we demonstrate how a linear solver kernel can be mapped to Intel® Xeon™ and Xeon Phi™, Marvell® ThunderX2®, NEC® SX-Aurora™ TSUBASA Vector Engine, and NVIDIA® and AMD® GPUs. We further demonstrate that the required code restructuring can be achieved in higher level programming environments such as OpenACC, OCCA, and Intel® OneAPI™/SYCL, and that each generally results in optimal performance on the target architecture. Relative performance metrics for all implementations are shown, and subjective ratings for ease of implementation and optimization are suggested.

Keywords: programming models Performance portability Emerging architecture CFD HPC CUDA OpenACC OCCA AVX-512 intrinsics Neon intrinsics Arm GPU V100 A100 MI50 Xeon Phi SX-Aurora ThunderX2
