Search Results - Inner Mongolia University Library

A Taxonomy of Modern GPGPU Programming Methods: On the Benefits of a Unified Specification

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, Vol. 41, No. 6, pp. 1649-1662

Authors: Capodieci, Nicola; Cavicchioli, Roberto; Marongiu, Andrea. Affiliation: Univ Modena & Reggio Emilia, Dept Phys Informat & Math, I-41121 Modena, Italy

Several application programming interfaces (APIs) and frameworks have been proposed to simplify the development of general-purpose GPU (GPGPU) applications. GPGPU application development typically involves specific customization for the target operating systems and hardware devices. The effort to port applications from one API to the other (or to develop multitarget applications) is complicated by the availability of a plethora of specifications, which in essence offers very similar underlying functionality. In this work we provide an in-depth study of six state-of-the-art GPGPU APIs. From these we derive a taxonomy of the common semantics and propose a unified specification. We describe a methodology to translate this unified specification into different target APIs. This simplifies cross-platform application development and provides a clean framework for benchmarking. Our proposed unified specification is called GPGPU unified specification and translation (GUST) and it captures common functionality found in compute-only APIs (e.g., compute unified device architecture and open computing language), in the compute pipeline of traditional graphic-oriented APIs (e.g., open graphic language and Direct3D11) and in last-generation bare-metal APIs (e.g., Vulkan and Direct3D12). The proposed translation methodology solves differences between specific APIs in a transparent manner, without hiding available tuning knobs for compute kernel optimizations and fostering best programming practices in a simple manner.

Keywords: Graphics processing units; Programming; Hardware; Kernel; Standards; Semantics; Encoding; General-purpose GPU (GPGPU); parallel programming tools

High-Level Performance Modeling of Task-Based Algorithms: A Blueprint for Understanding the Performance of TBB Algorithms

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Authors: Alexandrov, Alexei; Armstrong, Douglas; Rajic, Hrabri; Voss, Michael; Hayes, Donald. Affiliation: Intel Corp, Performance Anal & Threading Lab, Santa Clara, CA 95051, USA

ISBN: (print) 9781424460229

Performance modeling and visualization of task-based parallel algorithms is challenging. Libraries such as Intel Threading Building Blocks (TBB) and Microsoft's Parallel Patterns Library provide high-level algorithms that are implemented using low-level tasks. Current tools present performance at this lower level. Developers like to tune and debug at the same level as the coding abstraction, so in this paper we propose tools and a two-step methodology that target this level of abstraction. In the first step, the system-level metrics of utilization and overhead are collected to determine if performance is acceptable. If a problem is suspected, the second step of our methodology projects these metrics onto the algorithms contained in the application. Using these projections, many common performance issues can be quickly diagnosed. We demonstrate our methodology using a prototype implementation that is integrated with the Intel Threading Building Blocks library. We show the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.

Keywords: Intel(R) Threading Building Blocks; task-based algorithms; parallel programming tools
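
The two-step methodology described in the abstract above can be illustrated with a small sketch. This is not the authors' prototype: the sample records, the field names (`algorithm`, `useful`, `overhead`), and the definition of both metrics as fractions of total thread capacity are assumptions made only for illustration.

```python
from collections import defaultdict

def system_metrics(samples, num_threads, wall_time):
    """Step 1 (assumed definition): utilization and overhead as fractions
    of the total capacity num_threads * wall_time."""
    capacity = num_threads * wall_time
    useful = sum(s["useful"] for s in samples)
    overhead = sum(s["overhead"] for s in samples)
    return {"utilization": useful / capacity, "overhead": overhead / capacity}

def project_onto_algorithms(samples, num_threads, wall_time):
    """Step 2 (assumed definition): split the same two metrics by the
    high-level algorithm that each sample was attributed to."""
    capacity = num_threads * wall_time
    per_algo = defaultdict(lambda: {"utilization": 0.0, "overhead": 0.0})
    for s in samples:
        per_algo[s["algorithm"]]["utilization"] += s["useful"] / capacity
        per_algo[s["algorithm"]]["overhead"] += s["overhead"] / capacity
    return dict(per_algo)

# Hypothetical measurements in seconds, summed over all threads.
samples = [
    {"algorithm": "parallel_for", "useful": 6.0, "overhead": 1.0},
    {"algorithm": "pipeline", "useful": 2.0, "overhead": 3.0},
]
totals = system_metrics(samples, num_threads=4, wall_time=4.0)
by_algo = project_onto_algorithms(samples, num_threads=4, wall_time=4.0)
```

In this made-up data set the system-level numbers show low utilization, and the projection points to the pipeline as the place where most of the overhead is spent.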

A Classification-Based Approach to Fault-Tolerance Support in Parallel Programs

10th International Conference on Parallel and Distributed Computing, Applications and Technologies

Authors: Jakadeesan, Gopinatha; Goswami, Dhrubajyoti. Affiliation: Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada

ISBN: (print) 9781424452910

Fault tolerance is an important requirement for long-running parallel programs. This paper presents a different approach to fault-tolerance support in message-passing parallel programs based on their structural and behavioral characteristics, commonly known as patterns. A classification of these patterns and their applicable fault-tolerance strategies is aimed to facilitate an application developer to incorporate appropriate fault-tolerance strategies to an application. Fault-tolerance strategies for two of the patterns are discussed, and one specific strategy is elaborated and analyzed. The presented strategies have been incorporated into a fault-tolerance support framework called FT-PAS. One objective of the framework is to separate the fault tolerance related details from an application developer's main objectives (separation-of-concerns). The paper presents the additional key features of the framework, and concludes with a discussion on current and future research directions.

Keywords: Fault tolerance; parallel programming tools; design patterns

Monitoring cache behavior on parallel SMP architectures and related programming tools

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, 2005, Vol. 21, No. 8, pp. 1298-1311

Authors: Brandes, T; Schwamborn, H; Gerndt, M; Jeitner, J; Kereku, E; Schulz, M; Brunst, H; Nagel, W; Neumann, R; Müller-Pfefferkorn, R; Trenkler, B; Karl, W; Tao, J; Hoppe, HC. Affiliations: Fraunhofer Gesellsch FhG, Inst Algorithmen & Wissensch Rechnen SCAI, D-53754 St Augustin, Germany; Tech Univ Munich, LRR, D-85748 Garching, Germany; Tech Univ Dresden, Zentrum Hochleistungsrechnen ZHR, D-01062 Dresden, Germany; Univ Karlsruhe, Inst Rechnerentwurf & Fehlertoleranz, D-76128 Karlsruhe, Germany; Intel GmbH, Software & Solut Grp, D-50321 Bruhl, Germany

This paper describes the ideas and developments of the project EP-CACHE. Within this project new methods and tools are developed to improve the analysis and the optimization of programs for cache architectures, especially for SMP clusters. The tool set comprises the semi-automatic instrumentation of user programs, the monitoring of the cache behavior, the visualization of the measured data, and optimization techniques for improving the user program for better cache usage. As current hardware performance counters do not give sufficient user relevant information, new hardware monitors are designed that provide more detailed information about the cache utilization related to the data structures and code blocks in the user program. The expense of the hardware and software realization will be assessed to minimize the risk of a real implementation of the investigated monitors. The usefulness of the hardware monitors is evaluated by a cache simulator. (c) 2004 Published by Elsevier B.V.

Keywords: hardware cache monitoring; performance analysis; cache optimizations; parallel programming tools; SMP cluster

Monitoring cache behavior on parallel SMP architectures and related programming tools

European Grid Conference

Keywords: hardware cache monitoring; performance analysis; cache optimizations; parallel programming tools; SMP cluster

Correctness checking of MPI one-sided communication using marmot

13th European Parallel-Virtual-Machine-and-Message-Passing-Interface-Users-Group Meeting (PVM/MPI)

Authors: Krammer, Bettina; Resch, Michael M. Affiliation: Ctr High Performance Comp, D-70550 Stuttgart, Germany

ISBN: (print) 354039110X

The MPI-2 standard defines functions for Remote Memory Access (RMA) by allowing one process to specify all communication parameters both for the sending and the receiving side, which is also referred to as one-sided communication. Having experienced parallel programming as a complex and error-prone task, we have developed the MPI correctness checking tool MARMOT covering the MPI-1.2 standard and are now aiming at extending it to support application developers also for the more frequently used parts of MPI-2 such as one-sided communication. In this paper we describe our tool, which is designed to check the correct usage of the MPI API automatically at run-time, and we also analyse to what extent it is possible to do so for RMA.

Keywords: MPI; parallel programming tools; analysis; one-sided communication; RMA

Tool gear: Infrastructure for parallel tools

International Conference on Parallel and Distributed Processing Techniques and Applications

Authors: May, J; Gyllenhaal, J. Affiliation: Lawrence Livermore Natl Lab, Livermore, CA 94550, USA

ISBN: (print) 1892512459

Tool Gear is a software infrastructure for developing performance analysis and other tools. Unlike existing integrated toolkits, which focus on providing a suite of capabilities, Tool Gear is designed to help tool developers create new tools quickly. It combines dynamic instrumentation capabilities with an efficient database and a sophisticated and extensible graphical user interface. This paper describes the design of Tool Gear and presents examples of tools that have been built with it.

Keywords: parallel programming tools; performance analysis; automatic instrumentation; tool infrastructure

Heterogeneous distribution of computations solving linear algebra problems on networks of heterogeneous computers

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, Vol. 61, No. 4, pp. 520-535

Authors: Kalinov, A; Lastovetsky, A. Affiliation: Russian Acad Sci, Inst Syst Programming, Moscow 109004, Russia

This paper presents and analyzes two different strategies of heterogeneous distribution of computations solving dense linear algebra problems on heterogeneous networks of computers. The first strategy is based on heterogeneous distribution of processes over processors and homogeneous block cyclic distribution of data over the processes. The second is based on homogeneous distribution of processes over processors and heterogeneous block cyclic distribution of data over the processes. Both strategies were implemented in the mpC language, a dedicated parallel extension of ANSI C for efficient and portable programming of heterogeneous networks of computers. The first strategy was implemented using calls to ScaLAPACK; the second strategy was implemented with calls to LAPACK and BLAS. Cholesky factorization on a heterogeneous network of workstations is used to demonstrate that the heterogeneous distributions have an advantage over the traditional homogeneous distribution. (C) 2001 Academic Press.

Keywords: parallel programming tools; parallel linear algebra software; ScaLAPACK; heterogeneous computing; parallel languages
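
The difference between a homogeneous and a heterogeneous block cyclic distribution, as contrasted in the abstract above, can be sketched in one dimension. This is an illustrative simplification and not the paper's mpC implementation: the function name and the use of small integer relative speeds are assumptions. Within each cycle, a process receives a number of consecutive blocks equal to its assumed relative speed.

```python
def heterogeneous_block_cyclic(num_blocks, speeds):
    """Return the owning process of each block index.

    speeds[p] is a hypothetical integer relative speed of process p.
    Every cycle of sum(speeds) blocks gives process p speeds[p] blocks.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive integers")
    # One cycle of ownership, e.g. speeds [3, 1] -> [0, 0, 0, 1].
    pattern = [p for p, s in enumerate(speeds) for _ in range(s)]
    return [pattern[i % len(pattern)] for i in range(num_blocks)]

# Equal speeds reduce to the traditional homogeneous block cyclic layout.
homogeneous = heterogeneous_block_cyclic(4, [1, 1])
# A process assumed to be three times faster owns three blocks per cycle.
heterogeneous = heterogeneous_block_cyclic(8, [3, 1])
```

With equal speeds the blocks simply alternate between processes; with speeds of 3 and 1 the faster process owns six of the eight blocks.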

Performance tools for today's HPC: Are we addressing the right issues?

PARALLEL COMPUTING, 2001, Vol. 27, No. 11, pp. 1403-1415

Author: Pancake, CM. Affiliation: Oregon State Univ, NW Alliance Computat Sci & Engn, Corvallis, OR 97331, USA

High-performance computing (HPC) application developers can no longer afford the luxury of programming to just one platform. The usefulness and longevity of software now depend on portability as well as performance. This paper examines the appropriateness of existing tools in developing and tuning applications that must be both efficient and portable. A series of design tradeoffs have been made, with the result that today's tools are slanted toward one goal or the other but do not respond adequately to both. These tradeoffs are explored through a series of examples drawn from current tools. Suggestions are presented for how tools might better support the evolving needs of HPC programmers. (C) 2001 Elsevier Science B.V. All rights reserved.

Keywords: program portability; performance analysis tools; parallel programming tools

Relative debugging for data parallel programs: A ZPL case study

IEEE CONCURRENCY, 2000, Vol. 8, No. 4, pp. 42-52

Authors: Watson, G; Abramson, D. Affiliation: Monash Univ, Clayton, Vic 3168, Australia

Relative debugging is a powerful paradigm that lets us locate errors in programs that result from porting or rewriting code. The authors describe their experience using relative debugging to compare a program written ...

Keywords: Debugging; parallel programming tools; Program Development; Data Parallel Language; ZPL; Relative Debugging
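
The general idea of relative debugging named in the abstract above, comparing the data of a reference program with that of a ported or rewritten one, can be sketched as follows. This is not the authors' tool: the checkpoint dictionaries, the function name, and the tolerance parameters are hypothetical and chosen only to illustrate the comparison.

```python
import math

def first_divergence(reference, suspect, rel_tol=1e-9, abs_tol=0.0):
    """Return (checkpoint, index) of the first mismatch, or None if all agree.

    reference and suspect map hypothetical checkpoint names to lists of
    numbers captured at corresponding points in the two program versions.
    The index is None when a checkpoint is missing or has another length.
    """
    for name, ref_values in reference.items():
        sus_values = suspect.get(name)
        if sus_values is None or len(sus_values) != len(ref_values):
            return (name, None)
        for i, (r, s) in enumerate(zip(ref_values, sus_values)):
            if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol):
                return (name, i)
    return None

# Made-up snapshots from a reference run and a ported run.
reference = {"after_init": [1.0, 2.0], "after_step": [3.0, 5.0]}
ported = {"after_init": [1.0, 2.0], "after_step": [3.0, 5.5]}
mismatch = first_divergence(reference, ported)
```

Reporting the earliest checkpoint at which the two versions disagree narrows the search to the code executed between that checkpoint and the previous one.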
