Search Results - Inner Mongolia University Library

Monitoring cache behavior on parallel SMP architectures and related programming tools

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, 2005, Vol. 21, No. 8, pp. 1298-1311

Authors: Brandes, T; Schwamborn, H; Gerndt, M; Jeitner, J; Kereku, E; Schulz, M; Brunst, H; Nagel, W; Neumann, R; Müller-Pfefferkorn, R; Trenkler, B; Karl, W; Tao, J; Hoppe, HC

Affiliations: Fraunhofer Gesellsch FhG, Inst Algorithmen & Wissensch Rechnen SCAI, D-53754 St Augustin, Germany; Tech Univ Munich, LRR, D-85748 Garching, Germany; Tech Univ Dresden, Zentrum Hochleistungsrechnen ZHR, D-01062 Dresden, Germany; Univ Karlsruhe, Inst Rechnerentwurf & Fehlertoleranz, D-76128 Karlsruhe, Germany; Intel GmbH, Software & Solut Grp, D-50321 Bruhl, Germany

This paper describes the ideas and developments of the project EP-CACHE. Within this project, new methods and tools are being developed to improve the analysis and optimization of programs for cache architectures, especially for SMP clusters. The tool set comprises semi-automatic instrumentation of user programs, monitoring of cache behavior, visualization of the measured data, and optimization techniques that improve the user program's cache usage. Because current hardware performance counters do not provide sufficient user-relevant information, new hardware monitors are being designed that provide more detailed information about cache utilization, related to the data structures and code blocks in the user program. The cost of realizing these monitors in hardware and software will be assessed to minimize the risk of an actual implementation. The usefulness of the hardware monitors is evaluated with a cache simulator. (c) 2004 Published by Elsevier B.V.

Keywords: hardware cache monitoring; performance analysis; cache optimizations; parallel programming tools; SMP cluster

Monitoring cache behavior on parallel SMP architectures and related programming tools

European Grid Conference

Keywords: hardware cache monitoring; performance analysis; cache optimizations; parallel programming tools; SMP cluster

Relative debugging for data parallel programs: A ZPL case study

IEEE CONCURRENCY, 2000, Vol. 8, No. 4, pp. 42-52

Authors: Watson, G; Abramson, D

Affiliation: Monash Univ, Clayton, Vic 3168, Australia

Relative debugging is a powerful paradigm that lets us locate errors in programs that result from porting or rewriting code. The authors describe their experience using relative debugging to compare a program written ...

Keywords: Debugging; parallel programming tools; Program Development; Data Parallel Language; ZPL; Relative Debugging
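Relative debugging rests on comparing the state of a trusted reference program with that of its ported or rewritten counterpart. The minimal Python sketch below is not the authors' debugger; it shows only the comparison step, assuming both versions can be run on the same input and expose the same array at a chosen checkpoint. The functions compare_states, reference_prefix_sum and ported_prefix_sum are hypothetical names used for illustration.

```python
from typing import List, Sequence, Tuple


def compare_states(reference: Sequence[float], suspect: Sequence[float],
                   tol: float = 1e-9) -> List[Tuple[int, float, float]]:
    """Compare the same data structure captured from two program versions at
    one checkpoint; return (index, reference value, suspect value) for each
    element whose absolute difference exceeds tol."""
    if len(reference) != len(suspect):
        raise ValueError("the two versions produced arrays of different length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, suspect))
            if abs(r - s) > tol]


def reference_prefix_sum(xs: Sequence[float]) -> List[float]:
    """Trusted original version: running sum of xs."""
    out, acc = [], 0.0
    for x in xs:
        acc += x
        out.append(acc)
    return out


def ported_prefix_sum(xs: Sequence[float]) -> List[float]:
    """Hypothetical rewritten version with a deliberate porting bug."""
    out = [0.0] * len(xs)
    out[0] = xs[0]
    # Bug: the loop stops one element early, so the last entry stays 0.0.
    for i in range(1, len(xs) - 1):
        out[i] = out[i - 1] + xs[i]
    return out


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(compare_states(reference_prefix_sum(xs), ported_prefix_sum(xs)))
```

Running it prints [(3, 10.0, 0.0)], pointing directly at the element the rewritten loop fails to compute. The tolerance matters for floating-point code, where a correct port may still differ in the last bits because of a changed order of operations.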

Finding, expressing and managing parallelism in programs executed on clusters of workstations

COMPUTER COMMUNICATIONS, 1999, Vol. 22, No. 11, pp. 998-1016

Author: Goscinski, AM

Affiliation: Deakin Univ, Sch Comp & Math, Geelong, Vic 3217, Australia

The goal of this paper is to identify and discuss the basic issues of, and solutions to, parallel processing on clusters of workstations (COWs). First, identifying and expressing parallelism in application programs is discussed. The following approaches to finding and expressing parallelism are characterized: parallel programming languages, parallel programming tools, sequential programming supported by distributed shared memory (DSM), and parallelising compilers. Second, efficient management of the available parallelism is discussed. Because parallel execution requires efficient management of processes and computational resources, the parallel execution environment proposed here is built on a distributed operating system. To allow parallel programs to achieve high performance and transparency, this system should provide services such as global scheduling, process migration, local and remote process creation, computation coordination, group communication and distributed shared memory. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: parallel programming languages; parallel programming tools; DSM; parallelizing compilers; parallelism management; distributed operating systems supporting parallelism management

Compiler-based tools for analyzing parallel programs

PARALLEL COMPUTING, 1998, Vol. 24, No. 3-4, pp. 401-420

Authors: Armstrong, B; Kim, SW; Park, I; Voss, M; Eigenmann, R

Affiliation: Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907, USA

In this paper, we present several tools for analyzing parallel programs. The tools are built on top of a compiler infrastructure that provides advanced capabilities for symbolic program analysis and manipulation. The tools can display characteristics of a program and relate this information to data gathered from instrumented program runs and from other performance analysis tools. They also support an interactive compilation scenario, giving the user feedback on how the compilation process performed and how to improve it. We present case studies demonstrating the use of the tools. These include the characterization of an industrial application and the study of new compiler techniques and portable parallel languages. (C) 1998 Elsevier Science B.V. All rights reserved.

Keywords: parallelizing compilers; parallel programming tools; integrated tools; program characterization; performance analysis

Heterogeneous distribution of computations solving linear algebra problems on networks of heterogeneous computers

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, Vol. 61, No. 4, pp. 520-535

Authors: Kalinov, A; Lastovetsky, A

Affiliation: Russian Acad Sci, Inst Syst Programming, Moscow 109004, Russia

This paper presents and analyzes two different strategies for the heterogeneous distribution of computations that solve dense linear algebra problems on heterogeneous networks of computers. The first strategy is based on a heterogeneous distribution of processes over processors and a homogeneous block-cyclic distribution of data over the processes. The second is based on a homogeneous distribution of processes over processors and a heterogeneous block-cyclic distribution of data over the processes. Both strategies were implemented in the mpC language, a dedicated parallel extension of ANSI C for efficient and portable programming of heterogeneous networks of computers. The first strategy was implemented using calls to ScaLAPACK; the second was implemented with calls to LAPACK and BLAS. Cholesky factorization on a heterogeneous network of workstations is used to demonstrate that the heterogeneous distributions have an advantage over the traditional homogeneous distribution. (C) 2001 Academic Press.

Keywords: parallel programming tools; parallel linear algebra software; ScaLAPACK; heterogeneous computing; parallel languages
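The second strategy, one process per processor with a heterogeneous block-cyclic distribution of data, can be illustrated with a minimal Python sketch. It assumes that relative processor speeds are known in advance and that each cycle of cycle_len block columns is split among processes in proportion to those speeds using largest-remainder rounding. The names blocks_per_cycle and heterogeneous_block_cyclic are hypothetical, and this is not the mpC implementation described in the paper.

```python
from typing import List


def blocks_per_cycle(speeds: List[float], cycle_len: int) -> List[int]:
    """Split cycle_len block columns among processes in proportion to their
    relative speeds, so faster processes receive more blocks per cycle.
    Rounding uses the largest-remainder method (an assumption of this sketch)."""
    total = sum(speeds)
    if total <= 0:
        raise ValueError("speeds must sum to a positive value")
    exact = [s * cycle_len / total for s in speeds]
    counts = [int(x) for x in exact]
    # Give the blocks lost to truncation to the largest fractional parts.
    leftover = cycle_len - sum(counts)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda p: exact[p] - counts[p], reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1
    return counts


def heterogeneous_block_cyclic(n_blocks: int, speeds: List[float],
                               cycle_len: int) -> List[int]:
    """Return the owning process index for each block column 0..n_blocks-1,
    repeating the same speed-weighted pattern every cycle_len blocks."""
    counts = blocks_per_cycle(speeds, cycle_len)
    pattern = [p for p, c in enumerate(counts) for _ in range(c)]
    return [pattern[b % cycle_len] for b in range(n_blocks)]


if __name__ == "__main__":
    print(heterogeneous_block_cyclic(8, [3.0, 1.0], 4))
```

With speeds [3.0, 1.0] and cycle_len = 4, the faster process owns three of every four block columns, so the call above prints [0, 0, 0, 1, 0, 0, 0, 1]. A longer cycle lets the per-process block counts track non-integer speed ratios more closely.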

Performance tools for today's HPC: Are we addressing the right issues?

PARALLEL COMPUTING, 2001, Vol. 27, No. 11, pp. 1403-1415

Author: Pancake, CM

Affiliation: Oregon State Univ, NW Alliance Computat Sci & Engn, Corvallis, OR 97331, USA

High-performance computing (HPC) application developers can no longer afford the luxury of programming to just one platform. The usefulness and longevity of software now depend on portability as well as performance. This paper examines the appropriateness of existing tools in developing and tuning applications that must be both efficient and portable. A series of design tradeoffs have been made, with the result that today's tools are slanted toward one goal or the other but do not respond adequately to both. These tradeoffs are explored through a series of examples drawn from current tools. Suggestions are presented for how tools might better support the evolving needs of HPC programmers. (C) 2001 Elsevier Science B.V. All rights reserved.

Keywords: program portability; performance analysis tools; parallel programming tools

A Taxonomy of Modern GPGPU Programming Methods: On the Benefits of a Unified Specification

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, Vol. 41, No. 6, pp. 1649-1662

Authors: Capodieci, Nicola; Cavicchioli, Roberto; Marongiu, Andrea

Affiliation: Univ Modena & Reggio Emilia, Dept Phys Informat & Math, I-41121 Modena, Italy

Several application programming interfaces (APIs) and frameworks have been proposed to simplify the development of general-purpose GPU (GPGPU) applications. GPGPU application development typically involves specific customization for the target operating systems and hardware devices. The effort to port applications from one API to another (or to develop multitarget applications) is complicated by the availability of a plethora of specifications, which in essence offer very similar underlying functionality. In this work, we provide an in-depth study of six state-of-the-art GPGPU APIs. From these we derive a taxonomy of the common semantics and propose a unified specification. We describe a methodology to translate this unified specification into different target APIs. This simplifies cross-platform application development and provides a clean framework for benchmarking. Our proposed unified specification is called GPGPU unified specification and translation (GUST), and it captures common functionality found in compute-only APIs (e.g., compute unified device architecture and open computing language), in the compute pipeline of traditional graphics-oriented APIs (e.g., open graphics library and Direct3D11), and in latest-generation bare-metal APIs (e.g., Vulkan and Direct3D12). The proposed translation methodology resolves differences between specific APIs transparently, without hiding the available tuning knobs for compute kernel optimization, while fostering best programming practices in a simple manner.

Keywords: Graphics processing units; Programming; Hardware; Kernel; Standards; Semantics; Encoding; General-purpose GPU (GPGPU); parallel programming tools

A Classification-Based Approach to Fault-Tolerance Support in Parallel Programs

10th International Conference on Parallel and Distributed Computing, Applications and Technologies

Authors: Jakadeesan, Gopinatha; Goswami, Dhrubajyoti

Affiliation: Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada

ISBN: (print) 9781424452910

Fault tolerance is an important requirement for long-running parallel programs. This paper presents a different approach to fault-tolerance support in message-passing parallel programs, based on their structural and behavioral characteristics, commonly known as patterns. A classification of these patterns and their applicable fault-tolerance strategies is intended to help application developers incorporate appropriate fault-tolerance strategies into an application. Fault-tolerance strategies for two of the patterns are discussed, and one specific strategy is elaborated and analyzed. The presented strategies have been incorporated into a fault-tolerance support framework called FT-PAS. One objective of the framework is to separate fault-tolerance-related details from an application developer's main objectives (separation of concerns). The paper presents the additional key features of the framework and concludes with a discussion of current and future research directions.

Keywords: Fault tolerance; parallel programming tools; design patterns

High-Level Performance Modeling of Task-Based Algorithms: A Blueprint for Understanding the Performance of TBB Algorithms

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Authors: Alexandrov, Alexei; Armstrong, Douglas; Rajic, Hrabri; Voss, Michael; Hayes, Donald

Affiliation: Intel Corp, Performance Anal & Threading Lab, Santa Clara, CA 95051, USA

ISBN: (print) 9781424460229

Performance modeling and visualization of task-based parallel algorithms is challenging. Libraries such as Intel Threading Building Blocks (TBB) and Microsoft's Parallel Patterns Library provide high-level algorithms that are implemented using low-level tasks, yet current tools present performance only at this lower level. Developers prefer to tune and debug at the same level of abstraction at which they write code, so in this paper we propose tools and a two-step methodology that target this level. In the first step, the system-level metrics of utilization and overhead are collected to determine whether performance is acceptable. If a problem is suspected, the second step projects these metrics onto the algorithms contained in the application. Using these projections, many common performance issues can be quickly diagnosed. We demonstrate our methodology using a prototype implementation that is integrated with the Intel Threading Building Blocks library. We show the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.

Keywords: Intel (R) Threading Building Blocks; task-based algorithms; parallel programming tools
