Search Results - Inner Mongolia University Library

Communication performance optimisation requires minimising variance

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 1999, Vol. 15, No. 3, pp. 453-459

Authors: Donaldson, SR; Hill, JMD; Skillicorn, DB (Univ Oxford Comp Lab Programming Res Grp, Oxford OX1 3QD, England; Queens Univ Dept Comp & Informat Sci, Kingston ON K7L 3N6, Canada)

The cost of communication in message-passing systems can only be computed from a large number of low-level details. Consequently, the only architectural measure they naturally suggest is a first-order one: latency. We show that a second-order property, the standard deviation of the delivery times, is also of interest. Most importantly, the average performance of a large communication system depends not only on the average performance of its components but also on the standard deviation of their performance. In other words, building a high-performance system requires components that are not only high-performance themselves but whose performance also has small variance. We illustrate this effect using distributions of the BSP g parameter. Lower bounds on the time per unit transfer of communication in large systems can be derived from data measured over single links. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: parallel programming; high-performance computing; communication performance; machine architecture; BSP
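
To make the variance argument concrete, here is a minimal Python sketch under stated assumptions. A communication phase (for example, a BSP superstep) is assumed to finish only when the slowest of p independent transfers arrives, and transfer times are assumed to be i.i.d. normal with mean mu and standard deviation sigma, truncated at zero. The function name `mean_superstep_time` and the parameter values are hypothetical; this is not the authors' measurement methodology, and it does not model the BSP g parameter directly.

```python
import random
import statistics


def mean_superstep_time(p, mu, sigma, trials=2000, seed=0):
    """Monte Carlo estimate of the mean time for a phase that completes only
    when the slowest of p independent transfers has been delivered.

    Assumption (illustrative only): each transfer time is drawn i.i.d. from a
    normal distribution N(mu, sigma^2), truncated at zero."""
    rng = random.Random(seed)
    phase_times = []
    for _ in range(trials):
        slowest = max(max(0.0, rng.gauss(mu, sigma)) for _ in range(p))
        phase_times.append(slowest)
    return statistics.fmean(phase_times)


# Same mean per transfer, increasing spread.
for sigma in (0.0, 0.1, 0.3):
    print(sigma, round(mean_superstep_time(p=64, mu=1.0, sigma=sigma), 3))
```

With the mean held fixed, the estimated phase time rises as sigma grows, because the expected maximum of p samples moves further above the mean as the spread widens. This is the qualitative effect the abstract describes: component averages alone do not determine system-level averages.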

Automatic model generation for performance estimation of parallel programs

PARALLEL COMPUTING, 1999, Vol. 25, No. 6, pp. 667-680

Authors: Mierendorff, H; Schwamborn, H (GMD German Natl Res Ctr Informat Technol, SCAI, D-53754 St Augustin, Germany)

A hybrid method for performance modeling of parallel programs is considered, in which the runtime of large sequential segments is estimated statically and the parallel program structure is evaluated by simulation. The present paper describes a way to generate a model of a given program automatically from the source code, where the user has to provide values for only a small number of variables. This model contains the control structure of the original program and timing information for generalized basic blocks. We consider Fortran programs that are parallelized using the message-passing paradigm. A prototype of a tool for automatic model generation has been developed that can handle examples of moderate size. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: automatic performance modeling; parallel programming
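
The hybrid idea in this abstract (static timing for sequential segments, simulation for the parallel structure) can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not the authors' tool: segment cost is a sum of per-block costs times execution counts that depend on user-supplied variables; sends are assumed to cost nothing locally and to arrive after a fixed `latency`; receives block until the matching message has arrived. All names (`segment_time`, `simulate`, `Compute`, `Send`, `Recv`) are hypothetical.

```python
from dataclasses import dataclass


# Static part: time of a sequential segment from per-block costs.
def segment_time(blocks, env):
    """blocks: (seconds_per_execution, count_fn) pairs, one per generalized
    basic block; count_fn maps user-supplied variables in env to how often
    that block executes."""
    return sum(cost * count(env) for cost, count in blocks)


# Dynamic part: events making up each process's parallel structure.
@dataclass
class Compute:
    seconds: float


@dataclass
class Send:
    dest: int


@dataclass
class Recv:
    src: int


def simulate(programs, latency):
    """Run per-process event lists; messages match FIFO per (src, dest) pair.
    Returns the predicted completion time of the slowest process."""
    clock = [0.0] * len(programs)
    pc = [0] * len(programs)
    in_flight = {}  # (src, dest) -> arrival times of undelivered messages
    progress = True
    while progress:
        progress = False
        for p, prog in enumerate(programs):
            while pc[p] < len(prog):
                event = prog[pc[p]]
                if isinstance(event, Compute):
                    clock[p] += event.seconds
                elif isinstance(event, Send):
                    in_flight.setdefault((p, event.dest), []).append(clock[p] + latency)
                else:  # Recv
                    pending = in_flight.get((event.src, p))
                    if not pending:
                        break  # blocked: let the other processes advance
                    clock[p] = max(clock[p], pending.pop(0))
                pc[p] += 1
                progress = True
    if any(pc[p] < len(prog) for p, prog in enumerate(programs)):
        raise RuntimeError("deadlock: a receive is never matched by a send")
    return max(clock)


# Hypothetical example: process 0 computes a segment, then sends to process 1.
work = segment_time([(2e-6, lambda e: e["n"]), (1e-3, lambda e: 1)], {"n": 1000})
print(simulate([[Compute(work), Send(1)], [Recv(0), Compute(0.5)]], latency=1e-4))
```

Keeping the static estimate separate from the simulator mirrors the division of labour in the abstract. Only the small event lists are simulated, while the cost of large sequential segments is computed once from the user's variables.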

Profiling techniques for communication in fine-grained parallel languages

SOFTWARE-PRACTICE & EXPERIENCE, 1999, Vol. 29, No. 6, pp. 519-550

Authors: Scheiman, CJ; Haake, B; Ibel, M; Schauser, KE (Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara CA 93106, USA; Calif Polytech State Univ San Luis Obispo, Dept Comp Sci, San Luis Obispo CA 93407, USA)

Fine-tuning the performance of large parallel programs is a very difficult task. A profiling tool can provide detailed insight into the utilization and communication of the different processors, which helps identify performance bottlenecks. In this paper we present two profiling techniques for the fine-grained parallel programming language Split-C, which provides a simple global address space memory model. One profiler provides a detailed analysis of a program's execution; the other collects cumulative information. As our experience shows, it is quite challenging to profile programs that make use of efficient, low-overhead communication. We incorporated techniques that minimize profiling effects on the running program, and quantified the profiling overhead. We present several Split-C applications showing that the profiler is useful in determining performance bottlenecks. Copyright (C) 1999 John Wiley & Sons, Ltd.

Keywords: parallel programming; performance analysis; profiling; fine-grained communication; Split-C; Active Messages

Class Act in parallel programming

IEEE Software, 1997, Vol. 14, No. 6, p. 107

Author: Schaller, Nan C. (Rochester Institute of Technology, United States)

Compiling High Performance Fortran for distributed-memory architectures

PARALLEL COMPUTING, 1999, Vol. 25, No. 13-14, pp. 1785-1825

Authors: Benkner, S; Zima, H (NEC Europe Ltd, C&C Res Labs, D-53757 St Augustin, Germany; Univ Vienna, Inst Software Technol & Parallel Syst, A-1090 Vienna, Austria)

High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: High Performance Fortran (HPF); parallel programming; parallelization; code generation; irregular problems; distributed-memory architectures
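
Distribution processing starts from the mapping of array elements to processors. The following Python sketch shows the standard HPF `BLOCK` mapping (block size ceiling(n/p)) together with the owner-computes rule, under which each processor executes only the assignments to elements it owns. This rule is commonly used by HPF compilers, but the sketch is not the compilation scheme described in the paper. The function names are hypothetical, and the arrays are assumed to be aligned identically.

```python
def block_size(n, p):
    """HPF BLOCK: each of p processors gets a contiguous block of ceiling(n/p) elements."""
    return -(-n // p)  # integer ceiling division


def owner(i, n, p):
    """0-based processor that owns global index i (1-based, as in Fortran)."""
    return (i - 1) // block_size(n, p)


def local_bounds(proc, n, p):
    """Global indices lo..hi owned by proc; the range is empty when lo > hi."""
    b = block_size(n, p)
    return proc * b + 1, min(n, (proc + 1) * b)


def distributed_add_one(b, p):
    """Owner-computes version of the loop A(i) = B(i) + 1 for i = 1..n.
    Each outer iteration stands in for one processor, which assigns only
    the elements of A that it owns (A and B assumed identically aligned)."""
    n = len(b)
    a = [None] * n
    for proc in range(p):
        lo, hi = local_bounds(proc, n, p)
        for i in range(lo, hi + 1):
            a[i - 1] = b[i - 1] + 1
    return a


print([owner(i, 10, 4) for i in range(1, 11)])
print(distributed_add_one([1, 2, 3, 4, 5], 4))
```

With n = 10 and p = 4, the block size is 3, so processor 3 owns only element 10. With n = 5 and p = 4, processor 3 owns nothing, and its local loop is simply empty.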

Parallel implementation of simulated annealing using transaction processing

IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1999, Vol. 146, No. 2, pp. 107-113

Authors: Pao, DCW; Lam, SP; Fong, AS (City Univ Hong Kong, Dept Elect Engn, Tat Chee Ave, Kowloon, Peoples R China)

Simulated annealing is an effective method for solving large combinatorial optimisation problems. Because of its iterative nature, the annealing process requires a substantial amount of computation time. A new parallel implementation based on the concurrency control theory of database systems is presented; the parallelised annealing process is serialisable. Concurrent updates to the base solution are allowed provided that they do not have data conflicts. Using the travelling salesman problem as the example application, the parallel simulated annealing algorithm is implemented on a Motorola Delta 3000 shared-memory multiprocessor system with eight processors. With a moderate problem size of 400 cities, a speedup efficiency of over 90% is achieved at high annealing temperatures and close to 100% at low annealing temperatures.

Keywords: concurrency control; simulated annealing; parallel implementation; parallel programming; combinatorial optimisation; parallel algorithms; transaction processing; parallelised; optimisation techniques; shared-memory

Parallelizing I/O-intensive image access and processing applications

IEEE CONCURRENCY, 1999, Vol. 7, No. 2, pp. 28-37

Authors: Messerli, V; Figueiredo, O; Gennart, B; Hersch, RD (Ecole Polytech Fed Lausanne, Dept Comp Sci, Peripheral Syst Lab, CH-1015 Lausanne, Switzerland)

CAP, a computer-aided parallelization tool, generates highly pipelined applications that run communication and I/O operations in parallel with processing operations. One of CAP's successes is the Visible Human Slice Server (http://visible ***), a 3D tomographic image server that allows clients to choose and view any cross section of the human body.

Keywords: Concurrent computing; Application software; Humans; Distributed computing; Web server; parallel programming; Personal communication networks; Computer errors; File systems
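
The core idea of running I/O in parallel with processing can be illustrated with a two-stage pipeline. In this generic Python sketch, a reader thread performs the I/O for item k+1 while the calling thread processes item k. It is not CAP and does not reflect how CAP generates its pipelines. The `read` and `process` callables stand in for, say, slice loading and image extraction and are hypothetical, as is the buffer `depth`.

```python
import queue
import threading


def pipelined(items, read, process, depth=2):
    """Two-stage pipeline: a background thread calls read(item) (the I/O
    stage) while the calling thread applies process() to earlier results.
    depth bounds how many read results may wait in the buffer."""
    buf = queue.Queue(maxsize=depth)
    done = object()          # sentinel marking the end of the stream
    errors = []

    def reader():
        try:
            for item in items:
                buf.put(read(item))      # blocks when the buffer is full
        except BaseException as exc:     # hand I/O errors to the caller
            errors.append(exc)
        finally:
            buf.put(done)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    results = []
    while True:
        data = buf.get()
        if data is done:
            break
        results.append(process(data))
    t.join()
    if errors:
        raise errors[0]
    return results


print(pipelined(range(5), read=lambda i: i * 10, process=lambda x: x + 1))
```

The bounded queue keeps the reader from running arbitrarily far ahead of the processing stage. In CPython, the overlap helps when `read` spends its time in blocking I/O, which releases the interpreter lock.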

Cache-only memory architectures

COMPUTER, 1999, Vol. 32, No. 6, pp. 72-+

Authors: Dahlgren, F; Torrellas, J (Ericsson Mobile Commun, Lund, Sweden; Univ Illinois, Dept Comp Sci, Urbana IL 61801, USA)

The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it. The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state. The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.

Keywords: COMA; Cache-Only Memory Architecture; NUMA systems; cache storage; compilers; data allocation; data replication; frequent long latency memory accesses; huge cache memory; memory architecture; memory module; nonuniform memory access; operating systems; parallel programming; parallel programs; performance obstacles; programmer effort; shared memory concept; shared memory systems; storage management
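
The abstract describes each memory module as a large cache whose blocks carry an address tag and a state, with data replicated or migrated to whichever node accesses it. This toy Python model illustrates that behaviour. The two-state protocol and the "hit"/"miss" labels are assumptions, and the model leaves out capacity limits, replacement, and the last-copy problem that real COMA designs must handle. The class `ToyCOMA` is hypothetical and does not describe any particular COMA machine.

```python
class ToyCOMA:
    """Toy model of COMA attraction memories. tags[node] maps a block address
    to its state: 'M' (only valid copy, writable) or 'S' (shared read-only
    copy). There is no fixed home node; a block lives wherever it was
    last attracted. Assumed: unlimited capacity, no replacement, no timing."""

    def __init__(self, nodes):
        self.tags = [{} for _ in range(nodes)]

    def place(self, node, block):
        """Give node the initial, only copy of block."""
        self.tags[node][block] = "M"

    def _holders(self, block):
        return [n for n, tags in enumerate(self.tags) if block in tags]

    def read(self, node, block):
        if block in self.tags[node]:
            return "hit"                   # data already in local memory
        holders = self._holders(block)
        if not holders:
            raise KeyError(block)
        for h in holders:
            self.tags[h][block] = "S"      # existing copies become shared
        self.tags[node][block] = "S"       # replicate into the reader's memory
        return "miss"

    def write(self, node, block):
        if self.tags[node].get(block) == "M":
            return "hit"                   # sole writable copy is already local
        holders = self._holders(block)
        if not holders:
            raise KeyError(block)
        for h in holders:
            if h != node:
                del self.tags[h][block]    # invalidate remote copies
        self.tags[node][block] = "M"       # block migrates to the writer
        return "miss"


coma = ToyCOMA(3)
coma.place(0, "x")
print(coma.read(1, "x"), coma.read(1, "x"), coma.write(2, "x"), coma.write(2, "x"))
```

Repeated accesses by the same node hit locally after the first miss, because the block has been replicated (on read) or migrated (on write) into that node's memory. In a NUMA system, by contrast, the block would stay at a fixed home location.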

Space efficient execution of deterministic parallel programs

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1999, Vol. 25, No. 6, pp. 870-882

Authors: Simpson, DJ; Burton, FW (Simon Fraser Univ, Burnaby BC V5A 1S6, Canada)

We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule "space efficient" if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We will describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst-case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism.

Keywords: memory management; scheduling; worst case performance; parallel programming; memory bounds; shared memory
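
The "space efficient" criterion compares a parallel schedule's memory with p times the memory of some single-processor depth-first execution. This Python sketch computes one such baseline. It runs a task DAG sequentially in depth-first order, tracks peak memory, and checks the bound. Two simplifications are assumed: each task's allocations and releases are treated as a single net change applied when the task runs, and `succ` lists successors in program order. The sketch is not the authors' locally depth-first parallel scheduler, and the function names are hypothetical.

```python
def depth_first_order(succ, root):
    """Sequential depth-first execution of a task DAG. succ maps every task to
    its successors in program (left-to-right) order. A task runs only after
    all its predecessors; the leftmost newly enabled task runs next."""
    indeg = {t: 0 for t in succ}
    for t in succ:
        for s in succ[t]:
            indeg[s] += 1
    stack, order = [root], []
    while stack:
        t = stack.pop()
        order.append(t)
        ready = []
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
        stack.extend(reversed(ready))  # leftmost ready successor is popped first
    return order


def peak_memory(order, alloc):
    """Peak running total when each task's net allocation (negative means
    release) is applied in the given order."""
    current = peak = 0
    for t in order:
        current += alloc[t]
        peak = max(peak, current)
    return peak


def is_space_efficient(parallel_peak, processors, sequential_peak):
    """Space-efficiency test from the abstract: at most p times the space of
    a depth-first execution (here, the particular one computed above)."""
    return parallel_peak <= processors * sequential_peak


succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
alloc = {"a": 2, "b": 3, "c": 1, "d": -4}
order = depth_first_order(succ, "a")
s1 = peak_memory(order, alloc)
print(order, s1, is_space_efficient(10, 2, s1))
```

If a parallel schedule passes the test against this particular depth-first execution, it also satisfies the abstract's "some depth-first execution" definition. Finding the worst case over all depth-first executions, which the paper addresses for series-parallel programs, is not attempted here.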

The role of graphics in parallel program development

JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 1999, Vol. 10, No. 3, pp. 215-243

Authors: Zhang, K; Hintz, T; Ma, XW (Macquarie Univ, Dept Comp, Sydney NSW 2109, Australia; Univ Technol Sydney, Sch Comp Sci, Sydney NSW 2007, Australia; Fujitsu Australia Software Technol, French Forest NSW 2150, Australia)

Graphical visualisation plays an important role in parallel program development. Researchers have proposed and developed many visualisation tools that assist the development of parallel programs. A number of graph formalisms or notations have been used to visualise various aspects of parallel programs and their executions. This paper attempts to classify and compare these graph formalisms and notations which provide different information at different stages of parallel program development. (C) 1999 Academic Press.

Keywords: parallel programming; visual programming; program visualisation; graph models; debugging
