ISBN (print): 0769512585
In this paper, we address the issue of partitioning sparse arrays whose non-zero elements are distributed non-uniformly. We consider inference schemes for Fortran 90 array intrinsics so that the non-zero structure of the output array can be deduced from the non-zero structures of the input arrays. Experiments are conducted to measure the effectiveness of our method with the Harwell-Boeing sparse matrix collection. We also demonstrate that, given the sparsity structures of the source arrays and with the help of our inference schemes, one can predict the performance differences among a collection of equivalent Fortran 90 codes for sample on-line analytical processing (OLAP). The experiments are performed on an IBM SP2 cluster with the library support of our sparse array intrinsics.
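The inference idea can be pictured with a small sketch (a hypothetical illustration, not the paper's actual scheme): for an intrinsic such as MATMUL, a conservative non-zero mask of the result is deducible from the operand masks alone, ignoring numerical cancellation.

```python
def infer_matmul_mask(a_mask, b_mask):
    """Conservatively infer the non-zero structure of MATMUL(A, B).

    a_mask, b_mask: 2-D lists of booleans marking possibly non-zero entries.
    Output entry (i, j) may be non-zero if some k has A[i][k] and B[k][j]
    both possibly non-zero (numerical cancellation is ignored).
    """
    rows, cols, inner = len(a_mask), len(b_mask[0]), len(b_mask)
    return [[any(a_mask[i][k] and b_mask[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# A is diagonal, B is non-zero only in its second column:
# the product inherits that single non-zero column.
a = [[True, False], [False, True]]
b = [[False, True], [False, True]]
print(infer_matmul_mask(a, b))  # [[False, True], [False, True]]
```

Such a mask computed before execution is what lets a compiler or library pick a storage layout and partition for the output array without materializing it first.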
ISBN (print): 0769509908
This paper presents benchmark results of three different parallel-programming paradigms applied to an unstructured shock-capturing numerical code for transient problems. The three parallel programming methods are: (1) shared-memory programming with OpenMP using the cache-coherent non-uniform memory access (CC-NUMA) architecture of the SGI Origin2000; (2) an MPI (Message Passing Interface) implementation; and (3) a SHMEM implementation using the parallel library called the "Shared Memory Access Library". Methods (2) and (3) are both based on a distributed-memory architecture. The SGI Origin2000 is used throughout the current study. It is found that the scalability of method (1) is so poor that its use for the unstructured CFD code is impractical. The scalability of methods (2) and (3) is much better than that of method (1), and computational speeds in the gigaflops range can be achieved with 16 CPUs. The parallel program with the SHMEM library is approximately twice as fast as the one with MPI.
ISBN (print): 0769509878
The BSP model can be extended with a zero-cost synchronization mechanism that can be used when the number of messages to be received is known in advance. This mechanism, usually known as "oblivious synchronization", implies that different processors can be in different supersteps at the same time. An unwanted consequence of this software improvement is a loss of prediction accuracy. This paper proposes an extension of the BSP complexity model to deal with oblivious barriers and shows its accuracy.
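The effect of oblivious synchronization on the cost model can be sketched with the classic BSP superstep formula, cost = w + g·h + L. The sketch below is a simplified illustration, not the paper's extended model: it merely drops the barrier latency L when synchronization is oblivious.

```python
def superstep_cost(w, h, g, L, oblivious=False):
    """Classic BSP superstep cost: local work w, h-relation size h,
    bandwidth parameter g, barrier latency L. With oblivious
    synchronization the global barrier is avoided, modeled here
    (simplistically) as dropping the L term."""
    return w + g * h + (0 if oblivious else L)

# With g = 2 and L = 50: a superstep of 100 work units and a 10-relation.
print(superstep_cost(100, 10, 2, 50))                  # 170
print(superstep_cost(100, 10, 2, 50, oblivious=True))  # 120
```

The gap between the two figures is exactly the barrier term that oblivious synchronization removes, which is also why a model ignoring processor skew across supersteps starts to mispredict.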
ISBN (print): 076951426X
The close association between higher order functions and algorithmic skeletons is a promising source of automatic parallelisation of programs. An approach to automatically synthesizing higher order functions from functional programs through proof planning is presented. Our work has been conducted within the context of a parallelising compiler for SML, with the objective of exploiting parallelism latent in potential higher order function use in programs.
A workable approach for the modernization of existing software into parallel/distributed applications is coarse-grain restructuring. If, for instance, entire subroutines of legacy code can be plugged into a new structure, the investment required to rediscover the details of what they do can be spared. The resulting renovated software can then take advantage of the improved performance offered by modern parallel/distributed computing environments, without rethinking or rewriting the bulk of its existing code. The authors discuss one of their experiments using the coordination language MANIFOLD to restructure an existing sequential numerical application, written in Fortran 77, into a concurrent application.
SMS is a user-level distributed shared memory software system. It provides a virtual shared-memory environment on a cluster of computers connected by a communication network. Although SMS requires only commodity hardware and software, it enables users to write parallel programs under a shared-memory programming model.
The Visual Environment for the Development of Real-Time Parallel Programs (http://***/~tev) has been designed to facilitate the change from the sequential to the parallel paradigm. This graphical environment enables the fast prototyping of parallel applications executed on the real-time operating system Virtuoso from Eonic Systems Inc. (http://***). Such parallel applications run on a parallel machine, performing intensive calculations and interacting with the outside world through a text-based local terminal. However, with the advance of the Internet it has become possible to develop programs capable of remotely executing, manipulating and monitoring such applications. This paper focuses on the GUI Generator with Support for Real-Time Remote Procedure Calls, which aims at managing the development of visual Java applications able to remotely manipulate real-time parallel programs through remote procedure calls that respect timing requirements and guarantee the atomicity of the remote execution.
ISBN (print): 0769509908
Providing variable granularities is an attractive way to achieve good speedups for various classes of parallel applications. A few systems achieve this goal by instrumenting an application with code that checks the state of shared data. Although these systems can provide arbitrary granularities flexibly, they suffer from severe race conditions inherent to software-only approaches, as well as the run-time overhead of the instrumentation. In this paper, we propose a new mechanism that has low overhead and incurs no race conditions while providing variable granularities in software. The unique idea of our mechanism is to delegate the state checks to the segmentation hardware of the Intel x86. The instrumented code only maintains the state of shared data for use by the segmentation hardware. Because the hardware atomically performs the required state checks and the corresponding references, our mechanism is free from difficult race conditions. This feature efficiently improves the response time to remote requests via an interrupt mechanism, without additional synchronization overhead for avoiding race conditions. The run-time overhead further decreases owing to the reduced work to be done in software. The evaluation results show that our mechanism exhibits sufficiently low overhead even without any optimization.
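The software-only instrumentation that the paper improves upon can be pictured as a per-granule state table consulted before every shared access. The sketch below is a hypothetical illustration of that baseline (class and method names are invented); the paper's contribution is to make the check-plus-access pair atomic by delegating the check to x86 segmentation hardware instead of running it in software.

```python
# Hypothetical per-granule state table for a software DSM-style check.
# In the paper's mechanism, the state check is performed atomically with
# the memory access by x86 segmentation hardware, not by this code.
INVALID, READ_ONLY, READ_WRITE = range(3)

class SharedRegion:
    def __init__(self, n_granules, granule_size):
        self.granule_size = granule_size
        self.state = [INVALID] * n_granules          # one state per granule
        self.data = [0] * (n_granules * granule_size)

    def read(self, addr):
        g = addr // self.granule_size
        if self.state[g] == INVALID:                 # instrumented state check
            self._fetch(g)                           # stand-in for remote fetch
        return self.data[addr]

    def write(self, addr, value):
        g = addr // self.granule_size
        if self.state[g] != READ_WRITE:              # instrumented state check
            self._acquire(g)                         # stand-in for ownership grab
        self.data[addr] = value

    def _fetch(self, g):
        self.state[g] = READ_ONLY

    def _acquire(self, g):
        self.state[g] = READ_WRITE

region = SharedRegion(n_granules=4, granule_size=16)
region.write(5, 42)
print(region.read(5))  # 42
```

The race the abstract refers to lives between the `if` check and the subsequent data access: a remote request arriving in that window sees inconsistent state, which is exactly what a hardware-atomic check avoids.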
MPI-IO/GPFS is an optimized prototype implementation of the I/O chapter of the Message Passing Interface (MPI) 2 standard. It uses the IBM General Parallel File System (GPFS) Release 3 as the underlying file system. This paper describes optimization features of the prototype that take advantage of new GPFS programming interfaces. It also details how collective data access operations have been optimized by minimizing the number of messages exchanged in sparse accesses and by increasing the overlap of communication with file access. Experimental results show a performance gain. A study of the impact of varying the number of tasks running on the same node is also presented.
We present a benchmark suite for computational grids in this paper. It is based on the NAS Parallel Benchmarks (NPB) and is called the NAS Grid Benchmark (NGB). We present NGB as a data flow graph encapsulating an instance of an NPB code in each graph node, which communicates with other nodes by sending and receiving initialization data. These nodes may be mapped to the same or different Grid machines. Like NPB, NGB specifies several different classes (problem sizes). NGB also specifies the generic Grid services that are sufficient for running the suite. The implementor has the freedom to choose any Grid environment. We describe a reference implementation in Java and present some scenarios for using NGB.
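The data-flow-graph structure described above can be sketched as a topological execution over nodes, each standing in for an NPB code instance whose output feeds its successors. The sketch is a hypothetical illustration (the node names and toy kernels are invented, and the reference implementation is in Java, not Python).

```python
# Hypothetical sketch of an NGB-style data flow graph: each node wraps a
# kernel and forwards its result as input data to its successor nodes.
def run_dataflow(graph, kernels, source_input):
    """graph: node -> list of successor nodes; kernels: node -> fn(list)."""
    indeg = {n: 0 for n in graph}            # count predecessors per node
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    inputs = {n: [] for n in graph}
    ready = [n for n in graph if indeg[n] == 0]
    results = {}
    while ready:                             # topological-order execution
        n = ready.pop()
        data = inputs[n] or [source_input]   # entry nodes get the source data
        results[n] = kernels[n](data)
        for s in graph[n]:
            inputs[s].append(results[n])
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return results

# Toy kernels standing in for NPB codes (names are illustrative only):
# two entry nodes feed a common sink, as in a simple fan-in graph.
graph = {"BT": ["LU"], "SP": ["LU"], "LU": []}
kernels = {"BT": lambda xs: sum(xs) + 1,
           "SP": lambda xs: sum(xs) * 2,
           "LU": lambda xs: sum(xs)}
print(run_dataflow(graph, kernels, 3)["LU"])  # 10
```

Mapping each node of such a graph to a different Grid machine is what turns the benchmark from a single-system test into a test of the Grid services that move data between nodes.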