Fork95 is an imperative parallel programming language intended to express algorithms for synchronous shared-memory machines (PRAMs). It is based on ANSI C and offers additional constructs to hierarchically divide processor groups into subgroups and to manage shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it supports locally asynchronous computation where desired by the programmer. We present a one-pass compiler, fcc, which compiles Fork95 and C programs to the SB-PRAM machine. The SB-PRAM is a lock-step synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with a physically shared memory and uniform memory access time. We examine three important types of parallel computation frequently used for the parallel solution of real-world problems. While farming and parallel divide-and-conquer are directly supported by Fork95 language constructs, pipelining can easily be expressed using existing language features; an additional language construct for pipelining is not required.
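As a rough illustration of the divide-and-conquer pattern the abstract says Fork95 supports directly (there via group-splitting constructs rather than threads), here is a plain-C pthreads sketch; all names in it are ours, and it is not Fork95 syntax:

```c
/* Hypothetical plain-C sketch of the parallel divide-and-conquer pattern;
 * Fork95 itself would express this by splitting the current processor
 * group into subgroups, one per recursive branch. */
#include <pthread.h>
#include <stdio.h>

typedef struct { const int *a; int lo, hi; long sum; } Task;

static void *dc_sum(void *arg) {
    Task *t = arg;
    if (t->hi - t->lo <= 1024) {             /* small range: solve sequentially */
        t->sum = 0;
        for (int i = t->lo; i < t->hi; i++) t->sum += t->a[i];
    } else {                                 /* split: one half per "subgroup" */
        int mid = t->lo + (t->hi - t->lo) / 2;
        Task left  = { t->a, t->lo, mid, 0 };
        Task right = { t->a, mid, t->hi, 0 };
        pthread_t child;
        pthread_create(&child, NULL, dc_sum, &left);
        dc_sum(&right);                      /* current thread takes one half */
        pthread_join(child, NULL);           /* rejoin, like leaving a subgroup */
        t->sum = left.sum + right.sum;
    }
    return NULL;
}

int main(void) {
    static int a[1 << 16];
    for (int i = 0; i < (1 << 16); i++) a[i] = 1;
    Task root = { a, 0, 1 << 16, 0 };
    dc_sum(&root);
    printf("sum = %ld\n", root.sum);         /* expect 65536 */
    return 0;
}
```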
A suite of High Performance Fortran (HPF) coding examples of practical scientific algorithms is examined in detail, with the idea that on these simple but non-trivial examples we can fairly well understand the issues related to different data distributions, different parallel constructs, and different programming styles (static versus dynamic allocation). The coding examples include 2D stencil solutions of PDEs, the N-body problem, LU factorization, several vector/matrix library routines, and 2D and 3D array redistribution. The performance of the HPF codes is compared to hand-written Fortran codes with message-passing libraries. From 1997 to 1998, HPF compilers improved significantly, such that HPF codes perform as well as Fortran+MPI codes for all the examples investigated here. However, many important peculiarities of HPF coding still exist. (C) 1999 Elsevier Science B.V. All rights reserved.
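For concreteness, here is a minimal sequential C sketch of the kind of 2D stencil PDE computation named above; in HPF the same update would be a single array assignment under a data-distribution directive, and the grid size and iteration count here are arbitrary choices of ours:

```c
/* Illustrative Jacobi sweep over a 2D grid (five-point stencil). */
#include <stdio.h>
#define N 64

static double u[N][N], v[N][N];

int main(void) {
    /* boundary condition: fix the top edge at 1.0, interior starts at 0 */
    for (int j = 0; j < N; j++) u[0][j] = v[0][j] = 1.0;

    for (int iter = 0; iter < 100; iter++) {
        /* each interior point becomes the average of its four neighbours */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                  u[i][j-1] + u[i][j+1]);
        /* copy back; a data-parallel language does this as one array assignment */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                u[i][j] = v[i][j];
    }
    printf("u[1][N/2] = %f\n", u[1][N/2]);
    return 0;
}
```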
Neural network hardware using a time-shared bus and integer-representation architecture has already been fabricated and reported from the design viewpoint. However, no performance evaluation of this hardware has yet been presented. Computation speed, scalability, and learning accuracy of the hardware are evaluated theoretically and experimentally using the Back Propagation (BP) algorithm. In addition, a mirror-weight assignment technique is proposed for high-speed computation in BP. NETTalk, an English-pronunciation-reasoning task, has been chosen as the target application for BP. In the experiment, recently developed neuro-hardware based on the above architecture and its parallel programming language are used. An outline of the language is described along with the BP programming. Mirror-weight assignment allows a maximum speed of 55.0 MCUPS (Million Connections Updated Per Second) using 256 neurons in the hidden layer (the numbers of neurons in the input and output layers are fixed at 203 and 26, respectively, in NETTalk). In addition, if scalability is defined as a function of the number of neurons in the hidden layer, the machine retains a high scalability of 0.5 when this maximum speed is used. No degradation in learning accuracy occurs when experimental results computed using the neuro-hardware are compared with those obtained by a floating-point-representation architecture (workstation). The experiment indicates that the present integer-representation design of the neuro-hardware is sufficient for NETTalk. Performance has also been evaluated theoretically. For evaluation purposes, it is assumed that most of the execution time is taken up by bus cycles. On the basis of this assumption, an analytical model of computation speed and scalability is proposed. Analytical predictions agree well with experimental results.
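A back-of-the-envelope check of the reported figures, under the assumption (ours, for illustration only) that MCUPS counts each weight once per training pattern and that bias terms are ignored:

```c
/* Rough sanity check of the 55.0 MCUPS figure for the 203-256-26 NETTalk
 * network. Assumption (ours): "connections updated" means each weight is
 * touched once per pattern. */
#include <stdio.h>

int main(void) {
    const long n_in = 203, n_hid = 256, n_out = 26;   /* NETTalk layer sizes */
    const double mcups = 55.0;                        /* reported peak speed */

    long conns = n_in * n_hid + n_hid * n_out;        /* weights in the net */
    /* conns / (mcups * 1e6) seconds = conns / mcups microseconds */
    double us_per_pattern = conns / mcups;

    printf("connections: %ld\n", conns);                   /* 58624 */
    printf("time per pattern: %.1f us\n", us_per_pattern); /* ~1065.9 us */
    return 0;
}
```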
The distributed computer system described in this paper is a set of computer nodes interconnected in an interconnection network via packet-switching links. The nodes communicate with each other by means of message-passing protocols. This paper presents the implementation of rendezvous facilities as high-level primitives provided by a parallel programming language to support interprocess communication and synchronisation.
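A minimal sketch, assuming an Ada-style rendezvous semantics (the sender blocks until the receiver has accepted the message, so the exchange also acts as a synchronisation point); this is our plain-C illustration, not the paper's implementation:

```c
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int msg;
    int full;          /* 1 while a message is waiting to be received */
} Rendezvous;

static Rendezvous r = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };

static void rv_send(int msg) {
    pthread_mutex_lock(&r.m);
    r.msg = msg; r.full = 1;
    pthread_cond_broadcast(&r.cv);
    while (r.full)                      /* block until the receiver takes it */
        pthread_cond_wait(&r.cv, &r.m);
    pthread_mutex_unlock(&r.m);
}

static int rv_recv(void) {
    pthread_mutex_lock(&r.m);
    while (!r.full)                     /* block until a message arrives */
        pthread_cond_wait(&r.cv, &r.m);
    int msg = r.msg; r.full = 0;
    pthread_cond_broadcast(&r.cv);      /* release the blocked sender */
    pthread_mutex_unlock(&r.m);
    return msg;
}

static void *receiver(void *arg) {
    (void)arg;
    printf("received %d\n", rv_recv());
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, receiver, NULL);
    rv_send(42);                        /* returns only after receipt */
    pthread_join(t, NULL);
    return 0;
}
```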
Effectively using shared-memory multiprocessors requires substantial programming effort. We present the programming language COOL (Concurrent Object-Oriented Language), which was designed to exploit coarse-grained parallelism at the task level in shared-memory multiprocessors. COOL's primary design goals are efficiency and expressiveness. By efficiency we mean that the language constructs should be efficient to implement and that a program should not have to pay for features it does not use. By expressiveness we mean that the language should flexibly support different concurrency patterns, thereby allowing various decompositions of a problem. COOL emphasizes the integration of concurrency and synchronization with data abstraction to ease the task of creating modular and efficient parallel programs. It is an extension of C++, which was chosen because it supports abstract data type definitions and is widely used.
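To illustrate the idea of integrating synchronization with data abstraction, here is a plain-C, monitor-style counter (our sketch, not COOL syntax; COOL attaches such synchronization to C++ objects):

```c
/* Every operation on the counter goes through functions that lock its
 * internal mutex, so callers get monitor-like safety without writing
 * any locking code themselves. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t m;
    long value;
} Counter;

void counter_init(Counter *c) { pthread_mutex_init(&c->m, NULL); c->value = 0; }

void counter_add(Counter *c, long d) {
    pthread_mutex_lock(&c->m);      /* synchronization lives inside the ADT */
    c->value += d;
    pthread_mutex_unlock(&c->m);
}

long counter_get(Counter *c) {
    pthread_mutex_lock(&c->m);
    long v = c->value;
    pthread_mutex_unlock(&c->m);
    return v;
}

static Counter shared;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) counter_add(&shared, 1);
    return NULL;
}

int main(void) {
    counter_init(&shared);
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter_get(&shared));  /* always 400000 */
    return 0;
}
```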
We are convinced that the combination of data-parallel languages and MIMD hardware can make an important contribution to high-speed computing. In this paper, we describe the implementation of two compilers for the data-parallel programming language Dataparallel C. One compiler generates code for Intel and nCUBE hypercube multicomputers; the other generates code for Sequent multiprocessors. We have compiled and executed a suite of Dataparallel C programs, and we present their execution times and speedups on the Intel iPSC/2, the nCUBE 3200, and the Sequent Symmetry.
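As an illustration of the execution model behind a data-parallel language, here is a sketch (names and distribution choice are ours, not Dataparallel C syntax) of the per-processor loop a compiler might emit for the data-parallel statement c = a + b:

```c
/* One statement is conceptually executed by every virtual processor at
 * once; the compiler lowers it to a loop over the elements each physical
 * processor owns (owner-computes rule). */
#include <stdio.h>

#define N      16   /* virtual processors / array elements */
#define NPROC   4   /* physical processors, emulated sequentially here */

int a[N], b[N], c[N];

/* code the compiler might emit for processor `me` */
void emitted_code(int me) {
    for (int i = me; i < N; i += NPROC)  /* cyclic distribution (one choice) */
        c[i] = a[i] + b[i];
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 10 * i; }
    for (int p = 0; p < NPROC; p++)      /* sequential stand-in for NPROC nodes */
        emitted_code(p);
    for (int i = 0; i < N; i++) printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```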
Programming multiprocessor parallel architectures is a complex task. This paper describes a block-structured scientific programming language, BLAZE, designed to simplify this task. BLAZE contains array arithmetic, ‘forall’ loops, and APL-style accumulation operators, which allow natural expression of fine-grained parallelism. It also employs an applicative, or functional, procedure invocation mechanism, which makes it easy for compilers to extract coarse-grained parallelism using machine-specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow.
A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code in principle allow a compiler to extract the types of parallelism appropriate for the given architecture, while neglecting the remainder. This paper describes the features of BLAZE and shows how the language would be used in typical scientific programming.
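A plain-C rendering (ours; BLAZE syntax differs) of what two of the features named above mean: a ‘forall’ whose iterations are independent, and an accumulation that is an associative reduction:

```c
/* The first loop has no cross-iteration dependences, which is exactly the
 * property a forall asserts and a parallelizing compiler exploits. The
 * second is an APL-style accumulation (+/ over an array): associative,
 * so it can be evaluated as a parallel tree rather than left to right. */
#include <stdio.h>

#define N 8

int main(void) {
    double x[N], y[N];
    for (int i = 0; i < N; i++) x[i] = i + 1;

    /* forall i in [0..N): y[i] = x[i] * x[i] */
    for (int i = 0; i < N; i++)
        y[i] = x[i] * x[i];

    /* accumulation: sum = +/ y */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += y[i];

    printf("sum of squares 1..%d = %.0f\n", N, sum);  /* 204 */
    return 0;
}
```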