In massively parallel computer systems for embedded real-time applications, the bandwidth demands on the interconnection network are normally very high. Other important properties are time-deterministic latency and services that guarantee deadlines are met. In this paper we analyze how these properties vary with the design parameters of a passive optical star network, specifically when it is used in a massively parallel radar signal processing system. The aggregated bandwidth and computational power of the radar system are approximately 45 Gb/s and 100 GOPS, respectively. The analysis focuses on the medium access control protocol, called TD-TWDMA, for the time- and wavelength-multiplexed network. We conclude that the proposed network is very well suited to this kind of signal-processing application. We also present a new distributed slot-allocation algorithm with real-time properties.
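To make the idea of time- and wavelength-multiplexed slot allocation concrete, here is a minimal, centralized first-fit sketch. It is not TD-TWDMA and not the distributed algorithm from the paper. The class `Frame`, its method `allocate`, and the model (each node owns one transmit wavelength, each node has a single tunable receiver) are hypothetical assumptions chosen for illustration.

```python
class Frame:
    """One frame of num_slots time slots shared by num_nodes nodes.

    Hypothetical assumptions: each node owns one transmit wavelength (so two
    nodes never collide on a channel) and has one tunable receiver (so a node
    can listen to only one wavelength per slot).
    """

    def __init__(self, num_nodes, num_slots):
        self.num_nodes = num_nodes
        self.num_slots = num_slots
        # tx[s][src] holds the destination that src sends to in slot s, or None
        self.tx = [[None] * num_nodes for _ in range(num_slots)]
        # rx_busy[s] is the set of receivers already booked in slot s
        self.rx_busy = [set() for _ in range(num_slots)]

    def allocate(self, src, dst, slots_needed):
        """First-fit: grant slots_needed slots per frame for src -> dst, or none at all."""
        free = [s for s in range(self.num_slots)
                if self.tx[s][src] is None and dst not in self.rx_busy[s]]
        if len(free) < slots_needed:
            return []  # all-or-nothing, so an admitted channel keeps its guarantee
        granted = free[:slots_needed]
        for s in granted:
            self.tx[s][src] = dst
            self.rx_busy[s].add(dst)
        return granted


f = Frame(num_nodes=4, num_slots=8)
print(f.allocate(0, 1, 3))  # node 0 -> node 1
print(f.allocate(2, 1, 3))  # receiver 1 is busy in the first three slots
print(f.allocate(0, 3, 6))  # node 0's transmitter has only five free slots left
```

Because granted slots recur in every frame, an admitted channel in this model waits at most one frame for its next slot, which illustrates where time-deterministic latency comes from. Rejecting a request outright, rather than granting it partially, keeps every admitted channel's guarantee intact.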
While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, doing so is often difficult for various practical reasons. This paper presents our study of multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Several workload distribution strategies are defined according to thread granularity. Three data distribution strategies are investigated: row-wise cyclic, k-way partial-row cyclic, and blocked distribution. We implemented all of these on the 80-processor EM-4 distributed-memory multiprocessor using two applications: Gaussian elimination with partial pivoting, which is highly sequential, and matrix multiplication, which is highly parallel. Experimental results indicate that multithreading can offset the loss caused by a mismatch between data distribution and workload distribution, even for sequential and irregular problems, while delivering high absolute performance.
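The three data distributions can be pictured as functions that map a matrix row (or row segment) to its owning processor. The sketch below is illustrative only; the function names are hypothetical, and the definition of k-way partial-row cyclic (each row cut into k segments that are dealt out cyclically) is an assumption, since the abstract does not define it.

```python
def row_cyclic_owner(row, num_procs):
    """Row-wise cyclic: rows are dealt out round-robin."""
    return row % num_procs


def blocked_owner(row, num_rows, num_procs):
    """Blocked: each processor gets one contiguous band of ceil(N/P) rows."""
    block = -(-num_rows // num_procs)  # ceiling division
    return row // block


def partial_row_cyclic_owner(row, segment, k, num_procs):
    """Assumed k-way partial-row cyclic: row split into k segments, dealt cyclically."""
    return (row * k + segment) % num_procs


P, N = 4, 8
print([row_cyclic_owner(r, P) for r in range(N)])
print([blocked_owner(r, N, P) for r in range(N)])
print([[partial_row_cyclic_owner(r, s, 2, P) for s in range(2)] for r in range(N)])
```

In Gaussian elimination, rows above the current pivot stop being updated, so under the blocked mapping the processors that own the top bands go idle early, while the cyclic mappings keep the remaining work spread out. This is one concrete form of the mismatch between data distribution and workload distribution that the study examines.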
Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. This report examines the overlapping capability of two distributed-memory multiprocessors: the EM-X and the IBM SP-2. The well-known bitonic sorting algorithm is used for the experiments, and various message sizes are used to determine when, where, how much, and why overlapping takes place. Experimental results indicate that both multiprocessors yield an overlap of up to roughly 30% to 40% of communication time when the message size is approximately 1K integers. The EM-X was found to be insensitive to message size, yielding high overlap across various message sizes, while the SP-2 was effective for message sizes in the window of 512 to 2K integers.
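For readers unfamiliar with bitonic sorting, the following is a minimal sequential sketch of Batcher's bitonic network; it is not the authors' EM-X or SP-2 code. For n keys the network has log2(n)(log2(n)+1)/2 compare-exchange stages. In a distributed version each processor holds a block of keys, and a compare-exchange between partners that live on different processors becomes a message exchange. That exchange is the communication that can be overlapped with local computation.

```python
def bitonic_sort(a):
    """Sort a list whose length is a power of two with Batcher's bitonic network."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(a)
    k = 2
    while k <= n:              # length of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # compare-exchange distance within this merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # the bit (i & k) decides whether this block sorts up or down
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([9, 4, 7, 1, 8, 2, 6, 5]))
```

The partner index `i ^ j` is fixed in advance and does not depend on the data, which is why the communication pattern of bitonic sort is regular and well suited to experiments that vary message size.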
This paper discusses the present state of the art of components, systems, and application technology related to parallel optical data links (ODL), as demonstrated by the OptoElectronic Technology Consortium (OETC). Parallel ODL technology is poised for high-volume commercialization despite some uncertainties in industrial standards and system applications, driven by the demand for high bandwidth to support the upcoming information age. To meet the need for low-cost, broadband digital multimedia services, parallel ODL technology must offer reasonable cost/performance ratios compared with other established technologies. Responding to this challenge has required integrating a number of state-of-the-art component technologies (e.g., VCSELs, monolithic integrated photoreceivers, MCMs, GaAs ICs, and optical array connectors and cables) with system designs and applications.
New, more demanding requirements are appearing in the area of database systems. The popularity reached by parallel database systems during the past decade, due to their high performance and scalability characteristics, should...
One of the most important features of interconnection networks for massively parallel computer systems is scalability. The fiber-optic network described in this paper uses both wavelength division multiplexing and a configurable ratio between optics and electronics to achieve an architecture with good scalability. The network connects distributed modules into a large parallel system in which each node itself typically consists of parallel processing elements. The paper describes two implementations of the star topology: one uses an electronic star with fiber-optic connections, and the other is purely optical, with a passive optical star at the center. The medium access control scheme of the communication concept is presented, and some scalability properties are discussed, including those of a multiple-star topology.
Proper distribution of operations among parallel processors in a large scientific computation executed on a distributed-memory machine can significantly reduce the total computation time. In this paper, we propose an operation called simultaneous parallel reduction (SPR) that is amenable to such optimization. SPR performs reduction operations in parallel, each operation reducing a one-dimensional consecutive section of a distributed array. Each element of the distributed array is used as an operand in many reductions executed concurrently over overlapping sections of the array. SPR is distinct from the more commonly considered parallel reduction, which concurrently evaluates a single reduction. In this paper we consider SPR on Single Instruction Multiple Data (SIMD) machines with different interconnection networks. We focus on SPR over sections whose size is not a power of 2, with the result shifted relative to the arguments. Several algorithms that achieve some of the lower bounds on SPR complexity are presented under various assumptions about the properties of the reduction's binary operator and the communication cost of the target architectures.
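As an illustration of what SPR computes, the sketch below evaluates every reduction over a window of w consecutive elements using recursive doubling, and it handles window sizes that are not a power of 2. It is one standard technique, not necessarily any of the paper's algorithms. The function name `spr` and its signature are hypothetical, the operator is assumed to be associative (commutativity is not required), and output r[i] is left at index i; shifting the results relative to the arguments would simply store r[i] at index i + s.

```python
from operator import add


def spr(x, w, op=add):
    """Return [x[i] op x[i+1] op ... op x[i+w-1] for i in range(len(x) - w + 1)].

    Assumes op is associative. level[i] holds the reduction of x[i : i + size],
    where size doubles each step; w is assembled from its binary digits as
    consecutive blocks combined strictly left to right.
    """
    n = len(x)
    if not 1 <= w <= n:
        raise ValueError("need 1 <= w <= len(x)")
    m = n - w + 1          # number of output windows
    level = list(x)        # windows of length 1
    size = 1
    acc = None             # acc[i] reduces x[i : i + offset]
    offset = 0
    remaining = w
    while remaining:
        if remaining & 1:
            # append the next block of length `size` to every partial window
            if acc is None:
                acc = [level[i + offset] for i in range(m)]
            else:
                acc = [op(acc[i], level[i + offset]) for i in range(m)]
            offset += size
        remaining >>= 1
        if remaining:
            # doubling step: one shift by `size` plus an elementwise combine
            level = [op(level[i], level[i + size]) for i in range(len(level) - size)]
            size *= 2
    return acc


print(spr(list(range(1, 11)), 3))   # window sums
print(spr(list("abcdef"), 5))       # non-commutative operator: concatenation
```

Each doubling step is a uniform shift followed by an elementwise operation, which maps naturally onto a SIMD machine, where the cost of a shift by `size` depends on the interconnection network. About log2(w) doubling steps are needed.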
High-level data parallel languages such as Vienna Fortran and High Performance Fortran (HPF) have been introduced to allow massively parallel distributed-memory machines to be programmed at a relatively high level of abstraction, based on the single program multiple data (SPMD) paradigm. Their main features include mechanisms for expressing the distribution of data across the processors of a machine. The paper introduces additional language functionality to allow the efficient processing of sparse matrix codes. It introduces methods for the representation and distribution of sparse matrices, which together form a powerful mechanism for storing and manipulating sparse matrices that can be implemented efficiently on massively parallel machines.
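To illustrate what representing and distributing a sparse matrix involves, here is a small sketch using compressed sparse row (CSR) storage with a block-row distribution. CSR and block rows are common choices, not necessarily the representation or distribution proposed in the paper. The helper names are hypothetical, and the vector `x` is assumed to be fully available on every processor, whereas a real SPMD code would have to gather the entries each processor needs.

```python
def to_csr(dense):
    """Compressed sparse row: nonzero values, their column indices, and row start offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def distribute_rows(num_rows, num_procs):
    """Block-row distribution: processor p owns the half-open row range [lo, hi)."""
    block = -(-num_rows // num_procs)  # ceiling division
    return [(min(p * block, num_rows), min((p + 1) * block, num_rows))
            for p in range(num_procs)]


def local_spmv(values, col_idx, row_ptr, lo, hi, x):
    """Sparse matrix-vector product restricted to the rows one processor owns."""
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(lo, hi)]


vals, cols, ptr = to_csr([[5, 0, 0], [0, 0, 3], [1, 2, 0]])
for lo, hi in distribute_rows(len(ptr) - 1, 2):
    print((lo, hi), local_spmv(vals, cols, ptr, lo, hi, [1, 1, 1]))
```

Splitting by row count is the simplest choice. When nonzeros are unevenly spread, balancing the ranges by nonzero count (read directly from `row_ptr`) usually evens out the per-processor work better.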