This paper describes and evaluates new methods for relation declustering in parallel databases. To process queries in parallel, relations are partitioned across multiple processors, typically by using the value of one...
To exploit instruction-level parallelism in programs over multiple basic blocks, programs should have reducible control flow graphs. However, not all programs satisfy this property. A new method, called Controlled Node...
Automatic fault diagnosis in power systems presents real challenges to computing technologies. As an alternative approach to expert systems, several neural network solutions have been proposed recently. In this paper, a modular, neural network-based solution to power system alarm handling and fault diagnosis is described that overcomes the limitations of 'toy' alternatives constrained to small, fixed-topology electrical networks. In contrast to monolithic diagnosis systems, the neural network-based approach presented here fulfills the scalability and dynamic adaptability requirements of the application. Mapping the power grid onto a set of interconnected modules that model the functional behaviour of electrical equipment provides the flexibility and speed demanded by the problem. The way in which the neural system is conceived allows full scalability to real-size power systems.
Next-generation radar systems, with phase-controlled array antennas, will have to process data volumes many times larger than those in current systems. This requires enormous computing power. Even in a relatively small airborne radar system, with hard size and power consumption constraints, a sustained computing power of 40 GOPS (or 40 GFLOPS, if floating-point calculations are used) will be needed to perform only the subset of the calculations known as space-time adaptive processing (STAP). Consequently, there is a need for new parallel computing modules, as well as new overall system architectures and application development environments. In this paper, a modular architecture with highly parallel SIMD modules is presented as a promising solution, capable of handling STAP. A version of the architecture, equipped with bit-serial floating-point processing elements, is described and evaluated. Implementation technology aspects are discussed.
In massively parallel computer systems for embedded real-time applications, there are normally very high bandwidth demands on the interconnection network. Other important properties are time-deterministic latency and services that guarantee that deadlines are met. In this paper, we analyze how these properties vary with the design parameters for a passive optical star network, specifically when used in a massively parallel radar signal processing system. The aggregated bandwidth and computational power of the radar system are approximately 45 Gb/s and 100 GOPS, respectively. The analysis focuses on the medium access control protocol, called TD-TWDMA, for the time- and wavelength-multiplexed network. It is concluded that the proposed network is very well suited to this kind of signal-processing application. We also present a new distributed slot-allocation algorithm with real-time properties.
While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so because of various practical issues. This paper presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various types of workload distribution strategies are defined along thread granularity. Three types of data distribution strategies are investigated: row-wise cyclic, k-way partial-row cyclic, and blocked distribution. We have implemented all of these on the 80-processor EM-4 distributed-memory multiprocessor using highly sequential Gaussian elimination with partial pivoting and highly parallel matrix multiplication. Experimental results indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
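To make two of the named data distribution strategies concrete, the following is a minimal sketch of how matrix rows might be assigned to processors under row-wise cyclic and blocked distribution. It is an illustration under stated assumptions, not the implementation used on the EM-4: the function names (`row_cyclic_owner`, `blocked_owner`, `active_rows_per_proc`) are hypothetical, and the k-way partial-row cyclic scheme is omitted because the abstract does not define it. The helper counts how many still-active rows each processor holds at a given Gaussian elimination step, which is one way a mismatch between data distribution and workload can show up.

```python
def row_cyclic_owner(row: int, num_procs: int) -> int:
    """Row-wise cyclic: row i goes to processor i mod P (assumed convention)."""
    return row % num_procs


def blocked_owner(row: int, num_rows: int, num_procs: int) -> int:
    """Blocked: contiguous chunks of ceil(N / P) rows per processor (assumed convention)."""
    block_size = -(-num_rows // num_procs)  # ceiling division
    return row // block_size


def active_rows_per_proc(owners: list[int], step: int, num_procs: int) -> list[int]:
    """Count rows below pivot row `step` that each processor still has to update.

    In Gaussian elimination, step k only updates rows k+1 .. N-1, so rows at or
    above the pivot no longer contribute work.
    """
    counts = [0] * num_procs
    for row, owner in enumerate(owners):
        if row > step:
            counts[owner] += 1
    return counts


if __name__ == "__main__":
    num_rows, num_procs = 8, 4  # hypothetical sizes chosen for illustration
    cyclic = [row_cyclic_owner(r, num_procs) for r in range(num_rows)]
    blocked = [blocked_owner(r, num_rows, num_procs) for r in range(num_rows)]
    for step in range(num_rows - 1):
        print(step,
              active_rows_per_proc(cyclic, step, num_procs),
              active_rows_per_proc(blocked, step, num_procs))
```

In this toy setting, the blocked layout leaves the processors that own the early rows with no remaining work as elimination proceeds, while the cyclic layout keeps the remaining rows spread across processors. This is the kind of imbalance that the abstract says multithreading can help offset.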
Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. This report explicates the overlapping capability of two distributed-memory multiprocessors: the EM-X and the IBM SP-2. The well-known bitonic sorting algorithm is selected for the experiments. Various message sizes are used to determine when, where, how much, and why overlapping takes place. Experimental results indicate that both multiprocessors yield up to 30% to 40% overlap of communication time when the message size is approximately 1K integers. The EM-X is found to be insensitive to message size, yielding high overlap for various message sizes, while the SP-2 is effective for message sizes in the window of 512 to 2K integers.
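For readers unfamiliar with the bitonic sorting algorithm mentioned above, here is a minimal sequential sketch. It assumes the input length is a power of two, and the names `bitonic_sorted`, `_bitonic_sort`, and `_bitonic_merge` are hypothetical. It does not model the distributed version or the overlap measurements in the report. In a distributed-memory setting, each compare-exchange stage would typically involve exchanging blocks of keys between partner processors, and that exchange is the communication that could be overlapped with computation.

```python
def _bitonic_merge(a: list, lo: int, n: int, ascending: bool) -> None:
    """Merge the bitonic sequence a[lo:lo+n] into sorted order, in place."""
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):
            # Compare-exchange elements that are `half` apart.
            if (a[i] > a[i + half]) == ascending:
                a[i], a[i + half] = a[i + half], a[i]
        _bitonic_merge(a, lo, half, ascending)
        _bitonic_merge(a, lo + half, half, ascending)


def _bitonic_sort(a: list, lo: int, n: int, ascending: bool) -> None:
    """Sort a[lo:lo+n] in place; n must be a power of two."""
    if n > 1:
        half = n // 2
        # Build a bitonic sequence: first half ascending, second half descending.
        _bitonic_sort(a, lo, half, True)
        _bitonic_sort(a, lo + half, half, False)
        _bitonic_merge(a, lo, n, ascending)


def bitonic_sorted(values: list) -> list:
    """Return a new ascending-sorted list; requires a power-of-two length."""
    n = len(values)
    if n and n & (n - 1):
        raise ValueError("bitonic sort sketch requires a power-of-two length")
    result = list(values)
    _bitonic_sort(result, 0, n, True)
    return result


if __name__ == "__main__":
    print(bitonic_sorted([3, 7, 4, 8, 6, 2, 1, 5]))
```

The fixed, data-independent pattern of compare-exchange steps is what makes the algorithm a convenient benchmark for communication behaviour: the set of partners at each stage is known in advance.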
This paper discusses the present state of the art of components, systems, and application technology related to parallel optical data links (ODL) as demonstrated by the OptoElectronic technology Consortium (OETC). Parallel ODL technology is poised for large-volume commercialization despite some uncertainties in industrial standards and system applications. This is fueled by the demand for high bandwidth to support the upcoming information age. To meet the need for low-cost, broadband digital multimedia services, parallel ODL technology faces the challenge of providing reasonable cost/performance ratios when compared with other established technologies. Responding to this challenge has required the integration of a number of state-of-the-art component technologies (e.g., VCSEL, monolithic integrated photoreceiver, MCM, GaAs IC, optical array connector and cable) with system designs and applications.