Run-time data redistribution can enhance algorithm performance on distributed-memory machines. Explicit redistribution of data can be performed between algorithm phases when a different data decomposition is expected to deliver increased performance for a subsequent phase of computation. Redistribution, however, represents increased program overhead, as algorithm computation is suspended while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchanged for BLOCK to CYCLIC(c) (or vice versa) redistributions over an arbitrary number of dimensions. Preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. When implemented on an IBM SP, the mapping technique demonstrates redistribution performance improvements of approximately 40% over traditional data-to-processor mapping. Relative to the traditional mapping technique, the proposed method affords greater flexibility in specifying precisely which data elements are redistributed and which elements remain on-processor.
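To make the BLOCK and CYCLIC(c) distributions concrete, the following is a minimal 1-D sketch (not the paper's algorithm, and all names are illustrative) that computes each element's owning processor under both patterns and counts how many elements would have to move in a naive BLOCK to CYCLIC(c) redistribution:

```python
# Illustrative sketch: BLOCK vs CYCLIC(c) data-to-processor mappings for a
# 1-D array of n elements over p processors. Counts elements whose owner
# changes under a naive BLOCK -> CYCLIC(c) redistribution. The function
# names and parameters are assumptions for illustration only.

def block_owner(i, n, p):
    """Processor owning element i under a BLOCK distribution."""
    block = (n + p - 1) // p          # ceiling block size
    return i // block

def cyclic_owner(i, c, p):
    """Processor owning element i under a CYCLIC(c) distribution."""
    return (i // c) % p

def elements_moved(n, c, p):
    """Count elements that change owner in a BLOCK -> CYCLIC(c) redistribution."""
    return sum(1 for i in range(n)
               if block_owner(i, n, p) != cyclic_owner(i, c, p))

if __name__ == "__main__":
    n, c, p = 16, 2, 4
    print([block_owner(i, n, p) for i in range(n)])
    print([cyclic_owner(i, c, p) for i in range(n)])
    print(elements_moved(n, c, p))   # most, but not all, elements move
```

For n=16, c=2, p=4, twelve of the sixteen elements change owner; the paper's contribution is to shrink exactly this kind of exchange volume by remapping data to logical processors.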
ISBN:
(print) 3540593934
A set of so-called cortical images, motivated by the function of simple cells in the primary visual cortex of mammals, is computed from each of two input images, and an image pyramid is constructed for each cortical image. The two sets of cortical image pyramids are matched synchronously and an optimal mapping of one image onto the other is determined. The method was implemented on the Connection Machine CM-5 of the University of Groningen in the data-parallel programming model and applied to the problem of face recognition.
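As a rough illustration of the pyramid construction step (not the paper's cortical-image method; the averaging scheme and function names are assumptions), an image pyramid can be built by repeated 2x2 averaging and downsampling:

```python
# Illustrative sketch: build an image pyramid by repeated 2x2 block
# averaging, halving each dimension per level. Images are plain nested
# lists of floats; this is a generic Gaussian-style pyramid, not the
# cortical-image pipeline described in the abstract.

def downsample(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w // 2)]
            for r in range(h // 2)]

def build_pyramid(img, levels):
    """Return [img, downsample(img), ...] with `levels` entries."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    pyr = build_pyramid(img, 3)
    print([len(level) for level in pyr])   # side length at each level
```

Matching then proceeds coarse-to-fine: a correspondence found at a small level constrains the search at the next, larger level.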
The results are presented of an investigation into the use of the data-parallel programming approach on four different massively-parallel computers: the MasPar MP-1 and MP-2 and the Thinking Machines CM-200 and CM-5. A code to calculate inviscid compressible flow, originally written in FORTRAN 77 for a traditional vector computer, has been rewritten entirely in Fortran 90 to take advantage of the compilers available on the massively-parallel computers. It is shown that the discretization of the governing equations on a regular mesh is well adapted to data-parallelism. For a typical test problem of supersonic flow through a ramped duct, computational speeds have been achieved using these massively-parallel computers that are superior to those obtained using a single processor of a Cray Y-MP. In addition, this study has enabled the question of code portability between the different computers to be assessed.
We describe a system that allows programmers to take advantage of both control and data parallelism through multiple intercommunicating data-parallel modules. This programming environment extends C-type stream I/O to include intermodule communication channels. The programmer writes each module as a separate data-parallel program, then develops a channel linker specification describing how to connect the modules together. A channel linker we have developed loads the separate modules onto the parallel machine and binds the communication channels together as specified. We present performance data demonstrating that a mixed control- and data-parallel solution can yield better performance than a strictly data-parallel solution. The system described currently runs on the Intel iWarp multicomputer.
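The module-and-channel structure can be sketched in miniature: below, two threads connected by a queue stand in for separate data-parallel modules bound by a channel linker on iWarp. This is a hedged illustration of the communication pattern only; all names and the end-of-stream convention are assumptions, not the system's actual API.

```python
# Illustrative sketch: two "modules" linked by a stream channel. A producer
# module writes values to its output channel; a consumer module reads them
# and reduces to a sum. Threads and a Queue stand in for separate
# data-parallel programs and the channel linker's bound channels.
import queue
import threading

def producer(chan, n):
    """First module: emits squared values on its output channel."""
    for i in range(n):
        chan.put(i * i)
    chan.put(None)  # end-of-stream marker (an assumed convention)

def consumer(chan, out):
    """Second module: sums everything read from its input channel."""
    total = 0
    while True:
        item = chan.get()
        if item is None:
            break
        total += item
    out.append(total)

channel = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(channel, 5))
t2 = threading.Thread(target=consumer, args=(channel, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results[0])   # 0 + 1 + 4 + 9 + 16 = 30
```

The design point the abstract makes is that the two modules run concurrently (control parallelism) while each could internally be data-parallel, which is how a mixed solution can outperform a purely data-parallel one.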