This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collective communication routines for the hypercube. We show how a systematic application of scattering reduces load imbala...
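To see why scattering helps, here is a minimal sketch under our own assumptions, not taken from the paper's library. It models row-oriented Gaussian elimination, where step k updates rows k+1 to n-1 and each step costs as much as its busiest processor. The Python below compares a contiguous block assignment of rows with a scattered (cyclic) one. `owner_block`, `owner_scattered`, and `imbalance` are hypothetical helpers.

```python
from typing import Callable

Owner = Callable[[int, int, int], int]  # (row, n, p) -> processor index


def owner_block(row: int, n: int, p: int) -> int:
    """Block decomposition: contiguous chunks of ceil(n/p) rows per processor."""
    chunk = -(-n // p)  # ceiling division
    return row // chunk


def owner_scattered(row: int, n: int, p: int) -> int:
    """Scattered (cyclic) decomposition: row i goes to processor i mod p."""
    return row % p


def imbalance(owner: Owner, n: int, p: int) -> float:
    """Parallel time divided by ideal time over all elimination steps.

    At step k the active rows are k+1..n-1. A step lasts as long as the
    busiest processor, so parallel time is sum_k max_p(work_p), while the
    ideal time is the total work divided evenly among p processors.
    """
    parallel = 0
    total = 0
    for k in range(n - 1):
        work = [0] * p
        for row in range(k + 1, n):
            work[owner(row, n, p)] += 1
        parallel += max(work)
        total += sum(work)
    return parallel / (total / p)
```

For n = 64 rows on p = 4 processors this gives 888/504 ≈ 1.76 for the block layout and 528/504 ≈ 1.05 for the scattered one. As the active submatrix shrinks, the block layout leaves whole processors idle, while the scattered layout keeps every processor busy until only the last few rows remain.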
We report our progress on computer chess, last described at the Second Conference on Hypercubes. Our program follows the strategy of currently successful sequential chess programs: searching an alpha-beta pruned gam...
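The alpha-beta search named above can be sketched generically. The code below is a textbook minimax search with alpha-beta cutoffs on a small hand-built tree. It is not the authors' chess program. The tree representation (nested lists with integer leaf scores) is an assumption made only for illustration.

```python
import math
from typing import List, Union

# A game tree is either a leaf score (int) or a list of child subtrees.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, leaves: List[int]) -> float:
    """Minimax value of `node` with alpha-beta pruning.

    `leaves` records every leaf actually evaluated, so the pruning is visible.
    """
    if isinstance(node, int):
        leaves.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, leaves))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: the minimizer will avoid this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, leaves))
        beta = min(beta, value)
        if alpha >= beta:  # alpha cutoff: the maximizer already has better
            break
    return value
```

On `[[3, 5], [2, 9], [0, 1]]` with the maximizer to move, the search returns 3 after evaluating only the leaves 3, 5, 2, and 0. Once the second subtree yields 2, which is below the 3 already guaranteed, its remaining leaf 9 cannot change the result. The same reasoning prunes the 1 in the third subtree.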
A cooperative algorithm for extracting disparity information from stereo image pairs has been implemented on the NCUBE hypercube computer. Software is written in the C language, using communication routines of the "C...
ISBN: 0897912780 (print)
The VERTEX message passing system provided with NCUBE hypercubes is unsafe: it can fail under high message loads. We have implemented a message passing system with the same "look and feel" as VERTEX, but based instead upon the crystal router running at clock-interrupt time. The system is written mostly in C, with a few bits of assembly code to run the DMA devices. This means that safety checks (are the buffers full?) and complex error-handling mechanisms can easily be implemented at the C level. A first version works, is safe, and is faster than VERTEX in the high-volume limit, but slower in the low-volume case. At the very least, this system will be of interest for applications with high message traffic, such as disk backup.
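The crystal router moves messages across the hypercube one dimension at a time. The simulation below illustrates that routing rule. It assumes each node's buffer holds at most `capacity` messages, a hypothetical parameter, and shows where a C-level "are the buffers full?" check could sit. It is not the authors' NCUBE code and ignores DMA and interrupt timing. `crystal_route` and `BufferOverflow` are hypothetical names.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]  # (destination node, payload)


class BufferOverflow(RuntimeError):
    """Raised instead of failing silently when a node's buffer would overflow."""


def crystal_route(outgoing: Dict[int, List[Message]], dim: int,
                  capacity: int) -> Dict[int, List[Message]]:
    """Route messages on a `dim`-dimensional hypercube, one dimension per stage.

    At stage k every node forwards to its neighbour across dimension k
    (node ^ 2**k) the messages whose destination differs from the node in
    bit k. After `dim` stages every message sits on its destination node.
    """
    nodes = 1 << dim
    held = {n: list(outgoing.get(n, [])) for n in range(nodes)}
    for k in range(dim):
        bit = 1 << k
        nxt: Dict[int, List[Message]] = {n: [] for n in range(nodes)}
        for n, msgs in held.items():
            for dest, payload in msgs:
                target = n ^ bit if (dest ^ n) & bit else n
                nxt[target].append((dest, payload))
        for n, msgs in nxt.items():
            if len(msgs) > capacity:  # safety check: is the buffer full?
                raise BufferOverflow(
                    f"node {n} needs {len(msgs)} slots at stage {k}")
        held = nxt
    return held
```

With every node of a 3-cube sending one message to each other node, every node ends up holding exactly the 7 messages addressed to it. With a capacity of 3, the check raises an error at the first stage instead of corrupting a buffer.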
Coherent Parallel C (CPC) is an extension of C for parallelism. The extensions are not simply parallel for loops; instead, a data-parallel programming model is adopted. This means that there is an entire process for each data object. An example of an "object" is one mesh point in a finite element solver. How the processes are actually distributed on a parallel machine is transparent: the user is to imagine that an entire processor in a distributed-memory environment is dedicated to each process. This simplifies programming tremendously: complex if statements associated with domain boundaries disappear, and problems that do not exactly match the machine size, as well as irregular boundaries, are all handled transparently. The usual communication calls are not seen at all at the user level. Variables of other processes (which may or may not be on another processor) are simply accessed directly, as in a global memory. The first pass of the CPC compiler schedules the necessary communications in an efficient, coherent manner. Processes in CPC are insulated from one another and interact in a deterministic manner, which makes debugging tractable. Standard C I/O is provided, with simple extensions for parallelism. We currently have a CPC runtime system implemented on an NCUBE and have started implementing a true compiler for the language. CPC is not specific to distributed-memory machines. Implementation of this language on other architectures is natural; for example, there seems to be no fundamental problem with CPC on shared-memory parallel computers.
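CPC's model of one process per data object can be imitated, very loosely, in ordinary Python. The sketch below is a hypothetical example, not CPC syntax. It assumes a 1D mesh relaxed by Jacobi averaging. Each mesh point's "process" reads its neighbours' previous values directly, as if from global memory. Boundary points decide for themselves to stay fixed, so the caller needs no boundary-specific loop bounds. `coherent_step` and `relax` are hypothetical names.

```python
from typing import List


def coherent_step(u: List[float]) -> List[float]:
    """One synchronous step with one logical process per mesh point.

    Every process reads its neighbours' values from the previous step and
    writes only its own new value, so the result does not depend on the
    order in which the processes run.
    """
    n = len(u)

    def process(i: int) -> float:
        # The per-point program handles its own boundary case.
        if i == 0 or i == n - 1:
            return u[i]
        # Neighbour values are read directly, as if from a global memory.
        return 0.5 * (u[i - 1] + u[i + 1])

    return [process(i) for i in range(n)]


def relax(u: List[float], steps: int) -> List[float]:
    """Apply `steps` coherent Jacobi steps to the mesh `u`."""
    for _ in range(steps):
        u = coherent_step(u)
    return u
```

Because every read sees the previous step, evaluating the points in any order gives the same result, in the spirit of the deterministic interaction described above. For example, `relax([0, 0, 0, 0, 4], 200)` approaches the linear profile 0, 1, 2, 3, 4.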
This paper describes a communication system designed to support highly asynchronous application or system software on a distributed-memory multicomputer such as a hypercube. The system is called generalized signals because it is based on the signal facility in System V UNIX, with enhancements that allow signals to carry data. Any processor can send a signal to any other processor at any time. When a signal arrives, the receiving processor traps to a user-specified subroutine; when this subroutine finishes, the interrupted code is resumed. Signal interrupts happen in a controlled manner, which simplifies the programmer's task. There is a facility for protecting critical sections in user programs. The generalized signals system has been implemented on the NCUBE hypercube. This implementation is based on a modified version of NCUBE's VERTEX message-passing system. Generalized signals can coexist with VERTEX messages, and the enhancements to VERTEX are transparent to ordinary programs.
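The trap-and-resume behaviour and the critical-section protection described above can be modelled in a few lines. This is a single-threaded toy, not the NCUBE implementation: delivery is a direct method call rather than a hardware interrupt. The names `Processor`, `send_signal`, `set_handler`, `enter_critical`, and `leave_critical`, and the signal number 7, are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple


class Processor:
    """Toy model of one node that receives data-carrying signals.

    A signal runs its handler at once, standing in for the trap that
    interrupts the current code. Inside a protected critical section the
    signal is queued instead and delivered when the section ends.
    """

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[Any], None]] = {}
        self.pending: Deque[Tuple[int, Any]] = deque()
        self.critical_depth = 0

    def set_handler(self, signo: int, handler: Callable[[Any], None]) -> None:
        self.handlers[signo] = handler

    def deliver(self, signo: int, data: Any) -> None:
        if self.critical_depth > 0:
            self.pending.append((signo, data))  # defer until section ends
        else:
            self.handlers[signo](data)  # "trap" to the user handler

    def enter_critical(self) -> None:
        self.critical_depth += 1

    def leave_critical(self) -> None:
        self.critical_depth -= 1
        # Flush deferred signals once no critical section is active.
        while self.critical_depth == 0 and self.pending:
            signo, data = self.pending.popleft()
            self.handlers[signo](data)


def send_signal(nodes: List[Processor], dest: int, signo: int, data: Any) -> None:
    """Any node may signal any other node, attaching arbitrary data."""
    nodes[dest].deliver(signo, data)
```

A signal sent while node 1 is outside a critical section is handled immediately. One sent inside a critical section runs only after `leave_critical`, so the protected code is never interleaved with the handler.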
The performance of a pure gauge QCD code on the NCUBE hypercube is analyzed. Both load imbalance and communication contribute to the concurrent overhead, with the load imbalance term becoming dominant for sufficiently...
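One standard way to separate the two overhead contributions rests on two assumptions. First, a parallel step takes max(calc) + comm, with no overlap of computation and communication. Second, the sequential time is the sum of the per-node computation times. Then 1/efficiency = 1 + f_load + f_comm exactly. This is a generic decomposition, not the paper's performance model. The per-node times are hypothetical inputs that would come from measurement, and `overheads` is a hypothetical helper.

```python
from typing import List, Tuple


def overheads(calc: List[float], comm: float) -> Tuple[float, float, float]:
    """Split concurrent overhead into load-imbalance and communication terms.

    With N nodes, T_1 = sum(calc) and T_N = max(calc) + comm, so
    efficiency = T_1 / (N * T_N) = 1 / (1 + f_load + f_comm), where
      f_load = N * max(calc) / T_1 - 1   (zero when work is perfectly even)
      f_comm = N * comm / T_1
    """
    n = len(calc)
    total = sum(calc)
    f_load = n * max(calc) / total - 1.0
    f_comm = n * comm / total
    efficiency = 1.0 / (1.0 + f_load + f_comm)
    return f_load, f_comm, efficiency
```

For four nodes with equal work of 10 units and 2 units of communication, f_load = 0, f_comm = 0.2, and the efficiency is 1/1.2 ≈ 0.83. Uneven work of 12, 10, 10, and 8 units with no communication gives f_load = 0.2 instead.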
A two-vehicle navigator on a discrete space is analyzed. The concept of linking time maps as a source for optimal path planning is discussed. The rules for constructing these maps are given in a cellular automata mode. T...
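A time map of the kind mentioned above can be built with a simple cellular-automaton rule. The sketch below handles a single vehicle. It assumes a generic wavefront rule on a 4-connected grid: every free cell repeatedly takes the minimum neighbour time plus one, and the goal is fixed at 0. A path is then traced by always stepping to the neighbour with the lowest time. The paper's linking of maps for two vehicles is not reproduced, and `time_map`, `descend`, and `WALL` are hypothetical names.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
WALL = -1      # marks an obstacle in the input grid
INF = 10**9    # "not yet reached" time


def _neighbours(r: int, c: int, rows: int, cols: int) -> List[Cell]:
    """In-bounds 4-neighbourhood of (r, c)."""
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in cand if 0 <= a < rows and 0 <= b < cols]


def time_map(grid: List[List[int]], goal: Cell) -> List[List[int]]:
    """Arrival-time map from a synchronous cellular-automaton update.

    All cells update from the previous generation until nothing changes.
    Walls and unreachable cells keep the value INF.
    """
    rows, cols = len(grid), len(grid[0])
    t = [[INF] * cols for _ in range(rows)]
    t[goal[0]][goal[1]] = 0
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in t]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == WALL or (r, c) == goal:
                    continue
                best = min((t[a][b] for a, b in _neighbours(r, c, rows, cols)),
                           default=INF)
                if best + 1 < new[r][c]:
                    new[r][c] = best + 1
                    changed = True
        t = new
    return t


def descend(t: List[List[int]], start: Cell) -> Optional[List[Cell]]:
    """Follow decreasing times from `start` to the goal (time 0)."""
    rows, cols = len(t), len(t[0])
    r, c = start
    if t[r][c] >= INF:
        return None  # goal unreachable from start
    path = [(r, c)]
    while t[r][c] > 0:
        r, c = min(_neighbours(r, c, rows, cols), key=lambda p: t[p[0]][p[1]])
        path.append((r, c))
    return path
```

On a 3×3 grid with a wall in the centre and the goal at (0, 0), the far corner (2, 2) gets time 4, and `descend` returns a five-cell path around the wall.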
The problem of *** multiple targets in the presence of displacement noise and clutter is formulated as a nonconvex optimization problem. The form of the suggested cost function is shown to be suitable for the Graduate...
We introduce a string or world line formalism that provides a general description of time-dependent complex systems. We show that it can be applied to mapping general problems onto both sequential and parallel computers. In principle, it unifies the concept of an optimizing compiler with that of parallel decomposition. We show that it reproduces and smoothly interpolates between our original load balancing methods for loosely synchronous problems and optimal communication and combining algorithms such as index and fold. We evaluate two explicit implementations, the neural-router and the neural-accumulator, which use an optimizing neural network.
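Neither the formalism nor the neural optimizers are reproduced here. As a minimal sketch of the kind of objective a load balancer minimizes, assume a cost equal to the sum of squared processor loads plus μ times the number of graph edges cut between processors. This cost is an assumption, not the paper's formula. The code uses plain pairwise-swap descent in place of a neural network, and `cost` and `swap_descent` are hypothetical names.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]


def cost(assign: List[int], weights: List[float], edges: List[Edge],
         nprocs: int, mu: float) -> float:
    """Quadratic load term (penalizes imbalance) plus mu * cut edges (communication)."""
    load = [0.0] * nprocs
    for v, p in enumerate(assign):
        load[p] += weights[v]
    cut = sum(1 for a, b in edges if assign[a] != assign[b])
    return sum(w * w for w in load) + mu * cut


def swap_descent(assign: List[int], weights: List[float], edges: List[Edge],
                 nprocs: int, mu: float) -> Tuple[List[int], float]:
    """Greedy descent: swap the processors of two vertices whenever it lowers the cost."""
    assign = list(assign)
    best = cost(assign, weights, edges, nprocs, mu)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(assign)), 2):
            if assign[i] == assign[j]:
                continue
            assign[i], assign[j] = assign[j], assign[i]
            c = cost(assign, weights, edges, nprocs, mu)
            if c < best:
                best = c
                improved = True
            else:
                assign[i], assign[j] = assign[j], assign[i]  # undo the swap
    return assign, best
```

For a four-vertex chain 0-1-2-3 with unit weights on two processors, starting from the alternating assignment [0, 1, 0, 1] (cost 11 with μ = 1), the descent reaches [0, 0, 1, 1] with cost 9. Swaps keep the number of vertices per processor fixed, so here only the communication term changes. Single-vertex moves from the same start would stall at cost 11, which hints at why stronger optimizers are worth considering.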