We develop systolic algorithms for the OR, AND, Oversizing, and Undersizing of rectilinear polygons. These algorithms work on an edge representation of the polygons rather than on a bit map representation. The algorit...
详细信息
We develop systolic algorithms for the OR, AND, Oversizing, and Undersizing of rectilinear polygons. These algorithms work on an edge representation of the polygons rather than on a bit map representation. The algorithms are to be run on a systolic chain of processors. The edges are input at the left end of this chain. From here, they ‘float’ as far to the right as necessary. As edges float to the right, they compare themselves with edges that are resident in the processors they are floating through. During this comparison the output polygons are generated. Output polygons float to the left. These polygons are output from the left end of the chain. The throughput of the systolic system can be improved by increasing the length of the processor chain.
Programming multiprocessor systems is not a simple task and is often error-prone. In particular, programming a message-passing parallel computer usually demands large amounts of work in development and debugging to av...
详细信息
Programming multiprocessor systems is not a simple task and is often error-prone. In particular, programming a message-passing parallel computer usually demands large amounts of work in development and debugging to avoid deadlock. It is, therefore, essential to have software to support the task of parallel programming, or more desirably to have automatic parallel program generators to greatly reduce the labour. In this paper, we present a novel method of automatically generating Occam programs;or more precisely, we present an automatic FP-to-Occam translator called AFO. Occam is a parallel, message-passing programming language executed on a network of processors called transputers, but is basically a sequential language extended with primitive parallel and communication constructs. In contrast, FP is a functional language that contains no communication details and supports parallel thinking naturally with higher-level parallel constructs. FP has been used to derive many systolic algorithms. systolic algorithms can be implemented in Occam as soft-systolic algorithms. With this knowledge, we developed AFO to automatically generate Occam programs from FP programs that have corresponding systolic algorithms, in the hope of generating parallel programs without needing to know the details of low-level communications, and thus greatly improving the productivity of parallel programming.
This thesis presents some new. systolic algorithms for numerical computation, that are suitable for implementation on VLSI processor arrays or optical processors. Chapter 1 is an introduction to the environment for th...
详细信息
This thesis presents some new. systolic algorithms for numerical computation, that are suitable for implementation on VLSI processor arrays or optical processors. Chapter 1 is an introduction to the environment for the development of the systolic approach, followed by an over- view of major research areas in systolic systems. Chapter 2 contains basic mathematical definitions and a brief intro- duction to specific areas of numerical analysis. Chapter 3 starts with some basic definitions and terminology for sys- tolic computing; then fundamental systolic algorithms are described. Following is a review of some transformation techniques and an introduction to systolic 'programming and soft-systolic simulation. Finally, systolic and optical com- puting are combined, and a framework for developing systolic algorithms is outlined. Chapter 4 investigates systolic algorithms for the solution of polynomial equations, and the systolic calcula- tion of the roots of the characteristic equation of certain matrices. Chapter 5 presents systolic algorithms for the efficient solution and the ·updating of the solution of linear systems of equations, using LU decomposition. Chapter 6 develops the concept of pipelining systolic a 7 r a y s ~ as well as the combination of area and time expan- Slon, 1n iterative solution of linear systems of equations, based on series of systolic matrix-vector multiplications. Chapter 7 further develops the idea of expanding iterative systolic algorithms in area and/or in time. The systolic implementation of successive matrix-matrix multiplications is discussed and then a group of algorithms based on matrix powering is studied. Chapter 8 presents some optical sys- tolic algorithms. The direct mapping of VLSI systolic algo- rithms on optical processors is discussed, and then, the Outer Product processor is used for the optical systolic implementation of basic matrix computations. Chapter 9 completes this thesis with some general con- clusions, and suggesti
In this paper we examine the computational complexity of optimal systolic array algorithms for tensor product. We provide a time minimal schedule that meets the computed processor lower and upper bounds including one ...
详细信息
In this paper we examine the computational complexity of optimal systolic array algorithms for tensor product. We provide a time minimal schedule that meets the computed processor lower and upper bounds including one for tensor product. Our principle contribution is to find lower and upper bounds in the algorithm and its implementation for generating function and find processor-time-minimal schedule. (c) 2004 Elsevier Inc. All rights reserved.
This paper describes the systolic implementation of a class of Artificial Neural Networks generally termed as Associative Memories. Such networks are the Discrete Autocorrelator or Discrete Hopfield, the Bi-directiona...
详细信息
This paper describes the systolic implementation of a class of Artificial Neural Networks generally termed as Associative Memories. Such networks are the Discrete Autocorrelator or Discrete Hopfield, the Bi-directional Associative Memory, the Temporal Associative Memory, the Linear Associative Memory and the Novelty Filter. Further, problems such as different weight updating strategies, higher order correlation techniques and Fuzzy pattern encoding are discussed. The Re-Usable Matrix Vector and Matrix Multiplication systolic Arrays form the basis of the systolic implementation.
The paper reviews published systolic algorithms for both standard and square-root Kalman filtering. The formation of these complex arrays from simpler ones for matrix computations, including multiplication, backsubsti...
详细信息
The paper reviews published systolic algorithms for both standard and square-root Kalman filtering. The formation of these complex arrays from simpler ones for matrix computations, including multiplication, backsubstitution, orthogonal decomposition and the Schur complement, is described. The Kalman filter arrays are compared in terms of the number of parallel computations required and the speed and efficiency of the algorithm.
Autocorrelation becomes an increasingly important tool to verify improvements in the state of the simulational art in Lattice Gauge Theory. Semi-systolic and full-systolic algorithms are presented which are intensivel...
详细信息
Autocorrelation becomes an increasingly important tool to verify improvements in the state of the simulational art in Lattice Gauge Theory. Semi-systolic and full-systolic algorithms are presented which are intensively used for correlation computations on the Connection Machine CM-2. The semi-systolic algorithm makes use of an intrinsic, microprogrammed global-add reduction function which is implemented extremely well on the Connection Machine. Nevertheless, the full-systolic correlation algorithm which makes use only of local communication and computation operations turns out to be substantially superior to the semi-systolic scheme whose basic step involves a non-local sum computation that extends over the entire machine.
In this paper we design a new and efficient systolic architecture for the longest common subsequences problem which is, given two finite strings on any alphabet, to recover a subsequence of maximal length of both stri...
详细信息
In this paper we design a new and efficient systolic architecture for the longest common subsequences problem which is, given two finite strings on any alphabet, to recover a subsequence of maximal length of both strings. A natural extension to this problem is to determine the set of all longest common subsequences of the two given strings. First, we present a modular linear time algorithm on an input/output bounded and fault-tolerant semi-mesh systolic structure for the longest common subsequence problem. Then, we extend this algorithm to the set of all longest common subsequences problem. (C) 1998 Elsevier Science B.V. All rights reserved.
Two different linear systolic arrays have been suggested for the computation of discrete cosine transform (DCT). The proposed linear arrays are complementary to each other in the sense that the output of the linear ar...
详细信息
Two different linear systolic arrays have been suggested for the computation of discrete cosine transform (DCT). The proposed linear arrays are complementary to each other in the sense that the output of the linear arrays of one type may be fed as the input for the linear arrays of the other type. This feature of the proposed linear arrays has been utilised for designing a bilayer structure for computing the prime-factor DCT. It is interesting to note that the proposed structure does not require any hardware/time for transposition of the intermediate results. The desired transposition is achieved by orthogonal alignment of the linear arrays of the upper layer with respect to those of the lower layer. The proposed structures provide high throughput of computation due to fully pipelined processing, and massive parallelism employed in the bilayer architecture.
This paper describes a systolic algorithm for interpolation and evaluation of polynomials over any field using a linear array of processors. The periods of these algorithms are O(n) for interpolatin and O(1) for evalu...
详细信息
This paper describes a systolic algorithm for interpolation and evaluation of polynomials over any field using a linear array of processors. The periods of these algorithms are O(n) for interpolatin and O(1) for evaluation. This algorithm is readily adapted for Chinese remaindering, easily generalized for the multivariable interpolation and can be extended for rational interpolation to produce Pade approximants. The instruction systolic array implementation of the algorithm is presented here.
暂无评论