In this brief, we propose a combined systolic array/content-addressable memory (CAM) architecture for real-time Gabor decomposition. We then present codec designs for progressive image transmission using this architecture. Gabor decomposition is attractive for image compression because the basis functions match the human visual profiles. Gabor functions also achieve the lowest bound on the joint uncertainty of data. However, these functions are not orthogonal, and hence an analytic solution for the decomposition does not exist. Recently, it has been shown that Gabor decomposition can be computed as a multiplication between a transform matrix and a vector of image data. For an n × n image, the proposed architecture for Gabor decomposition consists of a linear systolic array of n processing elements, each with a local CAM. Simulations and complexity studies show that this architecture can achieve real-time performance with current VLSI technology.
A dynamically reconfigurable bit-serial systolic array implemented in 1.2-μm double-metal P-well CMOS is described. This processor array is proposed as the central computational unit in the Reconfigurable Systolic Array (RSA) neuro-computer, and performance estimates suggest that a 64-IC system (containing a total of 1024 usable processors) can achieve a learning rate of 1134 MCUPS on the NETtalk problem. The architecture employs reconfiguration techniques for both fault tolerance and functionality, and allows a number of neural network models (in both the recall and learning phases) from associative memory networks, supervised networks, and unsupervised networks to be supported. (C) 1997 Academic Press.
Limitations of current systolic designs are pointed out, and new constraints are imposed to make systolic solutions practical. Matrix multiplication is used as an illustration, and a simple but very high-performance systolic architecture that satisfies these constraints, the supercoprocessor for matrix problems (S-MP), is presented. Implementation alternatives for the linear systolic array for matrix-vector multiplication, which forms the core of the S-MP, are also described.
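The idea of a linear systolic array for matrix-vector multiplication can be illustrated with a small cycle-level simulation. This is a minimal sketch under assumed conventions, not the S-MP design itself: the function name `systolic_matvec`, the choice of one accumulator per processing element (PE), and the left-to-right flow of the vector entries are all assumptions made for illustration.

```python
def systolic_matvec(A, x):
    """Simulate y = A @ x on a hypothetical linear array of len(A) PEs.

    PE i keeps a local accumulator for y[i] and row i of A. The entries of x
    enter at PE 0, one per cycle, and move one PE to the right each cycle.
    """
    n, m = len(A), len(x)
    acc = [0] * n            # one accumulator per PE
    pipe = [None] * n        # x entry currently held by each PE: (index, value)
    cycles = m + n - 1       # last x entry reaches the last PE on this cycle
    for t in range(cycles):
        # Nearest-neighbor shift: every PE passes its x entry to the right.
        incoming = (t, x[t]) if t < m else None
        pipe = [incoming] + pipe[:-1]
        # All PEs perform one multiply-accumulate in the same cycle.
        for i, item in enumerate(pipe):
            if item is not None:
                j, xj = item
                acc[i] += A[i][j] * xj
    return acc, cycles


if __name__ == "__main__":
    y, cycles = systolic_matvec([[1, 2], [3, 4]], [5, 6])
    print(y, cycles)
```

In this sketch every PE talks only to its right-hand neighbor, and the whole product takes m + n - 1 cycles for an n-row, m-column matrix.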
The paper reviews published systolic algorithms for both standard and square-root Kalman filtering. The formation of these complex arrays from simpler ones for matrix computations, including multiplication, backsubstitution, orthogonal decomposition and the Schur complement, is described. The Kalman filter arrays are compared in terms of the number of parallel computations required and the speed and efficiency of the algorithm.
A modular architecture for very fast digital signal processing (DSP) elements is presented. The computation is performed over finite rings (or fields) and is able to emulate processing over the integer ring using residue number systems. The computations are restricted to closed operations (ring or field binary operators) with the ability to perform limited scaling operations. Computations naturally defined over finite mathematical systems are also easily implemented using this approach. The technique evolves from the decomposition of each closed calculation using the ring/field associativity property. Linear systolic arrays, formed with multiple elements, each of a single generic form, are used for all calculations. The pipeline cycle is determined from the generic cell and is predicted to be very fast by a critical path analysis. The cells are matched to the VLSI medium, and the resulting array structures are very dense. Examples of DSP applications are given to illustrate the technique, and example cell and array VLSI layouts are presented for a 3-μm CMOS process.
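How a residue number system lets small finite rings emulate integer arithmetic can be shown in a few lines. The sketch below is illustrative only: the moduli (7, 11, 13) and all function names are assumptions, not values from the paper, and decoding uses the standard Chinese remainder theorem. Results are exact as long as the true integer result lies in the range 0 to 1000 for these moduli.

```python
from math import prod

MODULI = (7, 11, 13)  # assumed pairwise-coprime moduli; dynamic range is 7*11*13 = 1001


def to_rns(value, moduli=MODULI):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(value % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Closed ring addition, performed independently in every residue channel."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Closed ring multiplication, also channel by channel with no carries between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese remainder theorem (requires Python 3.8+ for pow(x, -1, m))."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


if __name__ == "__main__":
    result = rns_add(rns_mul(to_rns(12), to_rns(34)), to_rns(56))
    print(from_rns(result))  # 12 * 34 + 56, computed entirely in the residue channels
```

Because each channel works on small numbers and never exchanges carries with the others, every channel can be mapped to its own short, fast arithmetic cell.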
This paper describes a systolic algorithm for interpolation and evaluation of polynomials over any field using a linear array of processors. The periods of these algorithms are O(n) for interpolation and O(1) for evaluation. This algorithm is readily adapted for Chinese remaindering, easily generalized to multivariable interpolation, and can be extended to rational interpolation to produce Padé approximants. The instruction systolic array implementation of the algorithm is presented here.
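The O(1) period for evaluation means that a new evaluation point can enter the array every cycle once the pipeline is full. The following sketch shows one way this can happen, using a Horner-rule pipeline; it is an assumed illustration rather than the paper's instruction systolic array algorithm, and the name `systolic_horner` and the cell layout are hypothetical.

```python
def systolic_horner(coeffs, points):
    """Evaluate a polynomial at many points on a simulated linear pipeline.

    coeffs lists the coefficients from highest degree to lowest; cell k holds
    coeffs[k]. Each cycle, a pair (x, partial) moves one cell to the right and
    cell k updates partial <- partial * x + coeffs[k] (Horner's rule).
    """
    stages = len(coeffs)
    regs = [None] * stages   # output register of each cell: (x, partial) or None
    results = []
    for t in range(len(points) + stages):
        # A finished value leaves the last cell.
        if regs[-1] is not None:
            results.append(regs[-1][1])
        # Shift right, updating from the last cell backwards so that each
        # cell reads its neighbor's value from the previous cycle.
        for k in range(stages - 1, 0, -1):
            prev = regs[k - 1]
            regs[k] = None if prev is None else (prev[0], prev[1] * prev[0] + coeffs[k])
        # One new point enters per cycle: this is the O(1) period.
        regs[0] = (points[t], coeffs[0]) if t < len(points) else None
    return results


if __name__ == "__main__":
    # p(x) = 2x^2 - 3x + 1 evaluated at four points
    print(systolic_horner([2, -3, 1], [0, 1, 2, 3]))
```

The latency of a single evaluation grows with the degree, but the throughput stays at one result per cycle, which is the distinction between latency and period.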
In this paper we design a new and efficient systolic architecture for the longest common subsequence problem: given two finite strings over any alphabet, recover a subsequence of maximal length common to both strings. A natural extension of this problem is to determine the set of all longest common subsequences of the two given strings. First, we present a modular linear-time algorithm on an input/output-bounded and fault-tolerant semi-mesh systolic structure for the longest common subsequence problem. Then, we extend this algorithm to the problem of finding the set of all longest common subsequences. (C) 1998 Elsevier Science B.V. All rights reserved.
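For reference, the longest common subsequence problem has a standard dynamic-programming solution whose table can be filled one anti-diagonal at a time. The sketch below uses that textbook recurrence, not the paper's semi-mesh architecture; the function name `lcs` and the wavefront ordering are illustrative assumptions. Cells on the same anti-diagonal do not depend on each other, which is the kind of parallelism a systolic structure can exploit.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Sweep anti-diagonals i + j = d; all cells on one diagonal are independent.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Backtrack from the bottom-right corner to recover one subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))
```

This sequential version needs time proportional to the product of the string lengths; processing each anti-diagonal in parallel reduces the number of steps to the sum of the lengths.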
Two different linear systolic arrays are suggested for the computation of the discrete cosine transform (DCT). The proposed linear arrays are complementary to each other in the sense that the output of the linear arrays of one type may be fed as the input to the linear arrays of the other type. This feature of the proposed linear arrays has been utilised for designing a bilayer structure for computing the prime-factor DCT. Notably, the proposed structure does not require any hardware or time for transposition of the intermediate results. The desired transposition is achieved by orthogonal alignment of the linear arrays of the upper layer with respect to those of the lower layer. The proposed structures provide high computational throughput owing to the fully pipelined processing and massive parallelism employed in the bilayer architecture.
The theory and design of systolic arrays for Viterbi processing in communication systems with a time-dispersive, time-varying channel are discussed. The architecture, algorithms, and processor elements for a two-dimensional systolic array are described. The array supports the branch metric computations required for an adaptive Viterbi processor. The array is designed so that computations propagate along the rows of the array, while data symbols propagate along the columns. All interprocessor data flow and connections within the array are nearest-neighbor. The array illustrates how the Viterbi-processor algorithms can be structured to achieve a high degree of computational concurrency. Variations in the array design are described and evaluated in terms of computational resource requirements, utilization, and computational throughput. A high-bandwidth memory interface is proposed, and system design considerations are discussed.
Autocorrelation is becoming an increasingly important tool for verifying improvements in the state of the simulational art in Lattice Gauge Theory. Semi-systolic and full-systolic algorithms are presented which are used intensively for correlation computations on the Connection Machine CM-2. The semi-systolic algorithm makes use of an intrinsic, microprogrammed global-add reduction function which is implemented extremely well on the Connection Machine. Nevertheless, the full-systolic correlation algorithm, which makes use only of local communication and computation operations, turns out to be substantially superior to the semi-systolic scheme, whose basic step involves a non-local sum computation that extends over the entire machine.
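The structure of a semi-systolic correlation step can be sketched as a local ring shift followed by a global sum. The code below is an assumed illustration, not the CM-2 implementation: the ring layout, the name `semi_systolic_autocorrelation`, and the use of an unnormalized circular autocorrelation without mean subtraction are all choices made for this example. The full-systolic variant, which avoids the global sum, is not shown.

```python
def semi_systolic_autocorrelation(x):
    """Unnormalized circular autocorrelation C[k] = sum_i x[i] * x[(i + k) % n].

    One copy of the data stays in place, one value per (simulated) processor.
    A second copy is shifted by one site per step using only nearest-neighbor
    moves on a ring. Each step then ends with a global sum over all sites,
    which is the non-local part of the scheme.
    """
    n = len(x)
    moving = list(x)
    corr = []
    for _ in range(n):
        # Local work: every site multiplies its fixed value by its moving value.
        products = [a * b for a, b in zip(x, moving)]
        # Non-local work: global-add reduction across the whole machine.
        corr.append(sum(products))
        # Local communication: cyclic shift of the moving copy by one site.
        moving = moving[1:] + moving[:1]
    return corr


if __name__ == "__main__":
    print(semi_systolic_autocorrelation([1, 2, 3, 4]))
```

Each of the n lags costs one global reduction in this scheme, which is why its performance depends heavily on how well the machine implements that reduction.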