This paper discusses high-speed array implementations of two image processing algorithms, namely the 'Hough transform for detection of line segments' and 'backprojection in CT image reconstruction'. A multi-chip-module (MCM) construction is proposed consisting of three types of chips: a high-speed multi-function nonlinear chip, a flexible multiply-accumulate chip, and an image kernel chip. Called v-array, it can be configured to have eight Hough modules, so as to produce the Hough transform of a 1024 × 1024 image in an estimated 13 ms in 2.0 micron CMOS technology (6.6 ms in 1.0 micron CMOS technology). Similarly, a v-array MCM can accommodate eight CT modules, which can produce the backprojected image in 209 ms in 2.0 micron CMOS technology (105 ms in 1.0 micron CMOS technology). To gain a significant speed advantage, we have developed an advanced multi-function cell for performing any one of four nonlinear operations: (1) square-root, (2) reciprocal, (3) sine/cosine, and (4) arctangent, all realized in a single chip and accessible on a selectable basis. A 16-bit four-function 'one cycle' VLSI chip, fabricated in 2.0 micron CMOS technology, is presently available and outputs a new result every clock cycle. Using this nonlinear cell and two other cells, an application-level Hough transform module and a CT module are presented.
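As a rough software illustration of why a Hough module leans on sine/cosine and multiply-accumulate operations, the sketch below votes in (theta, rho) space using rho = x cos(theta) + y sin(theta). It is a plain sequential version under assumed names (`hough_accumulate`, `n_theta`) and an assumed 1-degree angular resolution; it does not reproduce the paper's array architecture.

```python
import math

def hough_accumulate(points, width, height, n_theta=180):
    """Vote in (theta, rho) space for a list of (x, y) edge pixels.

    Hypothetical helper for illustration only: theta is sampled in
    n_theta equal steps over [0, pi), rho is rounded to the nearest integer.
    """
    rho_max = int(math.ceil(math.hypot(width, height)))
    # rho lies in [-rho_max, rho_max]; shift by rho_max to get an array index.
    acc = [[0] * (2 * rho_max + 1) for _ in range(n_theta)]
    for t in range(n_theta):
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)  # sine/cosine step
        for x, y in points:
            rho = int(round(x * c + y * s))      # multiply-accumulate step
            acc[t][rho + rho_max] += 1
    return acc, rho_max
```

Each edge pixel casts exactly one vote per angle, so a straight segment shows up as a cell whose count approaches the number of pixels on that segment.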
The authors present an investigation into the utilisation of parallel computing techniques for real-time simulation and control of a flexible beam structure in transverse vibration. The performance demands of modern control systems require the employment of complex algorithms with demanding operations, which in turn leads to shorter sampling times. Therefore, real-time performance becomes difficult to accomplish in control applications where the use of advanced control methods is warranted. Many demanding complex control processes cannot be satisfactorily realised with conventional uni-processor and multi-processor systems. Previous investigations have demonstrated the limitations of employing only transputers for real-time implementations in control applications. Alternative strategies where multi-processor based systems are employed, utilising digital signal processing (DSP) and parallel processing techniques, could provide suitable methodologies.
The paper describes a parallel architecture for universal digital signal processing. This architecture uses not only multiply-accumulate but also nonlinear operations, such as reciprocal, square root, exponential, sine/cosine, etc. Several advanced algorithms can thus be mapped to this array architecture. Specifically, the paper focuses attention on two very diverse algorithms, namely the fast Fourier transform (FFT) and the matrix LU decomposition. Only two types of cells are used in the architecture: the Universal Multiply-Subtract-Add cell (UMSA) and the Universal Nonlinear cell (UNL). Both multiply-accumulate (MA) and nonlinear operations are performed in hardware, so that the operation times are on the order of chip-clock cycle times. The use of only two types of cells makes the architecture highly suitable for wafer scale integration. It is interesting to note that the same resources on the wafer are used for configuring it to either the FFT algorithm or the LU decomposition algorithm.
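To make the split between the two kinds of operation concrete, here is a minimal sequential sketch of LU decomposition in which each pivot needs one reciprocal (a nonlinear operation) and every other update is a multiply-subtract. The function name `lu_decompose` is hypothetical, no pivoting is done, and this is not the paper's mapping onto UMSA and UNL cells.

```python
def lu_decompose(a):
    """Doolittle LU factorisation without pivoting (illustrative sketch).

    Returns (l, u) with l unit lower triangular and u upper triangular.
    Assumes every pivot u[k][k] is non-zero.
    """
    n = len(a)
    u = [[float(v) for v in row] for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        inv_pivot = 1.0 / u[k][k]          # one reciprocal per pivot
        for i in range(k + 1, n):
            m = u[i][k] * inv_pivot        # multiplier for row i
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]     # multiply-subtract update
    return l, u
```

For the 2 × 2 matrix [[4, 3], [6, 3]] this yields the multiplier 1.5 in `l` and the rows [4, 3] and [0, -1.5] in `u`.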
This paper discusses high-speed array implementations of two image processing algorithms, namely the 'Hough transform for detection of line segments' and 'backprojection in CT image reconstruction'. A multi-chip-module (MCM) construction is proposed consisting of three types of chips: a high-speed multi-function nonlinear chip, a flexible multiply-accumulate chip, and an image kernel chip. Called v-array, it can be configured to have eight Hough modules, so as to produce the Hough transform of a 1024 × 1024 image in an estimated 13 ms in 2.0 micron CMOS technology (6.6 ms in 1.0 micron CMOS technology). Similarly, a v-array MCM can accommodate eight CT modules, which can produce the backprojected image in 209 ms in 2.0 micron CMOS technology (105 ms in 1.0 micron CMOS technology). To gain a significant speed advantage, we have developed an advanced multi-function cell for performing any one of four nonlinear operations: square-root, reciprocal, sine/cosine, and arctangent, all realized in a single chip and accessible on a selectable basis. A 16-bit four-function "one cycle" VLSI chip, fabricated in 2.0 micron CMOS technology, is presently available and outputs a new result every clock cycle. Using this nonlinear cell and two other cells, an application-level Hough transform module and a CT module are presented.
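The backprojection step can be sketched in a few lines: for each projection angle, every image pixel is mapped onto a detector position and accumulates the sample found there. The version below is an unfiltered, nearest-neighbour, parallel-beam sketch; the names `backproject`, `sinogram`, and `thetas` and the centred geometry are assumptions made for illustration, not details taken from the paper.

```python
import math

def backproject(sinogram, thetas, size):
    """Unfiltered parallel-beam backprojection onto a size x size grid.

    sinogram[i] is the list of detector samples for angle thetas[i] (radians).
    Nearest-neighbour lookup; image and detector are both centred.
    """
    center = (size - 1) / 2.0
    n_det = len(sinogram[0])
    det_center = (n_det - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, thetas):
        c, s = math.cos(theta), math.sin(theta)
        for iy in range(size):
            for ix in range(size):
                # Detector coordinate of this pixel for the current angle.
                t = (ix - center) * c + (iy - center) * s + det_center
                k = int(round(t))
                if 0 <= k < n_det:
                    image[iy][ix] += proj[k]
    return image
```

With a single projection at angle 0 the routine simply smears the detector row along the image columns, which is a convenient sanity check before adding more angles or a reconstruction filter.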
The proceedings contain 45 papers. The topics discussed include: recent advances in nonstationary signal analysis: time-varying higher-order spectra and multilinear time-frequency signal analysis; real-time radar signal processing for autonomous aircraft landing; Radon transform-based velocity estimation of vehicles with linear arrays; coherent signal-subspace processing for near-field broad-band source localization; performance analysis of the subspace-based DOA estimator in the presence of unknown noise; numerical integration of PDEs by discrete passive modeling of physical systems; time-recursive computation and real-time parallel architectures, with application on the modulated lapped transform; and discrete frequency estimation using parametric filtering and the contraction mapping method.
ISBN: (print) 0819412767
Implementing Jacobi algorithms in parallel VLSI processor arrays is a non-trivial task, in particular when the algorithms are parametrized with respect to size and the architectures are parametrized with respect to space-time trade-offs. The paper is concerned with an approach to implement several time-adaptive Jacobi-type algorithms on a parallel processor array, using only CORDIC arithmetic and asynchronous communications, such that any degree of parallelism, ranging from single-processor up to full-size array implementation, is supported by a 'universal' processing unit. This result is attributed to a gracious interplay between algorithmic and architectural engineering.
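For readers unfamiliar with CORDIC arithmetic, the sketch below shows the standard rotation-mode iteration, which performs a plane rotation (the basic step of Jacobi-type algorithms) using only shifts, adds, and a small arctangent table. It is written in floating point for readability; the function name `cordic_rotate` and the iteration count are assumptions, and the paper's processing unit is not reproduced here.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to roughly 1.74 rad. In hardware the
    multiplications by 2**-i are shifts and atan(2**-i) comes from a table.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        # Both right-hand sides use the old x and y (tuple assignment).
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    # Undo the accumulated scale factor of the micro-rotations.
    return x / gain, y / gain
```

Rotating the unit vector (1, 0) by an angle returns approximately its cosine and sine, which is a simple way to check the iteration.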
ISBN: (print) 0819412767
This paper demonstrates that order-recursive least squares (ORLS) algorithms based on orthogonal transformations and hyperbolic transformations can be systematically constructed in two steps. The first step is to determine the structure of the ORLS algorithm according to the properties of the data vector in the LS estimation and the requirements on the output. The second step is to determine the proper implementation of the building blocks of the ORLS structure using orthogonal or hyperbolic transformations. The canonical ORLS structures and some possible orthogonal/hyperbolic implementations of their building blocks are presented. It is also shown that some of the orthogonal transformations are only applicable to certain types of ORLS structures and not to others.
ISBN: (print) 0819412767
Fast recursive least squares (RLS) algorithms have wide applications in signal processing and control. They are computationally efficient, so their stability is of major concern. In this paper, we investigate the error propagation and stability of some typical fast RLS algorithms. Through a random example, we show that a typical conventional fast RLS algorithm is weakly unstable in computing both the residuals and the gain vectors, and that a QR-based algorithm is expected to be weakly stable in computing the residuals but weakly unstable in computing the gain vectors. We propose an error correction scheme for computing the gain vectors.
ISBN: (print) 0819412767
An on-chip VLSI architecture for computation of Fourier transforms is presented. It performs the arithmetic operations in a digit-level pipeline fashion. For this purpose, the implementation of arithmetic operators is based on on-line (i.e., digit-serial and most significant digit first) arithmetic, and the transforms are performed by a parallel-pipeline version of the Cooley-Tukey fast Fourier transform (FFT) algorithm.
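The Cooley-Tukey decomposition that the architecture pipelines can be sketched as a recursive radix-2 routine. This version uses ordinary complex floating point rather than on-line digit-serial arithmetic, and the name `fft` and the power-of-two length restriction are assumptions made to keep the example short.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT.

    Illustrative sketch: len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each level of the recursion corresponds to one stage of butterflies, which is the unit a parallel-pipeline implementation would lay out in hardware.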
Variants of the Winograd FFT algorithm for prime transform size are derived that offer options as to operational counts and arithmetic balance. Their implementations on VAX, IBM 3090 VF, and IBM RS/6000 are discussed. For processors that perform floating-point addition, floating-point multiplication, and floating-point "multiply-add" with the same time delay, variants of the FFT algorithm have been designed such that all floating-point multiplications can be overlapped by using "multiply-add." The use of a tensor product formulation throughout gives a means for producing variants of algorithms matched to computer architectures.