We present here an efficient systolic implementation for 3-D IIR digital filters. The systolic implementation is obtained by using an algebraic mapping technique. This new mapping technique lets us mix pipelined variables and broadcast variables. Through the mapping method we also determine the buffer sizes, the directions of variable propagation, and the data feeding and extraction points. The resulting systolic array implementation is a modular structure composed of 2-D filter modules connected by simple buffers. This new systolic implementation is regular, modular and amenable to VLSI implementation.
In this paper the design of systolic array processors for computing the 2-dimensional Discrete Fourier Transform (2-D DFT) is considered. We investigated three different computational schemes for designing systolic array processors using a systematic approach. The systematic approach is guaranteed to find optimal systolic array processors from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
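As a point of reference for the computation being mapped, the sketch below evaluates a 2-D DFT by the standard row-column decomposition: 1-D DFTs along each row, then along each column. The abstract does not say which three computational schemes the paper studies, so this is only an assumed, software-level illustration; the function names `dft_1d` and `dft_2d` are hypothetical.

```python
import cmath


def dft_1d(x):
    """Naive O(n^2) 1-D DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d(a):
    """2-D DFT by row-column decomposition (assumed scheme, not the paper's)."""
    # Step 1: transform every row.
    rows = [dft_1d(list(r)) for r in a]
    # Step 2: transform every column of the row-transformed data.
    cols = [dft_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [row frequency][column frequency].
    return [list(r) for r in zip(*cols)]
```

Each 1-D pass is a set of independent multiply-accumulate chains, which is the kind of regular structure that systolic mappings exploit.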
The development of multiple communication standards and services has created the need for a flexible and efficient computational platform for baseband signal processing. Using a set of heterogeneous reconfigurable execution units (RCEUs) and a homogeneous control mechanism, the proposed reconfigurable architecture achieves a large computational capability while still providing a high degree of flexibility. Software tools and a library of commonly used algorithms are also proposed in this paper to provide a convenient framework for hardware generation and algorithm mapping. In this way, the architecture can be specified in a high-level language, and hardware resource usage is increased. Finally, we evaluate the system's performance on representative algorithms, specifically a 32-tap finite impulse response (FIR) filter and a 256-point fast Fourier transform (FFT), and compare the results with commercial digital signal processor (DSP) chips as well as with other reconfigurable and multi-core architectures.
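For readers unfamiliar with the first benchmark, a minimal direct-form FIR filter can be sketched as follows. This shows only the arithmetic of a 32-tap filter, not the paper's mapping onto the execution units; the function name `fir_filter` and the moving-average coefficients in `TAPS` are assumptions chosen for illustration.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start of the input count as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Hypothetical 32-tap moving-average coefficients (not taken from the paper).
TAPS = [1.0 / 32] * 32
```

Each output sample costs one multiply-accumulate per tap, so a 32-tap filter needs 32 such operations per sample; this is the work that a parallel architecture distributes across its units.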
In this paper we propose a new general purpose VLSI architecture called ring-connected trees (RCT) for parallel processing. RCT requires less hardware in terms of processing elements and connecting links than a mesh-of-tree of comparable size, and its diameter is less than that of a mesh. It requires less chip area, a smaller maximum edge length and a smaller crossing number than those required by a mesh-of-tree [1] [F.T. Leighton, Layout for the shuffle-exchange graph and lower bound techniques for VLSI, Ph.D. dissertation, Department of Mathematics, MIT, 1981] under the Grid model of Thompson [2] [C.D. Thompson, Area-time complexity for VLSI. Technical report, Division of Computer Science, University of California, Berkeley, CA, January 1984]. By using spare PEs and links, RCT is made to tolerate multiple faults. The suitability of this architecture for multipurpose applications is demonstrated by designing parallel versions of algorithms for a number of common computational problems. This structure requires linear and sublinear time for these algorithms, which is quite reasonable considering the simpler nature of the architecture. (C) 1998 Published by Elsevier Science B.V.
The memory-based processor array (MPA) was previously designed as an effective memory-processor integrated architecture. The MPA can be easily attached to any host system via a memory interface. In this paper, the impact of the memory interface structure is analyzed for computer vision tasks. An analytical model is constructed to describe the characteristics of the memory interface structure. Performance improvement for the memory interface model of the MPA system can be 6-40% for vision tasks consisting of sequential and data parallel tasks. Mapping algorithms to implement convolution and connected component labeling on the MPA are also presented. The asymptotic time complexities of the algorithms are evaluated to verify the cost-effectiveness and the efficiency of the MPA system. (C) 2000 Elsevier Science B.V. All rights reserved.
Because of the increasing need to develop efficient high-speed computational kernels, researchers have been looking at various acceleration technologies. One approach is to use field programmable gate arrays (FPGAs) in conjunction with general purpose processors to form what are known as high performance reconfigurable computers (HPRCs). HPRCs have already been shown to work well for both fixed-point and integer calculations. Floating-point calculations are a different matter; obtaining speedups has been somewhat elusive. This article, after introducing the three primary HPRC development flows, takes a detailed look at "the three p's," which addresses the crucial relationship among performance, pipelining, and parallelism. It also examines "the FPGA design boundary," which addresses some of the heuristics that allow developers to determine which application modules can be mapped onto the FPGAs. These ideas are illustrated by way of a simple floating-point application that is mapped onto a contemporary HPRC. This article expands upon earlier work by including details on how to map customized intellectual property cores into an HPRC environment via a hybrid development flow.
ISBN: 9783037857519 (print)
In order to meet the computing speed required by 4G wireless communications, and to provide the different data processing widths required by different algorithms, an SIMD (Single Instruction Multiple Data) core has been designed. The ISA (Instruction Set Architecture) and main components of the SIMD core are discussed, with a focus on how the SIMD core can be configured. Finally, the simulation result of the multiplication of two 8×8 matrices is presented to show the execution of instructions in the proposed SIMD core, and the result verifies the correctness of the SIMD core design.
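To make the test case concrete, the sketch below multiplies two matrices in a way that mimics SIMD execution: one scalar is broadcast and multiply-accumulated across a group of output elements at a time. The abstract does not describe the core's actual instruction set or lane width, so the function name `matmul_lanes` and the `lanes` parameter are hypothetical, and plain Python loops stand in for the hardware lanes.

```python
def matmul_lanes(a, b, lanes=8):
    """Matrix product C = A * B, organized as broadcast multiply-accumulate steps.

    `lanes` is an assumed vector width; the inner loop over j models the
    elements that one SIMD instruction would update together.
    """
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    c = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(n_inner):
            scalar = a[i][k]  # value broadcast to every lane
            for j0 in range(0, n_cols, lanes):
                # One modeled SIMD step: up to `lanes` multiply-accumulates.
                for j in range(j0, min(j0 + lanes, n_cols)):
                    c[i][j] += scalar * b[k][j]
    return c
```

With `lanes=8` and 8×8 operands, each row of the result is produced by eight modeled vector steps, one per element of the corresponding row of the left operand.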
ISBN: 0819446467 (print)
A specialized CAD tool is described that will take a user's high level code description of a non-uniform affinely indexed algorithm and automatically generate abstract latency-optimal systolic arrays. Emphasis has been placed on ease of use and the ability to either force conformation to specific design criteria or perform unconstrained explorations. How such design goals are achieved is illustrated in the context of LU decomposition and the matrix Lyapunov equation. The tool is then used to generate new 1-D and 2-D hardware-efficient systolic arrays for the discrete Fourier transform that take advantage of the radix-4 matrix decomposition.
ISBN: 9783038351153 (print)
To meet the high-speed, real-time requirements of SAR processing systems, and to free the processing board from its traditional dependence on a specific algorithm, this paper designs a generic mass-storage real-time signal processing module built around TI's latest multi-core DSP, the TMS320C6678, and based on the OpenVPX high-speed serial bus standard. The module is standardized, modular and reconfigurable. This paper discusses the design of the module and the mapping of a typical parallel SAR imaging algorithm onto it. The processing module has been applied in a variety of airborne SAR signal processing systems, which have fully validated its processing power and versatility.
ISBN: 0769511651 (print)
This paper maps a new application, namely vector-scalar operations, onto the M1 MorphoSys (from UCI) reconfigurable computing system. A performance analysis study of the M1 RC is also presented to evaluate the efficiency of the algorithm execution on the M1 system. For instance, two algorithms were run on an 8x8 RC array M1, and numerical examples were simulated to validate our results using the MorphoSys mULATE program, which simulates MorphoSys operation.