We report our current research on a computer-assisted methodology for synthesizing regular array processors using the ALPHA language and design environment. The design process starts from an algorithmic-level description of the function and finishes with a netlist of an array processor that performs the specified function. To illustrate the proposed approach, we present the design of an array processor for polynomial division.
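For orientation, the function named above, polynomial division, can be written at the algorithmic level as ordinary long division. The following is only a minimal Python sketch of that function; it is not the ALPHA specification or the synthesized array from the paper, and the name `poly_divmod` and the coefficient-list convention are assumptions made for illustration.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide two polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder). Fractions keep the arithmetic exact.
    """
    if not den or den[0] == 0:
        raise ValueError("divisor needs a nonzero leading coefficient")
    rem = [Fraction(c) for c in num]
    if len(num) < len(den):
        return [], rem  # dividend has lower degree: nothing to divide
    quot = []
    # Each outer step yields one quotient coefficient; the inner loop is the
    # regular multiply-subtract recurrence applied along the divisor.
    for i in range(len(num) - len(den) + 1):
        q = rem[i] / den[0]
        quot.append(q)
        for j, d in enumerate(den):
            rem[i + j] -= q * d
    return quot, rem[len(num) - len(den) + 1:]

# (x^3 - 2x^2 - 4) / (x - 3) gives quotient x^2 + x + 3 and remainder 5
quotient, remainder = poly_divmod([1, -2, 0, -4], [1, -3])
```

The doubly nested loop with uniform dependencies is the kind of regular recurrence that array synthesis methods typically target.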
This paper presents a register-transfer modeling scheme for array processor simulation. Its main goals are to verify the application-specific design by computing on real data, and to help fine-tune the array architecture through precise timing analysis. The data flow graph of the design is translated into a register transfer language, which is then combined with a hardware description module. An interactive simulator, SISim v2.0, has been implemented to simulate the behavior of such a system. The results are compared with the expected values to verify the array processor design. The recorded timing information helps the designer analyze the system and improve its performance and resource utilization.
Details are presented of the DAC (DSP ASIC Compiler) silicon compiler framework. DAC allows a non-specialist to automatically design DSP ASICs and DSP ASIC cores directly from a high-level specification. Typical designs take only several minutes, and the resulting layouts are comparable in area and performance to handcrafted designs.
A new approach for computing the 2-D DFT and 2-D DCT is presented. A new design of a systolic array for transposed matrix multiplication is also shown in this paper. The new 2-D DFT/DCT avoids the need for the array transposer that was required by earlier implementations, and all processing can be pipelined easily. This approach employs a simple and regular structure that is well suited for VLSI implementation. This array can be easily scaled without modifying the basic control scheme and PE structure.
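As background, the 2-D DFT is separable and can be written as two matrix products, Y = F_M · X · F_N^T, where F_M and F_N are 1-D DFT matrices. The sketch below, in plain Python, computes the first stage as a product with a transposed operand that is read row by row, so no explicit transpose is formed. This is an illustration of the general idea under stated assumptions, not the systolic design from the paper; all function names are hypothetical.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with entries exp(-2*pi*i*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def matmul(a, b):
    """Ordinary matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matmul_bt(a, b):
    """Product a @ b^T, reading b by rows so b is never transposed."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def dft2(x):
    """2-D DFT of an M x N list of lists as F_M @ (X @ F_N^T)."""
    m, n = len(x), len(x[0])
    rows = matmul_bt(x, dft_matrix(n))   # 1-D DFT along each row
    return matmul(dft_matrix(m), rows)   # 1-D DFT along each column

def dft2_direct(x):
    """Reference: the defining double sum, for checking dft2."""
    m, n = len(x), len(x[0])
    return [[sum(x[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)] for u in range(m)]
```

Both stages use the same multiply-accumulate inner loop, which is what makes a single regular processing element plausible for the whole computation.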
ISBN (print): 0769526829
We first argue that the spectrum of processor architectures, from general-purpose processors (GPPs) to application-specific processors (ASPs) to FPGA co-processors, is narrowing (but not converging) due to some dominating physical and economic forces. We then suggest some research opportunities driven by these forces.
ISBN (print): 0769526829
Twenty years ago, the first Systolic Array Workshop was held at the University of Oxford. It became an annual event, and its name was changed to the Application Specific Array Processor (ASAP) conference at the Princeton Workshop in 1990. Under either name, the conference highlights the implementation of special-purpose computational processors, a basic feature of which is performing large numbers of arithmetic computations per second. In this paper we discuss representations of numbers and, in particular, the properties and advantages of arithmetic processors that use these representations. This retrospective looks at our own attempts, over the past two decades, to find new ways of representing and computing with numbers in order to gain advantages at the implementation level.
We present a fully scalable SIMD array architecture for a highly efficient implementation of pattern classification by nearest-neighbor algorithms using the city-block metric. The elementary accumulator cell is highly optimized for sequential accumulation of absolute integer differences, so that several hundred cells can easily be integrated on a single chip. A two-dimensional M × N array structure, reflecting an inherent two-fold data parallelism of the applications, reduces the data transfer to off-chip memory from O(M × N) to O(M + N). Here, we discuss the realization of a VLSI structure, the system architecture, and large networks of associative blocks as possible applications.
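The accumulation scheme described above can be modeled in software roughly as follows. This Python sketch assumes, as one plausible reading of the abstract, that the M rows correspond to query vectors and the N columns to stored prototypes, and that at each step one component per row and one per column is supplied (M + N values) while all M × N cells accumulate in lockstep. The names `distance_array` and `classify` are hypothetical, and the sketch does not reproduce the chip's actual data path.

```python
def distance_array(queries, prototypes):
    """Accumulate city-block distances between M queries and N prototypes.

    Cell (i, j) sequentially adds |q_i[k] - p_j[k]| over the components k.
    """
    m, n = len(queries), len(prototypes)
    dim = len(queries[0])
    acc = [[0] * n for _ in range(m)]
    for k in range(dim):
        row_in = [q[k] for q in queries]      # M values fed along the rows
        col_in = [p[k] for p in prototypes]   # N values fed along the columns
        for i in range(m):
            for j in range(n):
                acc[i][j] += abs(row_in[i] - col_in[j])
    return acc

def classify(queries, prototypes, labels):
    """Assign each query the label of its nearest prototype (city-block)."""
    acc = distance_array(queries, prototypes)
    return [labels[min(range(len(row)), key=row.__getitem__)] for row in acc]

prototypes = [[0, 0], [10, 10]]
labels = ["a", "b"]
result = classify([[1, 2], [9, 7]], prototypes, labels)
```

In this model each step moves M + N input values while updating M × N accumulators, which mirrors the reduction in off-chip transfer stated in the abstract.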
This paper addresses the problem of deriving optimised array architectures for real-time multi-dimensional signal processing systems, such as those occurring in image, speech and video applications. The starting point is a set of Weak Single Assignment Codes (WSACs). For this abstract specification, we solve the difficult task of finding a globally optimised architecture with matched throughput while avoiding an explosion of the search space. The cost function includes not only the data-path area but also the crucial foreground and background memory storage. The effectiveness of our solution has been substantiated with realistic test cases.
ISBN (print): 0769524079
This paper motivates the use of hardware virtualization on coarse-grained reconfigurable architectures. We introduce Zippy, a coarse-grained multi-context hybrid CPU with architectural support for efficient hardware virtualization. The architectural details and the corresponding tool flow are outlined. As a case study, we compare the non-virtualized and the virtualized execution of an ADPCM decoder.