ISBN: 9781595936028 (print)
This paper presents a framework for automatic mapping of perfectly nested loops with constant dependences onto regular processor arrays, suitable for direct implementation on Field Programmable Gate Arrays (FPGAs). The problem is modeled as that of finding a suitable completion procedure for a full-rank linear transformation on the iteration space. The approach enables extraction of the necessary degrees of communication-free and pipelined parallelism to optimize performance under the constraints of limited logic resources and I/O bandwidth available on an FPGA. The generation of control signals for the custom processing elements is also addressed. Examples of automatic derivation of parallel designs for some common nested loops are provided. Experimental results on the Cray XD1 show that an FPGA-based matrix-multiplication design obtained using the framework attains significant speedup on the XD1's attached FPGA when compared to execution on the XD1 CPU.
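The idea of a full-rank linear transformation on the iteration space can be illustrated with a minimal Python sketch. It does not reproduce the paper's completion procedure; it only checks a hand-picked transformation for a uniformized matrix-multiplication nest. The dependence vectors, the matrix `T`, and all function names are assumptions chosen for illustration.

```python
from itertools import product

# Assumed constant dependence vectors of a uniformized matrix-multiply nest
# over (i, j, k): accumulation along k, operand propagation along j and i.
DEPENDENCES = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]

# Hypothetical full-rank transformation: row 0 is the time (schedule) row,
# rows 1 and 2 give the two processor-array coordinates.
T = [(1, 1, 1),
     (1, 0, 0),
     (0, 1, 0)]


def dot(u, v):
    """Integer dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3x3 integer matrix (nonzero means full rank)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def is_valid_schedule(schedule, deps):
    """Each dependence must advance time by at least one step."""
    return all(dot(schedule, d) >= 1 for d in deps)


def map_iteration(point):
    """Map an iteration (i, j, k) to (time step, processor coordinates)."""
    t, p1, p2 = (dot(row, point) for row in T)
    return t, (p1, p2)


def schedule_nest(n):
    """Group all iterations of an n x n x n nest by (time, processor) slot."""
    slots = {}
    for point in product(range(n), repeat=3):
        slots.setdefault(map_iteration(point), []).append(point)
    return slots
```

With this choice each iteration (i, j, k) runs on processing element (i, j) at time i + j + k. The accumulation dependence (0, 0, 1) maps to processor offset (0, 0), so it needs no inter-processor communication, while the other two dependences map to nearest-neighbor offsets and can be pipelined. Because `T` is invertible, no two iterations share a (time, processor) slot.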
We consider the problem of automatic mapping of computation-intensive loop nests onto FPGA hardware. The regular cell array structure of these chips reflects the parallelism in regular loop-like computations. Furthermore, the flexibility of FPGAs allows the cost-effective implementation of reconfigurable high-performance processor arrays. So far, there exists no continuous design flow that allows automated generation of FPGA configuration data from a loop nest specified in a high-level language. Here, we present a methodology for automatic generation of synthesizable VHDL code specifying a processor array and optimized for FPGA implementation.
Two regular processor arrays for multiplying unsigned numbers are described. The essence is a structure that allows designs with different degrees of pipelining to be synthesised. The impact of varying the degree of pipelining on performance is assessed.
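As a rough behavioral illustration of varying the degree of pipelining, the Python sketch below models an unsigned array multiplier as one row of partial-product adders per multiplier bit, with a pipeline register assumed after every `rows_per_stage` rows. This is not the structure described in the paper; the function name, parameters, and the row-grouping model are assumptions made only for this example.

```python
def array_multiply(a, b, width=8, rows_per_stage=2):
    """Multiply two unsigned `width`-bit numbers row by row.

    Returns (product, stages), where `stages` counts the assumed pipeline
    stages, i.e. the latency in clock cycles of this simplified model.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")
    if rows_per_stage < 1:
        raise ValueError("rows_per_stage must be at least 1")

    acc = 0
    stages = 0
    for start in range(0, width, rows_per_stage):
        # Rows inside one stage are treated as combinational logic.
        for row in range(start, min(start + rows_per_stage, width)):
            if (b >> row) & 1:
                acc += a << row  # add the shifted partial product
        stages += 1  # assumed pipeline register after this group of rows
    return acc, stages
```

In this model, fewer rows per stage means more stages and higher latency in cycles, but a shorter combinational path per stage, which is the kind of trade-off the abstract refers to when it assesses the impact of pipelining on performance.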