We demonstrate that a small library of customizable interconnect components permits low-area, high-performance, reliable communication tuned to an application, by analogy with the way designers customize their compute...
FPGAs normally have numerous independent memory banks that can be accessed simultaneously, potentially offering a very large memory bandwidth. Adopting a suitable application-based memory partitioning strategy is thus...
Static leakage power consumption is critical in modern FPGAs for many applications. Dynamic Power-Gating (DPG), in which parts of the FPGA in-use logic blocks are powered-down at run-time, is a promising technique to ...
The effect of variability has become increasingly significant as a result of technology geometry scaling. This paper describes Asynchronous Assisting Logic (AAL) blocks and the method of introducing them into modern F...
Emerging programming models for chip heterogeneous multiprocessor (CHMP) systems elevate architecture details up into the source code. This eliminates portability and requires designers to navigate a multidimensional ...
While transistor density continues to grow exponentially in Field-Programmable Gate Arrays (FPGAs), the increased leakage current of CMOS transistors acts as a power wall for the aggressive integration of transisto...
Compressor trees offer an effective realization of the multi-input addition needed by many arithmetic operations. However, mapping the commonly used carry-save adders (CSA) of classical compressor trees to FPGAs su...
ISBN: (Print) 9781479936090
We present a scalable design for accelerating the solution of dense linear systems of equations using LU decomposition. A novel systolic array architecture that can be used as a building block in scientific applications is described and prototyped on a Xilinx Virtex 6 FPGA. This solver has a throughput of around 3.2 million linear systems per second for matrices of size N = 4 and around 80 thousand linear systems per second for matrices of size N = 16. In comparison with similar work, our design offers up to a 12-fold improvement in speed whilst requiring up to 50% fewer hardware resources. As a result, a linear system of size N = 64 can be implemented on a single FPGA, whereas previous work was limited to a size of N = 12 and resorted to complex multi-FPGA architectures to scale. Finally, the scalable design can be adapted to problems of different sizes with minimum effort.
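As a point of reference for what such a solver computes, the following is a minimal software sketch of solving a dense system A x = b by LU decomposition. It is not the systolic array architecture described in the abstract: the function name `lu_solve`, the use of partial pivoting, and the in-place storage of the L and U factors are assumptions made for illustration only.

```python
def lu_solve(a, b):
    """Solve a x = b for a dense square matrix a (list of rows).

    Illustrative sketch only: LU decomposition with partial pivoting,
    followed by forward and back substitution. Pivoting is an assumption
    here, not something stated in the abstract.
    """
    n = len(a)
    # Work on float copies so the caller's data is left untouched.
    lu = [[float(v) for v in row] for row in a]
    x = [float(v) for v in b]

    for k in range(n):
        # Choose the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            # Store the elimination multiplier in the (strictly lower) L part.
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

    # Forward substitution with the unit lower-triangular factor L.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]

    # Back substitution with the upper-triangular factor U.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]

    return x
```

Calling `lu_solve([[4, 3], [6, 3]], [10, 12])` solves the 2 x 2 system 4x + 3y = 10, 6x + 3y = 12. The triple loop over `k`, `i`, and `j` is the O(N^3) elimination step, which is the part a hardware design would typically parallelize.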
This paper proposes an FPGA implementation based on a sliding processing window for the Harris corner algorithm. It represents one of the most frequently used pre-processing methods for a wide variety of image processing alg...
FPGA-based prototyping enables evaluating complex designs directly in hardware, at speeds orders of magnitude faster than simulation. However, this approach suffers from the lack of observability during debugging. To ...