The effect of logic block functionality on FPGA performance and density is investigated. In particular, in the context of lookup table (LUT) based, cluster-based island-style FPGAs, the effect of LUT size and cluster size on the speed and logic density of an FPGA is analyzed. A fully timing-driven experimental flow is used, in which a set of benchmark circuits is synthesized into different cluster-based logic block architectures, which contain groups of LUTs and flip-flops.
ISBN: 9781605584102 (print)
Clock network power in field-programmable gate arrays (FPGAs) is considered and two complementary approaches for power reduction in the Xilinx® Virtex™-5 FPGA are presented. The approaches are unique in that they leverage specific architectural aspects of Virtex-5 to achieve reductions in dynamic power consumed by the clock network. The first approach comprises a placement-based technique to reduce interconnect resource usage on the clock network, reducing capacitance and power (up to 12%). The second approach borrows the "clock gating" notion from the ASIC domain and applies it to FPGAs. Clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the FPGA's built-in clock network, leading to reduced toggling on the clock interconnect and lower power (up to 28%). Power reductions are achieved without any performance penalty, on average. Copyright 2009 ACM.
In this paper we evaluate the trade-offs between various low-leakage design techniques for field-programmable gate arrays (FPGAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing lookup tables (LUTs) and connection and routing switches, several low-leakage implementations of pass-transistor-based multiplexers and routing switches are proposed, and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for FPGAs, and that they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For some of the potential low-leakage design techniques evaluated in our study, the impact on chip area ranges from minimal to an increase of 15%-30%.
ISBN: 9781595939340 (print)
The Field Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which are synthesized as compressor trees on the FPCA; the remaining portion of the circuit is synthesized on the FPGA. This paper presents a series of architectural improvements to the FPCA that reduce routing delay, increase flexibility and component utilization, and simplify the integration process. Using an FPGA containing six FPCAs, we observed average and maximum speedups of 1.60x and 2.40x on a set of arithmetic benchmarks.
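As a rough illustration of the multi-input addition mentioned in the abstract above, the following Python sketch models a compressor tree built from 3:2 compressors (carry-save adders). It is a behavioral model for non-negative integers only, not the FPCA architecture itself; the function names `compress_3_2` and `compressor_tree_sum` are hypothetical and chosen for this example.

```python
def compress_3_2(a, b, c):
    """3:2 compressor: reduce three operands to two with the same total.

    Assumes non-negative integers. No carry propagates across bit
    positions here, which is what makes a compressor fast in hardware.
    """
    partial_sum = a ^ b ^ c                          # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1       # per-bit majority, shifted left
    return partial_sum, carry


def compressor_tree_sum(operands):
    """Sum many operands: compress until two remain, then add once."""
    values = list(operands)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(compress_3_2(a, b, c))         # three values in, two out
    return sum(values)                               # single carry-propagate addition


if __name__ == "__main__":
    print(compressor_tree_sum([3, 5, 7, 9, 11]))
```

Each compression step removes one operand without a carry-propagate addition, so only the final two-operand sum needs a conventional adder.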
This paper introduces a coarse-grained FPGA architecture that is specialized for high-performance Finite Impulse Response (FIR) filtering. The proposed architecture provides the flexibility of a DSP processor with performance and area efficiency similar to that of a custom ASIC design, while allowing all of the basic FIR design parameters, including coefficient precision, to be configured. Previous research has already shown that FPGAs can provide a high-performance alternative to DSP processors. Experimental comparisons in this paper show that the performance and area efficiency of the proposed architecture are similar to those of custom approaches across a wide range of filter sizes and configurations.
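For readers unfamiliar with the FIR design parameters the abstract above refers to, here is a minimal Python sketch of a direct-form FIR filter with configurable coefficient precision. It only shows what the computation looks like under an assumed fixed-point coefficient format; it does not model the proposed architecture, and `quantize`, `fir_filter`, and `frac_bits` are hypothetical names introduced for this example.

```python
def quantize(coeffs, frac_bits):
    """Convert real coefficients to integers with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


def fir_filter(samples, q_coeffs):
    """Direct-form FIR: each output is the dot product of the coefficients
    with the most recent input samples (newest sample first)."""
    delay_line = [0] * len(q_coeffs)                 # tap registers, initially zero
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]           # shift the new sample in
        outputs.append(sum(c * t for c, t in zip(q_coeffs, delay_line)))
    return outputs


if __name__ == "__main__":
    q = quantize([0.5, 0.25], frac_bits=4)           # coefficient precision: 4 fractional bits
    print(fir_filter([1, 0, 0], q))                  # impulse response, scaled by 2**4
```

The number of taps and `frac_bits` correspond to the filter size and coefficient precision that a configurable FIR fabric would expose; outputs here remain scaled by `2**frac_bits`.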
This paper presents a split CPU-FPGA Multi-Scalar Multiplication (MSM) engine written in Hardcaml. Hardcaml MSM was submitted to the 2022 ZPrize cryptography competition and won 1st place in the FPGA track. Hardcaml M...
ISBN: 9780897917735 (print)
A Field Programmable Analogue Array (FPAA) is presented based on switched capacitor technology. The chip allows true array programming of undedicated analogue cells. There is provision for internal signal-conditional switching as part of the normal function of the array. With the presented chip it is possible to implement a very wide range of analogue signal processing functions such as data conversion, linear signal processing and non-linear functions. These functional configurations can be reconfigured and parameterised on the fly during concurrent signal processing. Very rapid prototyping of switched capacitor circuits is facilitated. The field-programmable nature of the chip allows several markets/customers to be targeted simultaneously with obvious benefits in terms of both volume of sales and reduced time to market.
ISBN: 9781450305549 (print)
Memory-related constraints (memory bandwidth, cache size) are nowadays the performance bottleneck of most computational applications. Especially with multiple cores, performance in many cases does not scale with the number of cores. In our work, we present our FPGA-based solution for the 3D Reverse Time Migration (RTM) algorithm. As the most computationally demanding imaging algorithm in current oil and gas exploration, RTM involves various computational challenges, such as a high demand for storage size and bandwidth, and poor cache behavior. Combining optimizations from both the algorithmic and architectural perspectives, our FPGA-based solution manages to remove the memory constraints and provide high performance that scales well with the amount of computational resources available. Compared with an optimized CPU implementation using two quad-core Intel Nehalem CPUs, our solution achieves a 4x speedup on two Virtex-5 FPGAs and an 8x speedup on two Virtex-6 FPGAs. Our projection demonstrates that the performance will continue to scale with future increases in FPGA capacity.
The size of configuration bitstreams of field-programmable gate arrays (FPGAs) is increasing rapidly. Compression techniques are used to decrease the size of bitstreams. In this paper, an appropriate bitstream format a...
ISBN: 9781450311557 (print)
In floating-point datapaths synthesized on FPGAs, the shifters that perform mantissa alignment and normalization consume a disproportionate number of LUTs. Shifters are implemented using several rows of small multiplexers; unfortunately, multiplexer-based logic structures map poorly onto LUTs. FPGAs, meanwhile, contain a large number of multiplexers in the programmable routing network; these multiplexers are placed under static control of the FPGA's configuration bitstream. In this work, we modify some of the routing multiplexers in the intra-cluster routing network of a CLB in an FPGA to implement shifters for floating-point mantissa alignment and normalization; the number of CLBs required for these operations is reduced by 67%. If shifting is not required, the routing multiplexers that have been modified can be configured to operate as normal routing multiplexers, so no functionality is sacrificed. The area overhead incurred by these modifications is small, and there is no need to modify every routing multiplexer in the FPGA. Experiments show that there is no negative impact in terms of clock frequency or routability for benchmarks that do not use the dynamic multiplexers.
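The "several rows of small multiplexers" structure described in the abstract above can be sketched as a logarithmic right shifter, where row i either passes its input through or shifts it by 2^i positions. This Python model is an illustrative assumption about a generic shifter of that kind, not the modified routing multiplexers of the paper; `mux2` and `shift_right_mux_rows` are hypothetical names.

```python
def mux2(select, in0, in1):
    """2:1 multiplexer: output in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def shift_right_mux_rows(bits, shift):
    """Logical right shift of `bits` (list of 0/1, MSB first).

    Row i is one rank of 2:1 multiplexers controlled by bit i of `shift`.
    Shift-amount bits beyond the number of rows are ignored.
    """
    width = len(bits)
    rows = max(1, (width - 1).bit_length())          # ceil(log2(width)) rows
    current = list(bits)
    for i in range(rows):
        step = 1 << i
        select = (shift >> i) & 1
        # Candidate input for each mux: the bit `step` positions to the left,
        # with zeros filled in at the MSB end.
        shifted = [0] * min(step, width) + current[:max(width - step, 0)]
        current = [mux2(select, current[k], shifted[k]) for k in range(width)]
    return current


if __name__ == "__main__":
    print(shift_right_mux_rows([1, 0, 1, 1, 0, 0, 0, 0], 3))
```

An 8-bit shifter built this way needs three rows of eight multiplexers, which hints at why such structures consume many LUTs when mapped onto general-purpose logic.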