This article is a concise literature review of the actual state of the art in arithmetic for field-programmablegatearrays (fpgas), including studies, implementation techniques, operators, and structures, in various ...
详细信息
This article is a concise literature review of the actual state of the art in arithmetic for field-programmablegatearrays (fpgas), including studies, implementation techniques, operators, and structures, in various area-time tradeoffs. It covers the integer operations of addition/subtraction, multiplication, squaring, division, and square root, in parallel, and in both serial modes (least-significant digit first, and online). Many people, including researchers in the field of computer arithmetic, parallel computing, digital signal and image processing, system-on-a-programmable chip (SoPC) designers, and other people with a need to implement special purpose arithmetic circuits on fpgas, might find such a review useful, either as an introduction to the topic, as a knowledge update, or for reference.
The proceedings contains 22 papers from the 1997 internationalsymposium on fieldprogrammablegatearrays. Topics discussed include: fieldprogrammablegate array (fpga) architectures;fpga partitioning and synthesis;...
详细信息
The proceedings contains 22 papers from the 1997 internationalsymposium on fieldprogrammablegatearrays. Topics discussed include: fieldprogrammablegate array (fpga) architectures;fpga partitioning and synthesis;rapid prototyping and emulation;reconfigurable computing;and fpga floorplanning and routing.
In this paper we evaluate the trade-offs between various low-leakage design techniques for fieldprogrammablegatearrays (FGPAs) in deep sub-micron technologies. Since multiplexers are widely used in fpgas for implem...
详细信息
In this paper we evaluate the trade-offs between various low-leakage design techniques for fieldprogrammablegatearrays (FGPAs) in deep sub-micron technologies. Since multiplexers are widely used in fpgas for implementing look up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass transistor based multiplexers and routing switches are proposed and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for fpgas, and they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For some of the potential low-leakage design techniques being evaluated in our study, the impact on chip area is very minimal to an increase of 15% - 30%.
Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic me...
详细信息
We present the design of a high-performance, highly pipelined asynchronous fpga. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficient...
详细信息
We present the design of a high-performance, highly pipelined asynchronous fpga. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our fpga, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our fpga ideal for throughput-intensive applications and we require minimal place and route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25μm process and 500-700 MHz in a TSMC 0.18μm process.
This paper presents an fpga-specific implementation of the floating-point tangent function. The implementation inputs values in the interval [-π/2,π/2], targets the IEEE-754 single-precision format and has an accura...
详细信息
Memory-related constraints (memory bandwidth, cache size) are nowadays the performance bottleneck of most computational applications. Especially in the scenario of multiple cores, the performance does not scale with t...
详细信息
ISBN:
(纸本)9781450305549
Memory-related constraints (memory bandwidth, cache size) are nowadays the performance bottleneck of most computational applications. Especially in the scenario of multiple cores, the performance does not scale with the number of cores in many cases. In our work, we present our fpga-based solution for the 3D Reverse Time Migration (RTM) algorithm. As the most computationally demanding imaging algorithm in current oil and gas exploration, RIM involves various computational challenges, such as a high demand for storage size and bandwidth, and a poor cache behavior. Combining optimizations from both the algorithmic and architectural perspectives, our fpga-based solution manages to remove the memory constraints and provide a high performance that can scale well with the amount of computational resources available. Compared with an optimized CPU implementation using two quad-core Intel Nehalem CPUs, our solution achieves 4x speedup on two Virtex-5 fpgas, and 8x speedup on two Virtex-6 fpgas. Our projection demonstrates that the performance will continue to scale with the future increase of fpga capacities.
This paper discusses architectural issues arising from the use of dynamic reconfiguration and shows a possible use of dynamic reconfiguration to extend and accelerate a computation performed in system-on-a-chip design...
详细信息
This paper discusses architectural issues arising from the use of dynamic reconfiguration and shows a possible use of dynamic reconfiguration to extend and accelerate a computation performed in system-on-a-chip designs with microprocessors with fixed instruction sets. Further a sample application is discussed that uses a dynamically reconfigurable fpga to implement different floating-point calculations in hardware, reconfigured as required by the execution of the user code. The implementation data for two dynamically reconfigurable platforms available on the market - the Xilinx Virtex2 family fpgas and the Atmel FPSLIC family fpgas - is compared in terms of resource requirements, operating frequency, and power consumption.
Recent developments have shown fpgas to be effective for data centre applications, but debugging support in that environment has not evolved correspondingly. This presents an additional barrier to widespread adoption....
详细信息
暂无评论