ISBN: 9781450326711 (print)
Latency-insensitive communication offers many potential benefits for FPGA designs, including easier timing closure by enabling automatic pipelining, and easier interfacing with embedded NoCs. However, it is important to understand the costs and trade-offs associated with any new design style. This paper presents optimized implementations of latency-insensitive communication building blocks, quantifies their overheads in terms of area and frequency, and provides guidance to designers on how to generate high-speed and area-efficient latency-insensitive systems.
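To make the idea of latency insensitivity concrete, the following is a minimal behavioural sketch in Python, not the paper's building blocks. It assumes a channel built from two-entry buffers (often called relay stations) whose back-pressure signal is one cycle old; the function name `simulate_channel` and the parameters `num_stages` and `stall_prob` are hypothetical. It illustrates that the delivered token stream is unchanged no matter how many pipeline stages are inserted.

```python
from collections import deque
import random


def simulate_channel(tokens, num_stages, stall_prob=0.3, seed=0):
    """Cycle-by-cycle model of a channel made of two-entry buffers.

    Assumption: every stage sees its neighbour's occupancy as it was at
    the start of the cycle (a registered stall signal), which is why each
    stage needs room for two tokens. Requires num_stages >= 1.
    """
    rng = random.Random(seed)
    stages = [deque() for _ in range(num_stages)]
    pending = deque(tokens)
    received = []
    cycles = 0
    while len(received) < len(tokens):
        cycles += 1
        # Snapshot of the state visible at the start of this cycle.
        has_space = [len(s) < 2 for s in stages]
        has_data = [len(s) > 0 for s in stages]
        # The consumer randomly stalls; otherwise it takes one token.
        if has_data[-1] and rng.random() >= stall_prob:
            received.append(stages[-1].popleft())
        # Each stage forwards one token if the next stage had space.
        for i in range(num_stages - 2, -1, -1):
            if has_data[i] and has_space[i + 1]:
                stages[i + 1].append(stages[i].popleft())
        # The producer injects one token if the first stage had space.
        if pending and has_space[0]:
            stages[0].append(pending.popleft())
    return received, cycles


data = list(range(20))
short_out, short_cycles = simulate_channel(data, num_stages=1)
long_out, long_cycles = simulate_channel(data, num_stages=5)
```

In this model, adding stages changes only the cycle count, never the order or content of the received tokens, which is the property that allows pipelining to be inserted automatically.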
We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of recon...
ISBN: 9781595939340 (print)
To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry-chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry-chains for binary and ternary addition, the carry chain used by the new cell only spans 2 logic blocks, which significantly improves the delay of multi-input addition operations mapped onto the FPGA. The delay and area overhead that arises from augmenting a traditional FPGA logic cell with the new compressor structure is minimal. Using this new cell, we observed an average speedup in combinational delay of 1.41 compared to adder trees synthesized using ternary adders. Copyright 2008 ACM.
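As background on compressor-based multi-input addition, here is a small Python sketch. It uses ordinary 3:2 compressors (carry-save adders), not the 6:2 compressor cell the abstract proposes, and the function names `compress_3_2` and `multi_operand_add` are hypothetical. The point it illustrates is that compressors reduce many operands to two without propagating carries across the word, so only one carry-propagate addition is needed at the end.

```python
def compress_3_2(a, b, c):
    """Bitwise 3:2 compressor: three operands in, (sum, carry) out.

    For non-negative integers, a + b + c == sum_word + carry_word.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def multi_operand_add(operands):
    """Add non-negative integers using layers of 3:2 compressors."""
    ops = list(operands)
    while len(ops) > 2:
        next_ops = []
        # Compress every complete group of three operands into two.
        for i in range(0, len(ops) - 2, 3):
            next_ops.extend(compress_3_2(ops[i], ops[i + 1], ops[i + 2]))
        # Operands that did not fill a group pass through to the next layer.
        next_ops.extend(ops[len(ops) - len(ops) % 3:])
        ops = next_ops
    # A single carry-propagate addition finishes the job.
    return sum(ops)


total = multi_operand_add([3, 5, 7, 11, 13, 17])  # 56
```

A wider compressor such as the 6:2 cell in the abstract would consume more operands per layer and therefore need fewer layers; that relationship is the only thing this sketch is meant to suggest.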
ISBN: 9780897919784 (print)
An algorithm is presented for partitioning a design in time. The algorithm divides a large, technology-mapped design into multiple configurations of a time-multiplexed FPGA. These configurations are rapidly executed in the FPGA to emulate the large design. The tool includes facilities for optimizing the partitioning to improve routability, for fitting the design into more configurations than the depth of the critical path, and for compressing the critical path of the design into fewer configurations, both to fit the design into the device and to improve performance. Scheduling results are shown for mapping designs into an 8-configuration time-multiplexed FPGA and for architecture investigation for a time-multiplexed FPGA.
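The simplest way to picture partitioning in time is as-soon-as-possible (ASAP) levelization of the netlist, sketched below in Python. This is a textbook baseline and not the algorithm of the paper: it assumes an acyclic netlist and one logic level per configuration, so it always uses exactly as many configurations as the critical-path depth, whereas the abstract describes using both more and fewer. The name `asap_partition` and the `fanin` dictionary format are hypothetical.

```python
def asap_partition(fanin):
    """Assign each node of an acyclic netlist to the earliest configuration.

    fanin maps every node to the list of nodes that drive it; nodes with
    no drivers map to an empty list. A node is placed one configuration
    after its latest driver. Uses recursion, so it suits small examples.
    """
    level = {}

    def visit(node):
        if node not in level:
            level[node] = 1 + max((visit(p) for p in fanin[node]), default=0)
        return level[node]

    for node in fanin:
        visit(node)

    depth = max(level.values(), default=0)
    configs = [[] for _ in range(depth)]
    for node in fanin:
        configs[level[node] - 1].append(node)
    return configs


netlist = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}
schedule = asap_partition(netlist)
# schedule == [["a", "b"], ["c", "e"], ["d"]]
```

Here node `e` has slack: it could move to the third configuration without lengthening the schedule, which is the kind of freedom a real tool could use to balance configurations or improve routability.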
This article is a concise literature review of the current state of the art in arithmetic for field-programmable gate arrays (FPGAs), including studies, implementation techniques, operators, and structures, in various area-time tradeoffs. It covers the integer operations of addition/subtraction, multiplication, squaring, division, and square root, in parallel, and in both serial modes (least-significant digit first, and online). Many people, including researchers in the fields of computer arithmetic, parallel computing, and digital signal and image processing, system-on-a-programmable-chip (SoPC) designers, and others who need to implement special-purpose arithmetic circuits on FPGAs, might find such a review useful, either as an introduction to the topic, as a knowledge update, or for reference.
The proceedings contain 22 papers from the 1997 International Symposium on Field Programmable Gate Arrays. Topics discussed include: field programmable gate array (FPGA) architectures; FPGA partitioning and synthesis; rapid prototyping and emulation; reconfigurable computing; and FPGA floorplanning and routing.
In this paper we evaluate the trade-offs between various low-leakage design techniques for field programmable gate arrays (FPGAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing look-up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass-transistor-based multiplexers and routing switches are proposed, and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for FPGAs, and they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For some of the potential low-leakage design techniques evaluated in our study, the impact on chip area ranges from minimal to an increase of 15%-30%.
Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic me...
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our FPGA, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our FPGA ideal for throughput-intensive applications, and we require minimal place and route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25 μm process and 500-700 MHz in a TSMC 0.18 μm process.