ISBN: 9781595930293 (print)
In this paper we study the effect of post-layout pin permutation on designs for FPGA devices with non-uniform cell delays. We present a simple but timing-optimal pin permutation scheme and report the results of applying it to a set of public logic synthesis benchmark designs that were synthesized and placed by state-of-the-art commercial FPGA design tools configured for the maximum optimization level. Despite the preceding optimizations, we still observed an average timing improvement of 3.7%. This demonstrates the importance of fully exploiting non-uniform cell delays during design optimization for modern FPGA devices, and the potential for improvement that still remains. Copyright 2005 ACM.
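As an illustration of why pin permutation can help when input pins have different delays, the following minimal sketch assigns the latest-arriving signals to the fastest pins of a single cell. It is not taken from the paper: the function name `permute_pins`, the delay values, and the assumption that all inputs of the cell are freely interchangeable (as for a LUT whose contents can be reprogrammed to match) are hypothetical. Under those assumptions, pairing arrival times in descending order with pin delays in ascending order minimizes the latest arrival at the cell output.

```python
def permute_pins(arrival_times, pin_delays):
    """Assign signals to input pins so the cell output settles as early as possible.

    arrival_times[s] is when signal s reaches the cell (hypothetical units).
    pin_delays[p] is the pin-to-output delay of pin p (hypothetical units).
    Assumes every signal may be connected to any pin.
    Returns (assignment, output_arrival), where assignment maps signal -> pin.
    """
    if len(arrival_times) > len(pin_delays):
        raise ValueError("more signals than input pins")

    # Latest-arriving signals first.
    signals = sorted(range(len(arrival_times)),
                     key=lambda s: arrival_times[s], reverse=True)
    # Fastest pins first.
    pins = sorted(range(len(pin_delays)), key=lambda p: pin_delays[p])

    # Pair the latest signal with the fastest pin, and so on.
    assignment = dict(zip(signals, pins))
    output_arrival = max(
        (arrival_times[s] + pin_delays[p] for s, p in assignment.items()),
        default=0.0,
    )
    return assignment, output_arrival


# Example with made-up numbers: keeping signal i on pin i would give
# max(3.0 + 0.9, 1.0 + 0.5, 2.0 + 0.2) = 3.9, while the permuted
# assignment moves the late signal 0 onto the fast pin 2.
assignment, output_arrival = permute_pins([3.0, 1.0, 2.0], [0.9, 0.5, 0.2])
```

A real flow would also have to respect which pins are logically equivalent and re-run timing analysis over the whole netlist; this sketch considers one cell in isolation.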
Placement and routing are the most time-consuming processes in automatically synthesizing and configuring circuits for field-programmable gate arrays (FPGAs). In this paper, we use the negotiation-based paradigm to parallelize placement. Our new FPGA placer, NAP (Negotiated Analytical Placement), uses an analytical technique for coarse placement and the negotiation paradigm for detailed placement. We describe the serial algorithm and report results. We also report findings related to parallelizing NAP under a multicast networking and multi-threaded operating system environment; the parallel placer is tolerant to multicast packet loss as well as out-of-order packet delivery. Our parallel placer exhibits little performance degradation while attaining a speedup of 2 using 3 processors.
ISBN: 1595932925 (print)
This paper examines the tradeoffs between flexibility, area, and power dissipation of programmable clock networks for field-programmable gate arrays (FPGAs). The paper begins by describing a parameterized clock network model that covers a broad range of programmable clock network architectures. Specifically, the model supports architectures with multiple local and global clock domains and varying amounts of flexibility at various levels of the clock network. Using the model, the architectural parameters that control the flexibility of the clock network are varied to determine the cost of this flexibility in terms of area and power dissipation. From these experiments, the study finds that area and power costs are highest for networks with flexibility close to the logic blocks. Furthermore, it finds that clock networks with local clock domains have little overhead and are significantly more efficient than clock networks without local clock domains for applications with multiple clocks. Copyright 2006 ACM.
ISBN: 9781605584102 (print)
Technology mapping is an important step in the FPGA CAD flow in which a network of simple gates is converted into a network of logic blocks. We consider enhancements to a traditional LUT-based mapping algorithm for an FPGA composed of logic blocks that implement only a subset of the functions of up to k variables; specifically, the logic block is a partial LUT, but it possesses more inputs than typical LUTs. Numerical results are presented to demonstrate the efficacy of our proposed techniques using real circuits mapped to a commercial FPGA architecture. Copyright 2009 ACM.
ISBN: 9781450326711 (print)
A novel digital-to-analog converter (DAC) modulates the overall power consumption of an FPGA by disabling and enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The short-circuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.
ISBN: 1595932925 (print)
The aim of this paper is to propose a real-time reconfigurable (RTR) micro-FPGA using a new non-volatile memory. Magnetic tunneling junctions (MTJs), used in magnetic random access memories (MRAMs), are compatible with classical CMOS processes. Moreover, the remanent property of such a memory could limit the configuration time and power consumption required at each power-up of the die. Nevertheless, each configuration memory point has to be readable independently of the others, which is why the approach differs from the classical memory-array one. Copyright 2006 ACM.
ISBN: 9781595939340 (print)
We consider packing in the commercial FPGA context and examine the speed, performance, and power trade-offs associated with packing in a state-of-the-art FPGA, the Xilinx(R) Virtex(TM)-5 FPGA. Two aspects of packing are discussed: 1) packing for general logic blocks, and 2) packing for large IP blocks. Virtex-5 logic blocks contain dual-output 6-input look-up tables (LUTs). Such LUTs can implement any single logic function requiring no more than 6 inputs, or any two logic functions requiring no more than 5 distinct inputs. The second LUT output is associated with slower speed and therefore must be used judiciously. We present placement-based techniques for dual-output LUT packing that lead to improved area efficiency and power, with minimal performance degradation. We then move on to address packing for large IP blocks, specifically block RAMs and DSPs. We present a packing optimization that is widely applicable in DSP designs and leads to significantly improved design performance.
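The capacity rule stated in the abstract above (one function of at most 6 inputs, or two functions that together use at most 5 distinct inputs) can be written as a small feasibility check. This is only a sketch of that stated rule; the function names and the representation of a logic function by its list of input signal names are hypothetical, and it ignores the speed penalty of the second output and any placement considerations the paper addresses.

```python
MAX_SINGLE_INPUTS = 6  # one function per LUT, per the abstract
MAX_SHARED_INPUTS = 5  # two functions per LUT, counted over distinct inputs


def fits_single(inputs):
    """True if one function with these input signals fits in one LUT."""
    return len(set(inputs)) <= MAX_SINGLE_INPUTS


def can_share_lut(f_inputs, g_inputs):
    """True if two functions may be packed into the same dual-output LUT.

    Inputs used by both functions are counted once.
    """
    return len(set(f_inputs) | set(g_inputs)) <= MAX_SHARED_INPUTS


# Two 3-input functions sharing signal "c" use 5 distinct inputs in total.
shareable = can_share_lut(["a", "b", "c"], ["c", "d", "e"])
```

A packer built on such a check would still need a cost function to decide which feasible pairs are worth merging, since the abstract notes that the second output is slower.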
We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of recon...
ISBN: 9781595939340 (print)
To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry chains for binary and ternary addition, the carry chain used by the new cell spans only 2 logic blocks, which significantly improves the delay of multi-input addition operations mapped onto the FPGA. The delay and area overhead that arises from augmenting a traditional FPGA logic cell with the new compressor structure is minimal. Using this new cell, we observed an average speedup in combinational delay of 1.41 compared to adder trees synthesized using ternary adders. Copyright 2008 ACM.
In this paper we evaluate the trade-offs between various low-leakage design techniques for field-programmable gate arrays (FPGAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing look-up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass-transistor-based multiplexers and routing switches are proposed, and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for FPGAs, and that they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For the potential low-leakage design techniques evaluated in our study, the impact on chip area ranges from minimal to an increase of 15%-30%.