FPGA place and route is time consuming, often serving as the major obstacle inhibiting a fast edit-compile-test loop in prototyping and development and the major obstacle preventing late-bound hardware and design mapping for reconfigurable systems. Previous work showed that hardware-assisted routing can accelerate fanout-free routing on Fat-Trees by three orders of magnitude with modest modifications to the network itself. In this paper, we show how these techniques can be applied to any FPGA and how they can be implemented on top of LUT networks in cases where modification of the FPGA itself is not justified. We further show how to accommodate fanout and how to achieve comparable route quality to software-based methods. For a tree network, we estimate an FPGA implementation of our routing logic could route the Toronto Place and Route Benchmarks at least two orders of magnitude faster than a software Pathfinder while achieving within 3% of the aggregate quality. Preliminary results on small mesh benchmarks achieve within one track of vpr -fast.
ISBN (print): 9781605584102
Aggressive scaling increases the number of devices we can integrate per square millimeter but makes it increasingly difficult to guarantee that each device fabricated has the intended operational characteristics. Without careful mitigation, component yield rates will fall, potentially negating the economic benefits of scaling. The fine-grained reconfigurability inherent in FPGAs is a powerful tool that can allow us to drop the stringent requirement that every device be fabricated perfectly in order for a component to be useful. To exploit inherent FPGA reconfigurability while avoiding full CAD mapping, we propose lightweight techniques compatible with the current single bitstream model that can avoid defective devices, reducing yield loss at high defect rates. In particular, by embedding testing operations and alternative path configurations into the bitstream, each FPGA can avoid defects by making only simple, greedy decisions at bitstream load time. With 20% additional tracks above the minimum routable channel width, routes can tolerate 0.01% switch defect rates, raising yield from essentially 0% to near 100%. Copyright 2009 ACM.
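The load-time decision described in this abstract can be pictured with a minimal sketch. It assumes that each net carries an ordered list of precomputed alternative paths and that a per-switch test result is available; the names `pick_path`, `load_bitstream`, and the switch identifiers are hypothetical and do not come from the paper. The sketch also ignores conflicts between alternatives of different nets, which a real scheme would have to rule out.

```python
from typing import Callable, Dict, Optional, Sequence

Path = Sequence[str]  # a path is a list of hypothetical switch identifiers


def pick_path(alternatives: Sequence[Path],
              switch_ok: Callable[[str], bool]) -> Optional[int]:
    """Return the index of the first alternative whose switches all pass the test."""
    for index, path in enumerate(alternatives):
        if all(switch_ok(switch) for switch in path):
            return index
    return None  # no precomputed alternative avoids the defects


def load_bitstream(nets: Dict[str, Sequence[Path]],
                   switch_ok: Callable[[str], bool]) -> Dict[str, Optional[int]]:
    """Greedily choose one alternative per net; None marks a net that cannot be repaired."""
    return {name: pick_path(alts, switch_ok) for name, alts in nets.items()}


# Illustrative data: switch "s3" is defective on this particular chip.
defective = {"s3"}
nets = {
    "net_a": [["s1", "s3"], ["s2", "s4"]],  # first choice is broken, second works
    "net_b": [["s5"]],                      # unaffected
    "net_c": [["s3"]],                      # no working alternative
}
choice = load_bitstream(nets, lambda switch: switch not in defective)
```

Each decision is local and needs no rerouting, which is the property that keeps the approach compatible with shipping one bitstream to every chip.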
This paper presents a flexible FPGA architecture evaluation framework, named fpgaEVA-LP, for power efficiency analysis of LUT-based FPGA architectures. Our work has several contributions: (i) We develop a mixed-level FPGA power model that combines switch-level models for interconnects and macromodels for LUTs; (ii) We develop a tool that automatically generates a back-annotated gate-level netlist with post-layout extracted capacitances and delays; (iii) We develop a cycle-accurate power simulator based on our power model. It carries out gate-level simulation under a real delay model and is able to capture glitch power; (iv) Using the framework fpgaEVA-LP, we study the power efficiency of FPGAs, in 0.10um technology, under various settings of architecture parameters such as LUT sizes, cluster sizes and wire segmentation schemes, and reach several important conclusions. We also present the detailed power consumption distribution among different FPGA components and shed light on the potential opportunities of power optimization for future FPGA designs (e.g., ≤ 0.10um technology).
ISBN (print): 9781605584102
The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as "embarrassingly parallel" Monte-Carlo simulations, which allow applications to achieve a linear speedup over multiple computational nodes, without requiring a super-linear increase in inter-node communication. However, such applications are reliant on a cheap supply of high quality random numbers, particularly for the three maximum entropy distributions: uniform, used as a source of randomness; Gaussian, for discrete-time; and exponential, for discrete-event simulations. In this paper we look at four different types of platform: multi-core CPUs (Intel Core2); GPUs (NVidia 200); FPGAs (Xilinx Virtex-5); and Massively Parallel Processor Arrays (Ambric AM2000). For each platform we determine the most appropriate algorithm for generating each type of number, then calculate the peak generation rate and estimated power efficiency for each device. Copyright 2009 ACM.
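To illustrate how the Gaussian and exponential distributions can be derived from a uniform source, here is a minimal sketch using two textbook transforms: inverse-CDF sampling for the exponential and the Box-Muller transform for the Gaussian. These are generic choices made for illustration; the abstract does not say which algorithms the paper selects for each platform, and the function names are hypothetical.

```python
import math
import random
from typing import Tuple


def exponential_from_uniform(u: float, rate: float = 1.0) -> float:
    """Inverse-CDF transform: u in [0, 1) maps to an Exp(rate) sample."""
    return -math.log(1.0 - u) / rate


def gaussian_pair_from_uniform(u1: float, u2: float) -> Tuple[float, float]:
    """Box-Muller transform: u1 in (0, 1] and u2 in [0, 1) give two N(0, 1) samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def draw(rng: random.Random) -> Tuple[float, float, float]:
    """Draw one exponential and two Gaussian samples from a uniform source."""
    e = exponential_from_uniform(rng.random())
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log() is safe.
    z0, z1 = gaussian_pair_from_uniform(1.0 - rng.random(), rng.random())
    return e, z0, z1
```

Both transforms rely on logarithms and trigonometric functions, whose cost differs widely across CPUs, GPUs, and FPGAs; that is one reason the most appropriate algorithm can differ per platform.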
In this paper, we study the problem of placement-driven technology mapping for table-lookup based FPGA architectures to optimize circuit performance. Early work on technology mapping for FPGAs such as Chortle-d[14] and Flowmap[3] aims to optimize the depth of the mapped solution without consideration of interconnect delay. Later works such as Flowmap-d[7], Bias-Clus[4] and EdgeMap consider interconnect delays during mapping, but do not take into consideration the effects of their mapping solution on the final placement. Our work focuses on the interaction between the mapping and placement stages. First, the interconnect delay information is estimated from the placement and used during the labeling process. A placement-based mapping solution which considers both global cell congestion and local cell congestion is then developed. Finally, a legalization step and detailed placement are performed to realize the design. We have implemented our algorithm in a LUT-based FPGA technology mapping package named PDM (Placement-Driven Mapping) and tested the implementation on a set of MCNC benchmarks. We use the tool VPR[1][2] for placement and routing of the mapped netlist. Experimental results show the longest path delay on a set of large MCNC benchmarks decreased by 12.3% on average.
ISBN (print): 1595932925
This paper presents experimental measurements of the differences between a 90nm CMOS FPGA and 90nm CMOS Standard Cell ASICs in terms of logic density, circuit speed and power consumption. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and thereby improve FPGAs. In the paper, we describe the methodology by which the measurements were obtained and we show that, for circuits containing only combinational logic and flipflops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 40. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories and we find that these blocks reduce this average area gap significantly to as little as 21. The ratio of critical path delay, from FPGA to ASIC, is roughly 3 to 4, with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 12 times and, with hard blocks, this gap generally becomes smaller. Copyright 2006 ACM.
ISBN (print): 9780897919784
The goal of this paper is to perform a timing optimization of a circuit described by a network of cells on a target structure whose connection delays have discrete values following its hierarchy. The circuit is modelled by a set of timed cones whose delay histograms allow their classification into critical, potentially critical and neutral cones according to predicted delays. The floorplanning is then guided by this cone structuring and has two innovative features: first, it is shown that the placement of the elements of the neutral cones has no impact on timing results, thus a significant reduction is obtained; second, despite a greedy approach, a near optimal floorplan is achieved in a large number of examples.
ISBN (print): 0769518079
This work presents an architecture for a hardware implementation of the Rijndael block cipher with a 128-bit key. The Rijndael block cipher was recently adopted by the United States government as the new Advanced Encryption Standard (AES). The proposed architecture was designed for a low-cost, mid-density FPGA.
ISBN (print): 9780897919784
In this paper, we present an algorithm for circuit partitioning with complex resource constraints in large FPGAs. Traditional partitioning methods estimate the capacity of an FPGA device by counting the number of logic blocks; however, this is not accurate given the increasing capacity and diverse resource types in the new FPGA architectures. We propose a network flow based method to optimally check whether a circuit or a sub-circuit is feasible for a set of available heterogeneous resources. The feasibility checking procedure is integrated into the FM-based algorithm for circuit partitioning. An incremental flow technique is employed for efficient implementation. Experimental results on the MCNC benchmark circuits show that our partitioning algorithm not only yields good results, but also is efficient. Our algorithm for partitioning with complex resource constraints is applicable to both multiple-FPGA designs (e.g. logic emulation systems) and partitioning-based placement algorithms for a single large hierarchical FPGA (e.g. Actel's ES6500 FPGA family).
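A flow-based feasibility check of this kind can be sketched as a small maximum-flow problem. The model below is an assumption made for illustration, not the paper's formulation: each kind of circuit element has a demand, each resource type has a capacity, and a compatibility table says which resources can implement which elements. The assignment is feasible exactly when the maximum flow saturates all demand. All names (`feasible`, `max_flow`, `"logic"`, `"lut"`, and so on) are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts residual capacity graph."""
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck


def feasible(demand, capacity, compatible):
    """True if every demanded element can be assigned to a compatible resource."""
    cap = defaultdict(lambda: defaultdict(int))
    for kind, count in demand.items():
        cap["source"][("need", kind)] = count
        for resource in compatible[kind]:
            cap[("need", kind)][("have", resource)] = count
    for resource, count in capacity.items():
        cap[("have", resource)]["sink"] = count
    return max_flow(cap, "source", "sink") == sum(demand.values())


capacity = {"lut": 5, "block_ram": 2}
compatible = {"logic": ["lut"], "small_mem": ["lut", "block_ram"]}
fits = feasible({"logic": 4, "small_mem": 3}, capacity, compatible)
overfull = feasible({"logic": 6, "small_mem": 1}, capacity, compatible)
```

Both example demands total seven elements against seven resource slots, so a simple count accepts both, yet the second fails because six logic elements cannot fit into five LUTs. That gap between counting and actual feasibility is what the abstract points to.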
This paper presents a new power-saving, high-speed FPGA design enhancing a previous SiGe CML FPGA based on the Xilinx 6200 FPGA. The design aims at higher performance while minimizing power consumption. The new SiGe process has traded off the circuit's performance for reduced power consumption. The power supply voltage has been reduced from 3.4 V to 2.0 V. The structure of the Basic Cell, including the Configurable Logic Block (CLB) and routing multiplexers (MUXs), has been modified so that the supply voltage reduction can be attained. Simulations have shown that the gate delay of the new Basic Cell is reduced from 130 ps in the prior design to 51 ps. The total power consumption for each Basic Cell has been reduced by 94%, from 71 mW to 4.2 mW, making a large-scale FPGA feasible. This design is currently under fabrication for testing.