ISBN: 9781581134520 (print)
Retiming is a synchronous circuit transformation that can optimize the delay of a synchronous circuit by moving registers across combinational circuit elements. The combinational structure remains unchanged and the observable behavior of the circuit is identical to the original. In this paper, we address the problem of applying retiming techniques to circuits implemented in field-programmable gate arrays (FPGAs). FPGAs contain prefabricated and configurable routing elements that allow us to easily implement a variety of circuits. However, this interconnect contributes greatly to the overall delay in the implemented circuit. If a circuit is retimed prior to the placement and routing phases of the CAD flow, then the retiming algorithm has no information about the delays introduced by the configurable interconnect. Our fundamental experiment is to determine whether there are any gains in tightly coupling retiming and placement so that the retiming algorithm has some estimate of the routing delays. Specifically, we introduce a post-placement retiming algorithm that understands how to take advantage of FPGA architectural features. This retiming algorithm may introduce extra registers into the circuit. These new registers need to be placed in some location in the FPGA. Retiming register placement is accomplished by a novel incremental clustering and placement algorithm. The incremental algorithm builds upon the placement of the non-retimed circuit to intelligently sift in the newly introduced registers. In addition, we explore making the placement algorithms "retiming aware." These placement algorithms try to place logic blocks in such a way that the subsequent retiming produces better speed results. These techniques include the identification of retiming-critical cycles during placement. Our experiments show that the integration of retiming with placement results in 19% better clock periods in comparison to the application of retiming before the place and route steps.
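To make the idea of moving registers across combinational elements concrete, here is a minimal sketch of the standard textbook retiming model, not the post-placement algorithm of the paper above. It assumes a circuit graph whose edges carry register counts and whose nodes carry fixed delays; the names `retime`, `clock_period`, and the three-node example circuit are hypothetical, and interconnect delay is ignored entirely.

```python
from collections import defaultdict

def retime(edges, r):
    """Apply retiming labels r: an edge (u, v, w) gets w + r[v] - r[u] registers."""
    out = []
    for u, v, w in edges:
        w_new = w + r[v] - r[u]
        if w_new < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} would hold {w_new} registers")
        out.append((u, v, w_new))
    return out

def clock_period(delay, edges):
    """Longest delay along any register-free path (assumed model: node delays only)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:  # only register-free edges extend a combinational path
            succ[u].append(v)
            indeg[v] += 1
    ready = [v for v in delay if indeg[v] == 0]
    arrival = dict(delay)  # a path may start at any node
    seen = 0
    while ready:  # Kahn-style topological sweep
        u = ready.pop()
        seen += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(delay):
        raise ValueError("combinational cycle: some loop has no register")
    return max(arrival.values())

# Hypothetical three-element loop with two registers on the edge c -> a.
delay = {"a": 1, "b": 3, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
r = {"a": 0, "b": 0, "c": 1}  # moves one register from c -> a back across c onto b -> c

before = clock_period(delay, edges)
after = clock_period(delay, retime(edges, r))
print(before, after)
```

The register count around the loop is unchanged by the transformation, which is why the observable behavior is preserved; only the longest register-free path shrinks.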
ISBN: 0769515622 (print)
Field-programmable gate arrays (FPGAs) have become popular ever since their introduction. Compared to other digital circuit implementation media, they have lower NRE cost and rapid turnaround, with the penalties of reduced speed and larger size. Thus, better FPGA programmable switch technology is desired in order to gain speed and density advantages. In this paper, laser-induced MakeLink(TM) technology is proposed as a programmable switch element. The electrical resistance is as low as 0.8 Ω to 11 Ω, depending on the size of the link, which is two to three orders of magnitude smaller than that of an NMOS transistor in an SRAM-based FPGA. Thus, the speed improvement for the Laser Field-Programmable Gate Array (LFPGA) is significant. Other features of laser-induced vertical link technology, such as small size and radiation hardness, can also greatly improve FPGA performance. The cluster-based LFPGA with 128 by 64 basic logic elements (BLEs) is laid out in a commercial 0.5 μm technology. The chip size is about 138 mm².
ISBN: 9781581134520 (print)
As FPGAs push ever deeper into mainstream digital design, there is an increasing desire for high-performance circuits. This paper describes a manual editor, called EVE, which can help a designer perform manual packing, placement and pipelining of commercial FPGA circuits to achieve a meaningful increase in performance. This effort is inspired by Von Herzen's paper, which proposed the notion of an "Event Horizon" - a high-speed circuit design approach in which complete knowledge of the timing effect of every synthesis change is used. It is very laborious to implement circuits using this approach; therefore, we try to augment manual design tools in order to make this Event Horizon methodology easier to perform. This paper describes a first step in that direction, which focuses on placement, packing and pipelining. EVE provides an interactive environment that immediately reroutes and performs timing analysis after each user circuit modification, giving an exact value for critical path delay. It can also suggest good placement positions and assist with flip-flop insertion during pipelining. Compared to a state-of-the-art synthesis and place and route flow, we used EVE to achieve an average of 12.7% higher operating frequency on a set of eight Xilinx Virtex-E circuits of 250 or fewer LUTs.
Circuits implemented in FPGAs have delays that are dominated by their programmable interconnect. This interconnect provides the ability to implement arbitrary connections. However, it contains both highly capacitive and resistive elements. The delay encountered by any connection depends strongly on the number of interconnect elements used to route the connection. These delays are only completely known after the place and route phase of the CAD flow. We propose the use of Clock Shifting optimization techniques to improve the clock frequency as a post place and route step. Clock Shifting Optimization is a technique first formalized in [4]. It is a cycle-stealing algorithm that allows one to reduce the critical path delay of a synchronous circuit by shifting the clock signals at each register. This technique allows late-arriving signals to be sampled at a later point in time by intentionally introducing a skew on the clock input of the sampling register. Typical FPGAs contain a number of special-purpose global clock networks that distribute clock signals to every register in the chip. Unused global clock lines in FPGAs can be used to distribute a finite set of clock skews to the entire circuit. We propose an efficient integer programming method to find the optimal circuit improvement for a finite set of clock skews. This technique is modified to consider inherent uncertainties present in the timing models. The uncertainty controls the aggressiveness of the optimizations, as we must take great care in ensuring functionality for any range of possible timing characteristics. Our results confirm the intuition that more aggressive speed optimizations can be performed as timing models become more accurate. We also show that providing 4 skewed versions of the nominal clock signal results in the best delay-area tradeoff. This result is evocative as it may suggest future FPGA architectures that contain greater numbers of global clock lines, as we trade off gains in speed for greater po...
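The effect of choosing register clock skews from a finite set can be illustrated with a small sketch. It is an assumption-laden toy: it enumerates every assignment by brute force as a stand-in for the integer programming method the abstract mentions, uses the common setup constraint s_u + d_max <= s_v + T and hold constraint s_u + d_min >= s_v with setup and hold times taken as zero, and ignores timing uncertainty. The function name `best_skew_assignment` and the two-register example are hypothetical.

```python
from itertools import product

def best_skew_assignment(registers, paths, skews):
    """Try every assignment of a skew from `skews` to each register.

    paths: tuples (src, dst, d_max, d_min) for register-to-register paths.
    Returns (smallest feasible clock period, skew assignment), or None if
    every assignment violates a hold constraint.
    """
    best = None
    for choice in product(skews, repeat=len(registers)):
        s = dict(zip(registers, choice))
        # Hold check: the earliest new data must not arrive before the
        # destination register has sampled the previous value.
        if any(s[u] + d_min < s[v] for u, v, _, d_min in paths):
            continue
        # Setup: the period must cover the slowest path after skew adjustment.
        period = max(s[u] + d_max - s[v] for u, v, d_max, _ in paths)
        if best is None or period < best[0]:
            best = (period, s)
    return best

# Hypothetical loop: A -> B is slow (10), B -> A is fast (6).
registers = ["A", "B"]
paths = [("A", "B", 10, 3), ("B", "A", 6, 3)]

no_skew = best_skew_assignment(registers, paths, [0])
two_skews = best_skew_assignment(registers, paths, [0, 2])
print(no_skew, two_skews)
```

Delaying the clock at B by 2 lets the slow path borrow time from the fast one, so the period drops from 10 to 8 in this example. Brute force is exponential in the number of registers, which is one reason a real flow would need a more efficient formulation.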
This paper presents the theory and algorithm for SPFD-based global rewiring (SPFD-GR). SPFD-GR allows us to globally replace a target wire with some alternative wire possibly far away from the target. It successfully overcomes the limitations of the existing SPFD-based local rewiring algorithm (SPFD-LR), which can only replace a wire with another wire that has the same destination node. In order to perform SPFD-based global rewiring, we developed the theory and algorithm for solving a fundamental problem in SPFD-based rewiring: given the in-pin functions of a node and the SPFD at the node's out-pin, is there a way to modify the node's internal function so that the SPFD at the node's out-pin can be satisfied? Combined with a state-of-the-art partitioning algorithm, SPFD-GR scales well to large circuits with good synthesis quality. Our SPFD-based rewiring algorithm is ideal for LUT-based FPGAs, where the node's internal function can be changed freely without any area or delay penalty. Extensive experimental results show that for LUT-based FPGAs, the rewiring ability of SPFD-GR (in terms of the number of wires that have alternative wires) is 1.45 and 3 times that of SPFD-LR and an ATPG-based rewiring algorithm (with a preliminary experimental flow), respectively, while the run time is quite acceptable. When applied to post-mapping area reduction for large LUT-based FPGAs under a circuit depth restriction, SPFD-GR achieves 17.1% average area reduction, with little or no delay increase.
In this paper, we propose the idea of temporal logic replication in dynamically reconfigurable field-programmable gate array partitioning to reduce communication cost. Temporal logic replication has never been explored before. We define the min-area min-cut replication problem given a k-stage temporal partition satisfying all temporal constraints and devise an optimal algorithm to solve this problem. We have also devised a flow-based replication heuristic for the case where a tight area bound limits the amount of replication. In addition, we present a correct network flow model for partitioning sequential circuits temporally.
As interconnection delay plays an important role in determining circuit performance in FPGAs, timing-driven FPGA routing has received much attention recently. In this paper, we present a new timing-driven routing algorithm for FPGAs. The algorithm finds a routing with minimum critical path delay for a given placed circuit using the Lagrangian relaxation technique. The Lagrangian multipliers used to relax the timing constraints are updated iteratively by the subgradient method. Incorporated into the cost function, these multipliers guide the router in constructing a routing tree for each net. During routing, the exclusivity constraints on each routing resource are also enforced so that circuits route successfully. Experimental results on benchmark circuits show that our approach outperforms the state-of-the-art VPR router.
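The interplay between a relaxed timing constraint and a subgradient multiplier update can be sketched on a deliberately tiny problem. This is not the router described above: it assumes a single connection with a fixed list of candidate routes, each a hypothetical `(resource_cost, delay)` pair, one timing bound, and a diminishing step size of `step0 / k`. The name `relaxed_route_choice` and all numbers are illustrative.

```python
def relaxed_route_choice(routes, bound, iterations=20, step0=0.2):
    """Pick a route under the constraint delay <= bound via Lagrangian relaxation.

    routes: list of (resource_cost, delay) candidates (hypothetical data).
    Each iteration minimizes cost + lam * delay, then moves the multiplier
    along the subgradient (delay - bound), projected back onto lam >= 0.
    Returns (cheapest feasible route seen or None, final multiplier).
    """
    lam = 0.0
    best = None
    for k in range(1, iterations + 1):
        # The multiplier prices delay into the cost function that guides the choice.
        cost, delay = min(routes, key=lambda r: r[0] + lam * r[1])
        if delay <= bound and (best is None or cost < best[0]):
            best = (cost, delay)
        # Violated constraint raises lam; slack lowers it, never below zero.
        lam = max(0.0, lam + (step0 / k) * (delay - bound))
    return best, lam

# Hypothetical candidates: cheap but slow, balanced, fast but expensive.
routes = [(1, 10), (3, 6), (6, 3)]
best, lam = relaxed_route_choice(routes, bound=7)
print(best, lam)
```

With a zero multiplier the cheapest route wins even though it misses the bound; once the multiplier grows, the balanced route becomes the minimizer. In a real router the same multipliers would weight many connections at once, and the relaxed solution is not guaranteed to be feasible, which is why the sketch tracks the best feasible choice separately.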
The proceedings contain 24 papers from the International Symposium on Field-Programmable Gate Arrays. Topics discussed include: timing-driven placement for hierarchical programmable logic devices; performance-driven mapping for CPLD architectures; detailed routing architectures for embedded programmable logic IP cores; microprocessor and application-specific integrated circuits; the effect of reconfigurable units in superscalar processors; run-time defect tolerance using JBits; and FPGA implementation of a novel, fast motion estimation algorithm for real video compression.
ISBN: 9781581134520 (print)
Circuits implemented in FPGAs have delays that are dominated by their programmable interconnect. This interconnect provides the ability to implement arbitrary connections. However, it contains both highly capacitive and resistive elements. The delay encountered by any connection depends strongly on the number of interconnect elements used to route the connection. These delays are only completely known after the place and route phase of the CAD flow. We propose the use of Clock Shifting optimization techniques to improve the clock frequency as a post place and route step. Clock Shifting Optimization is a technique first formalized in [4]. It is a cycle-stealing algorithm that allows one to reduce the critical path delay of a synchronous circuit by shifting the clock signals at each register. This technique allows late-arriving signals to be sampled at a later point in time by intentionally introducing a skew on the clock input of the sampling register. Typical FPGAs contain a number of special-purpose global clock networks that distribute clock signals to every register in the chip. Unused global clock lines in FPGAs can be used to distribute a finite set of clock skews to the entire circuit. We propose an efficient integer programming method to find the optimal circuit improvement for a finite set of clock skews. This technique is modified to consider inherent uncertainties present in the timing models. The uncertainty controls the aggressiveness of the optimizations, as we must take great care in ensuring functionality for any range of possible timing characteristics. Our results confirm the intuition that more aggressive speed optimizations can be performed as timing models become more accurate. We also show that providing 4 skewed versions of the nominal clock signal results in the best delay-area tradeoff. This result is evocative as it may suggest future FPGA architectures that contain greater numbers of global clock lines, as we trade off gains in speed for greater pow...