As interconnection delay plays an important role in determining circuit performance in FPGAs, timing-driven FPGA routing has received much attention recently. In this paper, we present a new timing-driven routing algo...
详细信息
As interconnection delay plays an important role in determining circuit performance in FPGAs, timing-driven FPGA routing has received much attention recently. In this paper, we present a new timing-driven routing algorithm for FPGAs. The algorithm finds a routing with minimum critical path delay for a given placed circuit using the Lagrangian relaxation technique. Lagrangian multipliers used to relax timing constraints are updated by subgradient method over iterations. Incorporated into the cost function, these multipliers guide the router to construct routing tree for each net. During routing, the exclusivity constraints on each routing resources are also taken care of to route circuits successfully. Experimental results on benchmark circuits show that our approach outperforms the state-of-the-art VPR router.
Circuits implemented in FPGAs have delays that are dominated by its programmable interconnect. This interconnect provides the ability to implement arbitrary connections. However, it contains both highly capacitive and...
ISBN:
(纸本)9781581134520
Circuits implemented in FPGAs have delays that are dominated by its programmable interconnect. This interconnect provides the ability to implement arbitrary connections. However, it contains both highly capacitive and resistive elements. The delay encountered by any connection depends strongly on the number of interconnect elements used to route the connection. These delays are only completely known after the place and route phase of the CAD flow. We propose the use of Clock Shifting optimization techniques to improve the clock frequency as a post place and route *** Shifting Optimization is a technique first formalized in [4]. It is a cycle-stealing algorithm that allows one to reduce the critical path delay of a synchronous circuit by shifting the clock signals at each register. This technique allows late arriving signals to be sampled at a later point in time by intentionally introducing a skew on the clock input of the sampling register. Typical FPGAs contain a number of special purpose global clock networks that distribute clock signals to every register in the chip. Unused global clock lines in FPGAs can be used to distribute a finite set of clock skews to the entire circuit. We propose an efficient integer programming method to find the optimal circuit improvement for a finite set of clock skews. This technique is modified to consider inherent uncertainties present in the timing models. The uncertainty controls the aggressiveness of the optimizations as we must take great care in ensuring functionality for any range of possible timing *** results confirm intuition that more aggressive speed optimizations can be performed as timing models become more accurate. We also show that providing 4 skewed versions of the nominal clock signal results in the best delay--area tradeoff. This result is evocative as it may suggest future FPGA architectures that contain greater numbers of global clock lines, as we tradeoff gains in speed for greater pow
Conventional technology mapping algorithms for SRAM-based fieldprogrammablegatearrays (FPGAs) are normally carried out on a fixed logic decomposition of a circuit. The impact of logic decomposition on delay and are...
详细信息
ISBN:
(纸本)9781581133417
Conventional technology mapping algorithms for SRAM-based fieldprogrammablegatearrays (FPGAs) are normally carried out on a fixed logic decomposition of a circuit. The impact of logic decomposition on delay and area of the technology mapping solutions is not well understood. In this paper, we present an algorithm named SLDMap that performs delay-minimized technology mapping on a large set of decompositions and simultaneously controls the mapping area under delay constraints. Our study leads to two conclusions: (1) For depth minimization, the best algorithms in conventional flow (dmig + CutMap) produce satisfactory results with a short runtime, even with a fixed decomposition;(2) When all the structural decompositions of the 6-bounded Boolean network are explored, SLDMap consistently outperforms the state-of-the-art separate flow (dmig + CutMap) by 12% in depth and 10% in area on average;it also consistently outperforms the state-of-the-art combined approach dogma by 8% in depth and 6% in area on average.
The partial reconfiguration feature of some of the current-generation fieldprogrammablegatearrays (FPGAs) can improve dependability by detecting and correcting errors in on-chip configuration data. Such an error re...
详细信息
ISBN:
(纸本)9781581133417
The partial reconfiguration feature of some of the current-generation fieldprogrammablegatearrays (FPGAs) can improve dependability by detecting and correcting errors in on-chip configuration data. Such an error recovery process can be executed online with minimal interference of user applications. However, because Look-up Tables (LUTs) in Configurable Logic Blocks (CLBs) of FPGAs can also implement memory modules for user applications, a memory coherence issue arises such that memory contents in user applications may be altered by the online configuration data recovery process. In this paper, we investigate this memory coherence problem and propose a memory coherence technique that does not impose extra constraints on the placement of memory-configured LUTs. Theoretical analyses and simulation results show that the proposed technique guarantees the memory coherence with a very small (on the order of 0.1%) execution time overhead in user applications.
Wave-steering is a new design methodology that realizes high throughput circuits by embedding layout friendly synthesized structures in silicon. In the wave-steering design methodology, circuits inherently utilize lat...
详细信息
ISBN:
(纸本)9781581133417
Wave-steering is a new design methodology that realizes high throughput circuits by embedding layout friendly synthesized structures in silicon. In the wave-steering design methodology, circuits inherently utilize latches. Inside the synthesized structures they are used for signal skewing, and on the interconnects to guarantee the correct arrival times at the inputs. Recently, we proposed a novel high-throughput FPGA architecture based on the wavesteering design principle to handle throughput-intensive applications. Previously our work was focussed mainly on the Logic Block (LB) design. In this paper we discuss a pipelined interconnect scheme to support the strict timing requirements that is necessitated by the wave-steered design style. We characterize designs that best fit the new architecture and show that as technology scales down towards deep submicron (DSM), this FPGA fabric shows an increasing throughput performance.
This paper presents an FPGA architecture for video encoding according to the H.263 standard for video teleconferencing systems. The implementation is based on an off-the-shelf FPGA and is embedded in a PCI plug-in car...
详细信息
ISBN:
(纸本)9781581133417
This paper presents an FPGA architecture for video encoding according to the H.263 standard for video teleconferencing systems. The implementation is based on an off-the-shelf FPGA and is embedded in a PCI plug-in card with on-board SRAM plus external SRAM. The most complex part of the H.263 protocol, a base-line encoder, was implemented. The strategies, which have been applied to build the complex encoding operations, are treated in this paper. The complete application is able to operate at 30 MHz. This leads to a maximum compression speed of 120 Mbit/s allowing simultaneous real-time operation of several video streams in a single reconfigurable chip. Enhanced coding options can also become implemented with present-day FPGAs. The use of FPGA technology enables the adaptation of hardware to various protocols and environments by software and therefore saves development time and hardware costs.
This paper describes LRoute, a novel router for the popular and scalable hierarchical Complex programmable Logic Devoices (CPLDs). CPLD routing has constraints om routing topologies due to architectural limitations an...
详细信息
ISBN:
(纸本)9781581133417
This paper describes LRoute, a novel router for the popular and scalable hierarchical Complex programmable Logic Devoices (CPLDs). CPLD routing has constraints om routing topologies due to architectural limitations and performance considerations. These constraints make the problem quite different from FPGA routing and render the routing problem more complicated. Extensions of popular FPGA routers like the maze router performs poorly on such CPLDs. There is also little published work on CPLD routing. LRoute uses a different paradigm based on the Lagrangian Relaxation framework in the theory of mathematical programming. It respects the topology constraints imposed and routes a circuit with minimum delay. We tested this router on a set of industry problems that commercial software failed to route. Our router was able to route all of them very quickly.
FPGAs are a promising technology for developing high-performance embedded systems. The density and performance of FPGAs have drastically improved over the past few years. Consequently, the size of the configuration bi...
详细信息
ISBN:
(纸本)9781581133417
FPGAs are a promising technology for developing high-performance embedded systems. The density and performance of FPGAs have drastically improved over the past few years. Consequently, the size of the configuration bit-streams has also increased considerably. As a result, the cost-effectiveness of FPGA-based embedded systems is significantly affected by the memory required for storing various FPGA configurations. This paper proposes a novel compression technique that reduces the memory required for storing FPGA configurations and results in high decompression efficiency. Decompression efficiency corresponds to the decompression hardware cost as well as the decompression rate. The proposed technique is applicable to any SRAM-based FPGA device since configuration bit-streams are processed as raw data. The required decompression hardware is simple and is independent of the individual semantics of configuration bit-streams or specifc features of the on-chip configuration mechanism. Moreover, the time to configure the device is not affected by our compression technique. Using our technique, we demonstrate up to 41% savings in memory for configuration bit-streams of several real-world applications.
Matching and searching computations play an important role in the indexing of data. These computations are typically encoded in very tight loops with a single index variable and a simple search/matching predicate. The...
详细信息
ISBN:
(纸本)9781581133417
Matching and searching computations play an important role in the indexing of data. These computations are typically encoded in very tight loops with a single index variable and a simple search/matching predicate. Their inherent sequential nature, either because of data dependences but more often because of very strong control dependences, makes it impossible to apply existing data dependence and parallelization analysis to exploit significant levels parallelism on traditional architectures. This paper describes a class of searching and matching computations and describes a mapping strategy to map these computations to hardware. We have developed a compiler analysis in SUIF using array data dependence analysis and implicit loop unrolling analysis to expose more parallelism for the parallel evaluation of these computations. Our compiler generates parallel hardware specifications in VHDL. The resulting parallel hardware yields significant performance improvements when these kernel operators are repeated over shifted portions of the input data on FPGA-based computing architectures.
暂无评论