ISBN: 1581138539 (print)
As increasingly complex applications are implemented on FPGAs, high-level design tools are needed to reduce design time. A good high-level synthesis tool usually has an automated design space exploration pass to determine the effects of various compiler optimizations on the area and power of the synthesized hardware. Such a pass needs early estimates of area and power. To this end, we have developed high-level, equation-based area and power macro-models for various RTL operators such as adders, multipliers, and logical operators. The area model is parameterized by the bit width of the device, and the power model takes into account input switching activity and input spatial correlation as well as input bit width. These models are derived from actual synthesis of these RTL operators using back-end logic synthesis and place-and-route tools. Compared with other approaches, our method generates a uniform macro-model for each operator with fewer coefficients and sometimes lower degrees. It also makes it easier to analyze the sensitivity of power to different parameters. Experimental results show that these area and power models are accurate and efficient.
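As a rough illustration of what an equation-based area macro-model parameterized by bit width could look like, the sketch below fits a straight line to area measurements by ordinary least squares. The linear form, the function names, and the sample data points are all assumptions made for this example; they are not taken from the paper, whose models may use other forms and degrees.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_area(bit_width, slope, intercept):
    """Evaluate the fitted macro-model for a given operator bit width."""
    return slope * bit_width + intercept


# Hypothetical placeholder data (NOT measured results): area in logic
# cells of an adder synthesized at several bit widths.
widths = [4, 8, 16, 32]
areas = [5, 9, 17, 33]

slope, intercept = fit_linear(widths, areas)
area_24 = estimate_area(24, slope, intercept)
```

A design space exploration pass could evaluate such a model many times per candidate design without invoking synthesis, which is the point of estimating early.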
ISBN: 1581138539 (print)
Migration of software from older general-purpose embedded processors onto newer mixed hardware/software Systems-On-Chip (SOC) platforms is becoming an increasingly important topic. Automatic translation of general-purpose software binaries and assembly code into hardware implementations on FPGAs requires sophisticated scheduling and allocation algorithms to maximize the resource utilization of such hardware devices. This paper describes the effects of scheduling and chaining of node operations in a CDFG mapped onto an FPGA. The effects of register allocation on scheduled nodes are also discussed. The Texas Instruments C6000 DSP architecture was chosen as the source processor platform and assembly language, and the Xilinx Virtex II XC2V250 was chosen as the target FPGA. Results are reported on ten benchmarks, which show that scheduling with chaining of operations produces the best results on FPGAs, while the addition of register allocation in fact generates poorer designs in terms of area and frequency.
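To make the idea of chaining concrete, here is a minimal sketch of as-soon-as-possible scheduling in which dependent operations share a clock cycle whenever their combined delay fits in the clock period. It is an illustration only, not the paper's algorithm: the function name, the delay values, and the clock period are hypothetical, and it assumes that operations are listed in topological order and that no single operation is slower than one clock period.

```python
def schedule_with_chaining(ops, deps, delay, clock_period):
    """ASAP schedule with operation chaining.

    ops: operation names in topological order.
    deps: maps an operation to the list of operations it depends on.
    delay: maps an operation to its combinational delay.
    Returns a dict mapping each operation to its clock cycle.
    """
    cycle = {}
    finish = {}  # time within its cycle at which an op's result is ready
    for op in ops:
        c, start = 0, 0.0
        for pred in deps.get(op, []):
            if cycle[pred] > c:
                # A later predecessor fixes the earliest possible cycle.
                c, start = cycle[pred], finish[pred]
            elif cycle[pred] == c:
                start = max(start, finish[pred])
        if start + delay[op] > clock_period:
            # Does not fit behind its predecessors: start the next cycle.
            c, start = c + 1, 0.0
        cycle[op] = c
        finish[op] = start + delay[op]
    return cycle


# Hypothetical delays in nanoseconds, for illustration only.
ops = ["add1", "add2", "mul1"]
deps = {"add2": ["add1"], "mul1": ["add2"]}
delay = {"add1": 2.0, "add2": 2.0, "mul1": 5.0}

chained = schedule_with_chaining(ops, deps, delay, clock_period=6.0)
```

With these values the two additions are chained into cycle 0 and the multiplication is pushed to cycle 1, whereas a schedule without chaining would place each dependent operation in its own cycle.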
ISBN: 1581139470 (print)
Long past are the days when programmable logic (FPGAs and CPLDs) was used only for prototyping and interface logic. Today's modern devices have complicated architectures with close to 200,000 logic elements and flip-flops, dedicated blocks for DSP processing, and embedded memories and processors, and they support many I/O standards, including high-speed serial and now embedded transceivers. The CAD software that supports FPGAs has grown in sophistication and scope to handle these larger, more complicated devices, and the software groups at FPGA vendors are now larger than those at all but the biggest EDA companies. Most user designs are now complete systems and go to production as an FPGA. In this tutorial we will talk about recent FPGA and CPLD device architectures and CAD tools, with an emphasis on the interaction between the software and the architecture, and how this has driven recent evolutions and revolutions in PLD architecture. We will discuss the software behind the FPGA: the synthesis, place-and-route algorithms, and CAD flow used to convert a high-level design into a bitstream to program the device. Finally, we will discuss issues in designing hardware for FPGAs, including coding styles to achieve better performance and area, and effective use of dedicated resources on FPGAs.
ISBN: 1581138539 (print)
This paper proposes an innovative method for SPFD-based rewiring in Look-Up-Table-based (LUT-based) FPGA circuits. The new method adds new input wires to two or more LUTs in order to remove or replace a target wire. A few rewiring methods for FPGA circuits have been proposed so far, such as the original SPFD-based optimization, sometimes called Local Rewiring (LR), SPFD-based Global Rewiring (GR), and SPFD-based Enhanced Rewiring (ER). However, all of them replace one wire with another new input wire to a single LUT, not with new input wires to two or more LUTs. Moreover, LR removes or replaces input wires only with a new one to the same LUT, and GR and ER topologically limit the LUTs to which new input wires can be added. Our new method, called One-to-Many Rewiring (OMR), loosens such topological constraints for more flexible FPGA circuit transformation, so that it is easier to import physical design constraints into logic optimization. The experimental results show that, by introducing the new manipulation of wire addition, OMR can transform FPGA circuits more flexibly than LR, GR, and ER. OMR can rewire 1.2 times as many wires as the existing methods, in particular ER. The computation time is as short as that of the existing methods.
ISBN: 0769520723 (print)
With increasing device and design sizes, interconnect planning is fast becoming an important design issue for large FPGA-based designs. The fundamental requirement for interconnect planning is the ability to estimate the routing requirements of a given design at all stages of physical design. A number of interconnect estimation methods have been proposed, but very few operate prior to placement. Pre-placement estimation is very useful for detailed design space exploration during logic synthesis and earlier stages of design. We propose a new method, based on local neighborhood analysis, to estimate the wirelength of every net in a given netlist prior to placement. We assume an optimal placement with respect to total wirelength and estimate the bounding-box sizes of all the nets. We then use the bounding-box estimates to compute the post-routing peak channel width of the device. Our method efficiently handles pad-constrained designs. We compare our net bounding-box and peak channel width estimates with the post-placement and post-routing results obtained using VPR [1], a commonly used FPGA tool suite.
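For reference, the post-placement net bounding box against which such estimates are compared can be computed directly from pin coordinates. The sketch below shows that post-placement measure and its half-perimeter, a common wirelength proxy; it is not the pre-placement estimator the abstract proposes, and the function names and pin coordinates are hypothetical.

```python
def bounding_box(pins):
    """Return (x_min, y_min, x_max, y_max) over a net's placed pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return min(xs), min(ys), max(xs), max(ys)


def half_perimeter(pins):
    """Half-perimeter of the net's bounding box (width plus height)."""
    x_min, y_min, x_max, y_max = bounding_box(pins)
    return (x_max - x_min) + (y_max - y_min)


# Hypothetical pin locations of one three-pin net on the FPGA grid.
net_pins = [(1, 2), (4, 6), (3, 1)]
box = bounding_box(net_pins)
length = half_perimeter(net_pins)
```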
ISBN: 1581139470 (print)
Networks-on-Chip (NoCs) are emerging as the solution to the problem of interconnecting cores (or IPs) in Systems-on-Chip (SoCs), which require reusable and scalable communication architectures. The building block of a NoC is its router (or switch), whose architecture has a great impact on the cost and performance of the network. This work presents a parameterizable router architecture for NoCs, based on a canonical template and on a library of building components that offers different alternatives and implementations for the circuits used for packet forwarding in a NoC. These features allow exploration of the NoC design space in order to obtain a router configuration that best fits the performance requirements of a target application at lower silicon cost. We describe the router architecture and present some synthesis results that demonstrate the feasibility of this new router.
We propose a new energy-efficient method of designing switch blocks inside FPGAs using novel variations of Dual Threshold CMOS (DTMOS) based switches instead of the conventional NMOS pass transistor or tri-state buffer based switches. By intelligently sharing the extra transistor needed for DTMOS-based switches, the area overhead is kept to a minimum. Sleep transistors are used to reduce sub-threshold leakage. Using our new design, we obtain a 16% improvement in the power-delay product per switch during the active mode and a factor of 20 improvement in the stand-by mode over conventional approaches. Extensive simulation results over benchmark circuits in 0.13 μm CMOS are presented to illustrate the superiority of the proposed techniques. Since the proposed techniques target the switches and multiplexers that are present in large numbers on FPGAs, the overall improvement in the power-delay product is significant for an application implemented on an FPGA having the proposed features.
Placement and routing are the most time-consuming processes in automatically synthesizing and configuring circuits for field-programmable gate arrays (FPGAs). In this paper, we use the negotiation-based paradigm to parallelize placement. Our new FPGA placer, NAP (Negotiated Analytical Placement), uses an analytical technique for coarse placement and the negotiation paradigm for detailed placement. We describe the serial algorithm and report results. We also report findings related to parallelizing NAP in a multicast networking and multi-threaded operating system environment; the parallel placer is tolerant of multicast packet loss as well as out-of-order packet delivery. Our parallel placer exhibits little performance degradation while attaining a speedup of 2 using 3 processors.
In recent years, the challenge of building high-performance, low-power, retargetable embedded systems has been addressed with different technological and architectural solutions. In this paper we present a new configurable unit explicitly designed to implement additional reconfigurable pipelined datapaths, suitable for the design of reconfigurable processors. A VLIW reconfigurable processor has been implemented on silicon in a standard 0.18 μm CMOS technology to prove the effectiveness of the proposed unit. Testing on a benchmark of signal processing algorithms showed speedups from 4.3x to 13.5x and a reduction in energy consumption of up to 92%.