In this work we investigate the routing architecture of FPGAs, focusing primarily on determining the best distribution of routing segment lengths and the best mix of pass transistor and tri-state buffer routing switches. While most commercial FPGAs contain many length-1 wires (wires that span only one logic block), we find that wires this short lead to FPGAs that are inferior in terms of both delay and routing area. Our results show instead that it is best for FPGA routing segments to have lengths of 4 to 8 logic blocks. We also show that 50% to 80% of the routing switches in an FPGA should be pass transistors, with the remainder being tri-state buffers. Architectures that employ the best segmentation distributions and the best mixes of pass transistor and tri-state buffer switches found in this paper are not only 11% to 18% faster than a routing architecture very similar to that of the Xilinx XC4000X but also considerably simpler. These results are obtained using an architecture investigation infrastructure that contains a fully timing-driven router and detailed area and delay models.
Cut enumeration is a common approach used in a number of FPGA synthesis and mapping algorithms to consider the various possible LUT implementations at each node in a circuit. Such an approach is very general and flexible, but often suffers from high computational complexity and poor scalability. In this paper, we develop several efficient and effective techniques for cut enumeration, ranking and pruning. These techniques lead to much better runtime and scalability of cut-enumeration-based algorithms; they can also be used to compute a tight lower bound on the size of an area-minimum mapping solution. For area-oriented FPGA mapping, experimental results show that the new techniques lead to over 160X speed-up over the original optimal duplication-free mapping algorithm and achieve mapping solutions with 5-21% smaller area for heterogeneous FPGAs compared to those by Chortle-crf [6], MIS-pga-new [9], and TOS-TUM [4], yet with over 100X speed-up over MIS-pga-new [9] and TOS-TUM [4].
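As a rough illustration of what cut enumeration means, the following is a minimal sketch of the textbook bottom-up procedure. It is not the ranking and pruning techniques developed in the paper. The function name `enumerate_cuts`, the dictionary-based netlist representation, and the example circuit are all assumptions made for this example.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """Enumerate all cuts with at most k leaves for every node of a DAG.

    fanins: dict mapping each node to the list of its fanin nodes
            (primary inputs map to an empty list). Hypothetical format.
    order:  all nodes in topological order (fanins before fanouts).
    k:      maximum number of leaves per cut (the LUT input count).
    """
    cuts = {}
    for node in order:
        # Every node has the trivial cut consisting of itself.
        node_cuts = {frozenset([node])}
        if fanins[node]:
            # Pick one cut per fanin and take the union of their leaves.
            for combo in product(*(cuts[f] for f in fanins[node])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Tiny example circuit: x = f(a, b), y = g(x, c).
fanins = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"]}
order = ["a", "b", "c", "x", "y"]
cuts = enumerate_cuts(fanins, order, k=3)
```

Each non-trivial cut of `y` corresponds to one candidate LUT rooted at `y`. The number of cut combinations grows quickly with circuit size and `k`, which is the scalability problem the abstract refers to.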
Multi-FPGA systems are used as custom computing machines to solve compute-intensive problems and also in the verification and prototyping of large circuits. In this paper, we address the problem of routing multi-terminal nets in a multi-FPGA system that uses partial crossbars as interconnect structures. First, we model the multi-terminal routing problem as a partitioned bin packing problem and formulate it as an integer linear programming problem where the number of variables is exponential. A fast heuristic is applied to compute an upper bound on the routing solution. Then, a column generation technique is used to solve the linear relaxation of the initial master problem in order to obtain a lower bound on the routing solution. This is followed by an iterative branch-and-price procedure that attempts to find a routing solution somewhere between the two established bounds. In this regard, the proposed algorithm guarantees an exact routing solution by searching a branch-and-price tree. Due to the tightness of the bounds, the branch-and-price tree is small, resulting in shorter execution times. Experimental results are provided for different netlists and board configurations in order to demonstrate the algorithm's performance. The obtained results show that the algorithm finds an exact routing solution in a very short time.
Pipelining of data path structures increases the throughput rate at the expense of enlarged resource usage and latency unless architectures optimized towards specific applications are used. This paper describes a novel methodology for the design of generic bit-level pipelined data paths that have the low resource usage and latency of specifically tailored architectures but still allow the flexible trade-off between speed and resource requirements inherent in generic circuits. This is achieved through the elimination of all skew and alignment flip-flops from the data path whilst still maintaining the original pipelining scheme, hence allowing more compact structures with decreased circuit delays. The resulting low latency is beneficial in the realization of all recursive signal-processing applications, and the reduced resource usage in particular enables the efficient FPGA realization of high-performance signal processing functions. The design process is illustrated through the high-level synthesis-based FPGA realization of a 9th-order wave digital filter, demonstrating that high performance and efficient resource usage are possible. For example, the implementation of a wave digital filter with 10-bit signal word length and 6-bit coefficients using a Xilinx XCA013XL-1 device supports sample rates of 2.5 MHz.
ISBN: (print) 9781581131338
This work presents the design of an energy-efficient FPGA architecture. Significant reduction in the energy consumption is achieved by tackling both circuit design and architecture optimization issues concurrently. A hybrid interconnect structure incorporating nearest-neighbor connections, symmetric mesh architecture, and hierarchical connectivity is used. The energy of the interconnect is also reduced by employing low-swing circuit techniques. These techniques have been employed to design and fabricate an FPGA. Preliminary analysis shows an energy improvement of more than an order of magnitude when compared to existing commercial architectures.
Multi-field-programmable gate array (FPGA) systems (MFS) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture: the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. A new routing architecture, called hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar, is proposed. The architecture uses both hard-wired and programmable connections between the FPGAs.
In this paper, we present a new retiming-based technology mapping algorithm for look-up table-based field-programmable gate arrays. The algorithm is based on a novel iterative procedure for computing all k-cuts of all nodes in a sequential circuit, in the presence of retiming. The algorithm completely avoids flow computation, which is the bottleneck of previous algorithms. Because k is very small in practice, the procedure for computing all k-cuts is very fast. Experimental results indicate that the overall algorithm is very efficient in practice.
ISBN: (print) 9780897919784
An algorithm is presented for partitioning a design in time. The algorithm divides a large, technology-mapped design into multiple configurations of a time-multiplexed FPGA. These configurations are rapidly executed in the FPGA to emulate the large design. The tool includes facilities for optimizing the partitioning to improve routability, for fitting the design into more configurations than the depth of the critical path and for compressing the critical path of the design into fewer configurations, both to fit the design into the device and to improve performance. Scheduling results are shown for mapping designs into an 8-configuration time-multiplexed FPGA and for architecture investigation for a time-multiplexed FPGA.
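To make the idea of partitioning a design in time more concrete, here is a minimal sketch that assigns each node of a combinational netlist to one configuration based on its logic level. It is a simple level-based scheme, not the algorithm presented in the paper, and it ignores routability and device capacity. The names `asap_levels` and `assign_configurations` and the netlist format are assumptions made for this example.

```python
import math


def asap_levels(fanins, order):
    """Return the earliest (as-soon-as-possible) logic level of each node.

    fanins: dict mapping each node to the list of its fanin nodes
            (primary inputs map to an empty list). Hypothetical format.
    order:  all nodes in topological order (fanins before fanouts).
    """
    level = {}
    for node in order:
        # Inputs get level 0; other nodes sit one level above their deepest fanin.
        level[node] = 1 + max((level[f] for f in fanins[node]), default=-1)
    return level


def assign_configurations(fanins, order, n_configs):
    """Map each node to a configuration index in the range [0, n_configs)."""
    level = asap_levels(fanins, order)
    depth = max(level.values()) + 1
    # Pack this many consecutive logic levels into each configuration.
    levels_per_config = math.ceil(depth / n_configs)
    return {node: level[node] // levels_per_config for node in order}


# Example: a four-node chain a -> b -> c -> d squeezed into two configurations.
fanins = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
order = ["a", "b", "c", "d"]
schedule = assign_configurations(fanins, order, n_configs=2)
```

Because a node's level is always greater than that of its fanins, no node is scheduled in an earlier configuration than the logic that feeds it. When the critical path is deeper than the number of configurations, several levels share a configuration, which loosely mirrors the critical-path compression the abstract mentions.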