ISBN: 9781595936004 (print)
We present an architecture for a synthesizable datapath-oriented Field-Programmable Gate Array (FPGA) core which can be used to provide post-fabrication flexibility to a System-on-Chip (SoC). Our architecture is optimized for bus-based operations that are common in signal processing and computation-intensive applications. It employs a directional routing architecture, which allows it to be synthesized using standard ASIC design tools and flows. We also describe a proof-of-concept layout of our core. It is shown that the proposed architecture is significantly more area-efficient than the best previously reported synthesizable programmable logic core.
This paper describes the Altera Stratix logic and routing architecture. The primary goals of the architecture were to achieve high performance and logic density. We give an overview of the entire device, and then focus on the logic and routing architecture. The Stratix logic architecture is based on a cluster of ten 4-input LUTs, and its routing consists of staggered routing lines. We describe the development of the routing architecture, including its directional bias and its direct-drive routing, which reduces both area and delay. The logic array block and logic cell design are also described, along with new routing structures within the logic array block and new logic element features.
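For readers unfamiliar with the basic building block named above, the following is a minimal sketch of how a generic 4-input LUT can be modeled: 16 configuration bits form a truth table, and the four inputs select one bit. It is an illustration only and does not model the Stratix logic element; the names `lut4`, `AND4`, and `XOR4` are hypothetical.

```python
def lut4(truth_table: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a generic 4-input LUT.

    truth_table packs the 16 configuration bits into an integer;
    bit i holds the output for the input combination whose index is i.
    """
    index = a | (b << 1) | (c << 2) | (d << 3)  # inputs form a 4-bit address
    return (truth_table >> index) & 1


# Hypothetical configurations for illustration:
AND4 = 1 << 15  # only the all-ones input (index 15) yields 1
XOR4 = sum(1 << i for i in range(16) if bin(i).count("1") % 2 == 1)  # odd parity
```

Reprogramming the function means changing only the 16-bit constant, which is what makes a LUT a convenient unit of programmable logic.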
ISBN: 9781605584102 (print)
In this paper, we present and analyze a sophisticated communication architecture that allows many different modules to be integrated into a system by FPGA reconfiguration at runtime. Furthermore, we examine how this architecture can be implemented on low-cost Spartan-3 devices. It will be demonstrated that modules can be exchanged in a system without disturbing the communication architecture. The paper points out that the capabilities of Spartan-3 FPGAs are sufficient to build complex reconfigurable systems. Copyright 2009 ACM.
FPGA place and route is time-consuming, often serving as the major obstacle to a fast edit-compile-test loop in prototyping and development, and to late-bound hardware and design mapping for reconfigurable systems. Previous work showed that hardware-assisted routing can accelerate fanout-free routing on Fat-Trees by three orders of magnitude with modest modifications to the network itself. In this paper, we show how these techniques can be applied to any FPGA and how they can be implemented on top of LUT networks in cases where modification of the FPGA itself is not justified. We further show how to accommodate fanout and how to achieve route quality comparable to software-based methods. For a tree network, we estimate an FPGA implementation of our routing logic could route the Toronto Place and Route Benchmarks at least two orders of magnitude faster than a software Pathfinder while achieving within 3% of the aggregate quality. Preliminary results on small mesh benchmarks achieve within one track of vpr -fast.
In this paper, we study the problem of placement-driven technology mapping for table-lookup-based FPGA architectures to optimize circuit performance. Early work on technology mapping for FPGAs, such as Chortle-d [14] and Flowmap [3], aimed to optimize the depth of the mapped solution without consideration of interconnect delay. Later works such as Flowmap-d [7], Bias-Clus [4] and EdgeMap consider interconnect delays during mapping, but do not take into consideration the effects of their mapping solution on the final placement. Our work focuses on the interaction between the mapping and placement stages. First, the interconnect delay information is estimated from the placement and used during the labeling process. A placement-based mapping solution which considers both global cell congestion and local cell congestion is then developed. Finally, a legalization step and detailed placement are performed to realize the design. We have implemented our algorithm in a LUT-based FPGA technology mapping package named PDM (Placement-Driven Mapping) and tested the implementation on a set of MCNC benchmarks. We use the tool VPR [1][2] for placement and routing of the mapped netlist. Experimental results show that the longest path delay on a set of large MCNC benchmarks decreased by 12.3% on average.
This paper presents a flexible FPGA architecture evaluation framework, named fpgaEVA-LP, for power efficiency analysis of LUT-based FPGA architectures. Our work has several contributions: (i) We develop a mixed-level FPGA power model that combines switch-level models for interconnects and macromodels for LUTs; (ii) We develop a tool that automatically generates a back-annotated gate-level netlist with post-layout extracted capacitances and delays; (iii) We develop a cycle-accurate power simulator based on our power model. It carries out gate-level simulation under a real delay model and is able to capture glitch power; (iv) Using the framework fpgaEVA-LP, we study the power efficiency of FPGAs, in 0.10 µm technology, under various settings of architecture parameters such as LUT sizes, cluster sizes and wire segmentation schemes, and reach several important conclusions. We also present the detailed power consumption distribution among different FPGA components and shed light on the potential opportunities of power optimization for future FPGA designs (e.g., ≤ 0.10 µm technology).
ISBN: 9781595939340 (print)
The Field-Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which are synthesized as compressor trees on the FPCA; the remaining portion of the circuit is synthesized on the FPGA. This paper presents a series of architectural improvements to the FPCA that reduce routing delay, increase flexibility and component utilization, and simplify the integration process. Using an FPGA containing six FPCAs, we observed average and maximum speedups of 1.60x and 2.40x on a set of arithmetic benchmarks.
ISBN: 9781605584102 (print)
This paper describes an analytical model that relates the architectural parameters of an FPGA to the average pre-routing wirelength of an FPGA implementation. Both homogeneous and heterogeneous FPGAs are considered. For homogeneous FPGAs, the model relates the lookup-table size, the cluster size, and the number of inputs per cluster to the expected wirelength. For heterogeneous FPGAs, the number and positioning of the embedded blocks, as well as the number of pins on each embedded block, are considered. Two applications of the model to FPGA architectural design are also presented. Copyright 2009 ACM.
ISBN: 9781450311557 (print)
Good FPGA placement is crucial to obtain the best Quality of Results (QoR) from FPGA hardware. Although many published global placement techniques place objects in a continuous ASIC-like environment, FPGAs are discrete in nature, and a continuous algorithm cannot always achieve superior QoR by itself. Therefore, discrete FPGA-specific detail placement algorithms are used to improve the global placement results. Unfortunately, most of these detail placement algorithms do not have a global view. This paper presents a discrete "middle" placer that fills the gap between the two placement steps. It works like simulated annealing, but leverages various acceleration techniques. It does not pay the runtime penalty typical of simulated annealing solutions. Experiments show that with this placer, final QoR is significantly better than with the global-detail placer approach.
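As background for the comparison above, the following is a minimal sketch of plain simulated-annealing placement on a discrete grid, using half-perimeter wirelength (HPWL) as the cost. It is a textbook baseline, not the "middle" placer of the paper, and it includes none of the acceleration techniques the abstract mentions. The function names, the cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def hpwl(nets, pos):
    """Sum of half-perimeter bounding-box wirelength over all nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_place(cells, nets, width, height,
                 steps=5000, t0=5.0, alpha=0.999, seed=0):
    """Place cells on a width x height grid of discrete sites (one cell per site)."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    assert len(cells) <= len(sites), "not enough sites for all cells"
    rng.shuffle(sites)
    pos = {c: sites[i] for i, c in enumerate(cells)}  # random legal start
    occupant = {site: c for c, site in pos.items()}
    cost = hpwl(nets, pos)
    best_cost, best_pos = cost, dict(pos)
    t = t0
    for _ in range(steps):
        c = rng.choice(cells)
        target = (rng.randrange(width), rng.randrange(height))
        old = pos[c]
        if target == old:
            continue
        other = occupant.get(target)  # cell to swap with, or None if site is free
        pos[c], occupant[target] = target, c
        if other is not None:
            pos[other], occupant[old] = old, other
        else:
            del occupant[old]
        new_cost = hpwl(nets, pos)
        delta = new_cost - cost
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_cost, best_pos = cost, dict(pos)
        else:
            # Reject: undo the move or swap.
            pos[c], occupant[old] = old, c
            if other is not None:
                pos[other], occupant[target] = target, other
            else:
                del occupant[target]
        t = max(t * alpha, 1e-3)  # geometric cooling with a small floor
    return best_pos, best_cost
```

Because every move is a swap or a move to a free site, the placement stays legal on the discrete grid throughout. Recomputing the full HPWL after each move is the simplest choice here; a practical placer would update the cost incrementally.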