ISBN: 9781424419609 (print)
This paper describes an analytical model, based principally on Rent's Rule, that relates logic architectural parameters to the area efficiency of an FPGA. In particular, the model relates the lookup-table size, the cluster size, and the number of inputs per cluster to the amount of logic that can be packed into each lookup-table and cluster, and to the number of used inputs per cluster. Comparison with experimental results shows that our models are accurate. This accuracy, combined with the simple form of the equations, makes them a powerful tool for FPGA architects to better understand and guide the development of future FPGA architectures.
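For orientation, the general form of Rent's Rule on which the model is based can be written as follows. The abstract does not give the paper's own equations, so this is only the textbook relation; the symbols T, t, g, and p are the conventional ones and are not taken from the paper.

```latex
% Rent's Rule (general form, not the paper's derived model):
%   T : number of external terminals (pins) of a block of logic
%   g : number of logic elements inside the block
%   t : average number of terminals per logic element
%   p : Rent exponent, typically 0 < p < 1
T = t \, g^{p}
```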
ISBN: 9781479900046 (print)
Temporal runtime-reconfiguration of FPGAs allows for a resource-efficient sequential execution of signal processing modules. Approaches for partitioning processing chains into modules have been derived in various previous works. We present a metric for weighted partitioning of pre-defined processing element sequences. The proposed method yields a set of reconfigurable partitions that are balanced in terms of resources while jointly having minimal data throughput. Using this metric, we formulate a partitioning algorithm with linear complexity and compare our approach to the state of the art.
ISBN: 9781665473903 (print)
The Enhanced Scouting Logic (ESL) is a memristive logic gate family with low sensitivity to resistance variation and high device endurance. This work studies design methods for logic circuits based on Voltage-Input Enhanced Scouting Logic (VIESL) gates. Both single-array and dual-array synthesis methods are proposed. The read/write separation technique of VIESL gates facilitates pipelined logic operations. The synthesis results on the benchmarks show that the circuit generated by the proposed single-array synthesis method performs best among its counterparts, and that the dual-array synthesis method reduces the cell count effectively.
ISBN: 9781665437592 (print)
Carry chains on FPGAs have traditionally been used only for fast binary arithmetic operations. In this paper, we propose using the carry chain to implement general logic as a means of reducing the critical path delay and raising performance. To achieve this, we use a Majority-Inverter Graph (MIG) to represent the application during technology mapping, since carry functionality maps directly to the majority logic function. This aligns the subject graph of technology mapping with the capabilities of the carry chain. We first map an application to LUTs, then determine a chain of critical LUTs containing paths of majority "gates" that we deem beneficial for mapping onto the carry chain. We place such paths onto the carry chains, with the remaining logic in LUTs. In an experimental study using a suite of benchmarks, we observe that the proposed approach yields a post-place-and-route critical path delay that is superior to that of delay-optimized mapping, yet without the significant area penalty. With carry-chain optimizations, the area-delay product is improved by 9% vs. baseline LUT mappings.
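The claim that carry functionality maps directly to the majority function can be checked exhaustively. The following is a minimal Python sketch, not the paper's mapping flow; the function names `majority` and `full_adder_carry` are illustrative assumptions. It also shows why majority gates can express general logic: tying one input to a constant yields AND or OR.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two of the inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def full_adder_carry(a: int, b: int, cin: int) -> int:
    """Carry-out of a one-bit full adder, computed arithmetically."""
    return (a + b + cin) >> 1


# The carry-out equals the majority of the three adder inputs
# for all eight input combinations.
carry_is_majority = all(
    majority(a, b, c) == full_adder_carry(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)

# Fixing one majority input to a constant gives AND (constant 0) or OR (constant 1).
constants_give_and_or = all(
    majority(a, b, 0) == (a & b) and majority(a, b, 1) == (a | b)
    for a, b in product((0, 1), repeat=2)
)
```

Both flags evaluate to `True`, which is the property that lets a chain of majority nodes in a MIG be placed on carry logic.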
ISBN: 9798350341515 (print)
With the deployment of FPGAs in a data center, there is the opportunity to build large multi-FPGA applications. In this paper, we design a partitioner to address the problem of efficiently assigning the various tasks of a large multi-FPGA application to individual network-connected FPGAs according to constraints that consider resource usage, communication bandwidth, and communication latency. By using simulated annealing, we can modify the cost function as new objectives and constraints are determined. We build on the Galapagos multi-FPGA platform by introducing a multi-die shell to extend Galapagos to more recent FPGA boards, and we design the partitioner to work on any collection of single- and multi-die FPGAs. Finally, we evaluate the new shell and partitioner using micro-benchmarks and analyze the partitioning of a real-world multi-FPGA application, a Transformer model.
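To illustrate why simulated annealing makes the cost function easy to change, here is a minimal Python sketch of an annealing-based task partitioner. It is not the Galapagos partitioner: the function `anneal_partition`, the cost terms (cut bandwidth plus a resource-overflow penalty), the penalty weight, and the cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def anneal_partition(task_cost, edges, num_fpgas, capacity, steps=5000, seed=0):
    """Assign tasks to FPGAs by simulated annealing (illustrative cost only).

    task_cost: resource usage per task (hypothetical units)
    edges:     (task_u, task_v, bandwidth) tuples between communicating tasks
    capacity:  resource capacity of each FPGA
    """
    rng = random.Random(seed)
    n = len(task_cost)
    assign = [rng.randrange(num_fpgas) for _ in range(n)]

    def cost(a):
        # New objectives or constraints would be added as extra terms here.
        load = [0] * num_fpgas
        for task, fpga in enumerate(a):
            load[fpga] += task_cost[task]
        overflow = sum(max(0, used - capacity) for used in load)
        cut = sum(w for u, v, w in edges if a[u] != a[v])
        return cut + 10 * overflow  # penalty weight 10 is arbitrary

    cur = cost(assign)
    best, best_cost = assign[:], cur
    temp = 10.0
    for _ in range(steps):
        task = rng.randrange(n)
        old = assign[task]
        assign[task] = rng.randrange(num_fpgas)  # propose moving one task
        new = cost(assign)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best_cost:
                best, best_cost = assign[:], cur
        else:
            assign[task] = old  # reject the move
        temp = max(temp * 0.999, 1e-3)  # geometric cooling
    return best, best_cost
```

A call such as `anneal_partition([1, 1, 1, 1], [(0, 1, 5), (2, 3, 5)], num_fpgas=2, capacity=2)` returns an assignment list and its cost. Because only `cost` encodes the objectives, terms such as a latency constraint could be added without touching the search loop.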
ISBN: 9781538685174 (digital)
ISBN: 9781538685174 (print)
The power consumption of digital circuits, e.g., Field-Programmable Gate Arrays (FPGAs), is directly related to their operating supply voltages. On the other hand, chip vendors usually introduce a conservative voltage guardband below the standard nominal level to ensure the correct functionality of the design in worst-case process and environmental scenarios. For instance, this voltage guardband is empirically measured to be 12%, 20%, and 16% of the nominal level in commercial CPUs [1], Graphics Processing Units (GPUs) [2], and Dynamic RAMs (DRAMs) [3], respectively. However, in many real-world applications, this guardband is extremely conservative, and eliminating it can result in significant power savings without any overhead. Motivated by these studies, we aim to extend the undervolting technique to commercial FPGAs. Toward this goal, we will practically demonstrate the voltage guardband for a representative Xilinx FPGA, with a preliminary concentration on on-chip memories, or Block RAMs (BRAMs).
ISBN: 9781424419609 (print)
Exploiting the underutilisation of variable-length DSP algorithms during normal operation is vital when seeking to maximise the achievable functionality of an application within a peak power budget. A system-level, low-power design methodology for FPGA-based, variable-length DSP IP cores is presented. Algorithmic commonality is identified and resources are mapped with a configurable datapath to increase achievable functionality. The methodology is applied to a digital receiver application, where a 100% increase in operational capacity is achieved in certain modes without significant power or area budget increases. Measured results show that the resulting architectures require 19% less peak power, 33% fewer multipliers, and 12% fewer slices than existing architectures.
This tutorial describes the Why and How of the new 65-nm families of Virtex-5 FPGAs. It describes several aspects of the technology that affect speed, density, and power consumption. The basic device structure and pac...
ISBN: 9781424403127 (print)
A Field-Programmable Gate Array (FPGA), when used as a platform for implementing special-purpose computing architectures, offers the potential for increased functional parallelism over the alternative approach of software running on a general-purpose microprocessor. However, the increasing disparity between the logic speed and density of a state-of-the-art FPGA versus a state-of-the-art microprocessor has already begun to negate the benefits of this increased functional parallelism for all but a limited set of applications. We believe that the solution to this problem is to construct distributed multi-FPGA architectures to aggregate the parallelism of multiple FPGAs. Such a system would require a high-capacity interconnect, and thus we propose arranging the FPGAs onto a scalable direct network. This strategy requires each FPGA to contain an integrated router that must share the logic fabric with the application logic. In this paper, we propose a novel routing technique that can significantly boost such a network's capacity and be implemented in compact and efficient routers. We begin with an existing lightweight routing algorithm and augment it with a novel technique called predictive load balancing, where routers collect information about the blocking behavior on their output ports and use this information when making routing decisions.
ISBN: 9789090304281 (print)
Large number multiplication has always been an essential operation in cryptographic algorithms. In this paper, we propose Broken-Karatsuba multiplication, which applies the non-least-positive form to represent large numbers and exploits the parallelism hidden in conventional Karatsuba multiplication. Further, we modify the Montgomery modular multiplication algorithm with Broken-Karatsuba multiplication to make it suitable for pipelined implementation with fewer hardware resources. Based on this modified algorithm, a 256-bit two-stage modular multiplier is constructed. There is no stall in the pipeline when performing consecutive modular multiplications, and the delay of a modular multiplication is reduced significantly. Implemented on Virtex-6 FPGA platforms, our design outperforms most previous works in terms of modular multiplication latency and area-time product, which makes it suitable for server-side applications.
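As background, conventional Karatsuba multiplication (the baseline the abstract refers to, not the proposed Broken-Karatsuba variant, whose details the abstract does not give) can be sketched in Python as follows. The function name `karatsuba` and the `cutoff_bits` parameter are assumptions for illustration. The three sub-products `low`, `high`, and the product behind `cross` are mutually independent, which is the kind of parallelism a hardware design can exploit.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 32) -> int:
    """Multiply two non-negative integers with conventional Karatsuba."""
    # Small operands: fall back to the built-in multiplication.
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive multiplications instead of four.
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high

    # Recombine: x*y = high*2^(2*half) + cross*2^half + low.
    return (high << (2 * half)) + (cross << half) + low
```

For 256-bit operands such as `(1 << 255) - 19`, the result matches Python's built-in `*` operator, which makes the sketch easy to check against a reference.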