In this work, the design of an energy-efficient FPGA interconnect architecture has been investigated. It concerns a dual-supply solution, where the logic blocks are powered by a nominal voltage supply and the intercon...
详细信息
In this work, the design of an energy-efficient FPGA interconnect architecture has been investigated. It concerns a dual-supply solution, where the logic blocks are powered by a nominal voltage supply and the interconnect part is powered by a reduced voltage supply. The behaviour of a fully-buffered, a fully pass-transistor based and a hybrid buffer and pass-transistor architecture has been investigated over a range of power supply voltages. It is found that there exists an optimal ratio between the number of pass-transistor and tri-state buffer switches depending on the load and power supply involved. By reducing the signal voltage swing on the interconnect, the need for a fully tri-state buffer-based interconnect is eliminated, thus saving valuable area and power. Through benchmark studies, it is confirmed that using an optimal composite of pass-transistor and tri-state buffer switches operating at a reduced power supply can meet the same speed as compared to the full-swing scenario at a much lower power consumption. An average reduction in power-delay of 4.4x for low-load critical paths and 2.7x for high-load critical paths is achieved using buffer receivers. Using levelshifter receivers, an average reduction in power-delay of 4.7x for low-load critical paths and 2.8x for high-load critical paths is obtained. It is also found that due to partially replacing tri-state buffers by pass-transistor switches and inspite of using levelshifters, we can save up to a factor of 4x in interconnect area as compared to fully-buffered architectures. The results have been validated over various benchmarks in a 0.1 μm CMOS technology.
In this paper we evaluate the trade-offs between various low-leakage design techniques for fieldprogrammablegatearrays (FGPAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implem...
详细信息
In this paper we evaluate the trade-offs between various low-leakage design techniques for fieldprogrammablegatearrays (FGPAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing look up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass transistor based multiplexers and routing switches are proposed and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for FPGAs, and they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For some of the potential low-leakage design techniques being evaluated in our study, the impact on chip area is very minimal to an increase of 15% - 30%.
Moore's Law states that the number of transistors on a device doubles every two years: however, it is often (mis)quoted based on its impact on CPU performance. This important corollary of Moore's Law states th...
详细信息
Moore's Law states that the number of transistors on a device doubles every two years: however, it is often (mis)quoted based on its impact on CPU performance. This important corollary of Moore's Law states that improved clock frequency plus improved architecture yields a doubling of CPU performance every 18 months. This paper examines the impact of Moore's Law on the peak floating-point performance of FPGAs. Performance trends for individual operations are analyzed as well as the performance trend of a common instruction mix (multiply accumulate). The important result is that peak FPGA floating-point performance is growing significantly faster than peak floating-point performance for a CPU.
fieldprogrammablegatearrays (FPGAs) are an increasingly popular choice of platform for the implementation of cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source o...
详细信息
fieldprogrammablegatearrays (FPGAs) are an increasingly popular choice of platform for the implementation of cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source of truly random bits. In this paper we extend a technique that uses on-chip jitter and PLLs to a much larger class of FPGAs that do not contain PLLs. Our design uses only the Configurable Logic Blocks (CLBs) common to all FPGAs, and has a self-testing capability. Using the intrinsic jitter contained in digital circuits, we produce random bits at speeds of up to 0.5 Mbits/second with good statistical characteristics. We discuss the engineering challenges of extracting random bits from digital circuits, and we report the results of running standard statistical tests (NIST) on the output generated by our system.
In this work, we parameterize and explore the interconnect structure of pipelined FPGAs. Specifically, we explore the effects of interconnect register population, length of registered routing track segments, registere...
详细信息
In this work, we parameterize and explore the interconnect structure of pipelined FPGAs. Specifically, we explore the effects of interconnect register population, length of registered routing track segments, registered 10 terminals of logic units, and the flexibility of the interconnect structure on the performance of a pipelined FPGA. Our experiments with the RaPiD architecture identify tradeoffs that must be made while designing the interconnect structure of a pipelined FPGA. The post-exploration architecture that we found shows a 19% improvement over RaPiD, while the area overhead incurred in placing and routing benchmarks netlists on the post-exploration architecture is 18%.
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficient...
详细信息
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our FPGA, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our FPGA ideal for throughput-intensive applications and we require minimal place and route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25μm process and 500-700 MHz in a TSMC 0.18μm process.
In this paper we study the technology mapping problem of FPGA architectures with dual supply voltages (Vdds) for power optimization. This is done with the guarantee that the mapping depth of the circuit will not incre...
详细信息
In this paper we study the technology mapping problem of FPGA architectures with dual supply voltages (Vdds) for power optimization. This is done with the guarantee that the mapping depth of the circuit will not increase compared to the circuit with a single Vdd. We first design a single-Vdd mapping algorithm that achieves better power results than the latest published low-power mapping algorithms. We then show that our dual-Vdd mapping algorithm can further improve power savings by up to 11.6% over the single-Vdd mapper. In addition, we investigate the best low-Vdd/high-Vdd ratio for the largest power reduction among several dual-Vdd combinations. To our knowledge, this is the first work on dual-Vdd mapping for FPGA architectures.
This paper shows a method to verifying the thermal status of complex FPGA-based circuits like microprocessors. Thus, the designer can evaluate if a particular block is working beyond specifications. The idea is to ext...
详细信息
This paper shows a method to verifying the thermal status of complex FPGA-based circuits like microprocessors. Thus, the designer can evaluate if a particular block is working beyond specifications. The idea is to extract the output frequencies of an array of ring-oscillators previously distributed in the die, taking full advantage of the configuration port capabilities in Xilinx technology. As a result, it is shown that the FPGA technology offers the designers of embedded systems the possibility of viewing a detailed thermal map of a circuit at a minimum cost. The verification can be done in actual working conditions;for example with heat sinks and fans attached to the chip, inside the system case, or even in an on-board satellite application. The main results of the work are unthinkable using other alternatives like IR cameras, external sensors, or embedded diodes.
暂无评论