ISBN (print): 0769525024
The performance of FPGAs is suffering from interconnect delays, especially the delays of long wires, which can exceed 10ns in large FPGAs. We have proposed GAPLA, a Globally Asynchronous Locally Synchronous (GALS) FPGA architecture, to deal with this problem. In the GAPLA architecture, the whole FPGA area is divided into locally synchronous blocks wrapped with asynchronous I/O interfaces. Interconnections inside each synchronous block are local and fast. The long interconnections between synchronous blocks come into play only when there are asynchronous communications. In this paper, we focus on the CAD tools designed for the GAPLA architecture. Starting from a behavioral circuit description, a design is first partitioned into smaller modules, each of which can fit into one synchronous block. Then an As-Soon-As-Possible (ASAP) scheduler (other schedulers can also be applied) is used to schedule each module. After scheduling, control signals for the asynchronous communications are added to each module, since the communication timing is now known in the scheduled design. Each module is then synthesized and mapped into a synchronous block using existing CAD tools for synchronous FPGAs. A module mapping process is used to find the implementation location for each module. Experimental results show that applications could run up to 2 times faster on the GAPLA FPGA than on their synchronous FPGA counterparts.
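The scheduling step above can be illustrated with a textbook ASAP scheduler. This is a minimal sketch, not the GAPLA tool itself: it assumes every operation takes exactly one control step, ignores resource limits (as plain ASAP does), and the function name `asap_schedule` is hypothetical. Once every operation has a step, the step at which a value crosses a module boundary is fixed, which is why handshake control can be added after scheduling.

```python
from collections import defaultdict, deque


def asap_schedule(ops, deps):
    """Return the earliest control step (0-based) for each operation.

    ops:  iterable of operation names (every name used in deps must appear here)
    deps: list of (producer, consumer) edges of an acyclic data-flow graph
    Assumption: every operation has a latency of exactly one control step.
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indeg = {op: 0 for op in ops}
    for u, v in deps:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1

    step = {}
    # Kahn-style topological traversal: an op is visited only after all its producers.
    ready = deque(op for op in indeg if indeg[op] == 0)
    while ready:
        op = ready.popleft()
        # Start one step after the latest predecessor; sources start at step 0.
        step[op] = max((step[p] + 1 for p in preds[op]), default=0)
        for s in succs[op]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if len(step) != len(indeg):
        raise ValueError("dependency graph has a cycle")
    return step
```

For example, `asap_schedule(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")])` places `a` and `b` at step 0, `c` at step 1, and `d` at step 2. If `c` and `d` were assigned to different modules, the asynchronous transfer between them would be known to occur after step 1.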
Modern field-programmable gate arrays (FPGAs) are gaining importance due to their large capacity, quick time to market, and low non-recurring engineering cost. However, an FPGA consumes more power than its ASIC (application-specific integrated circuit) counterpart. Therefore, modeling and reducing power for FPGAs has become an emerging research area. In this session, we will first present an overview of new challenges in commercial FPGA architecture design, with an emphasis on the circuit and architecture issues for power at current and upcoming process nodes. Today's 90nm FPGAs use techniques such as programmable shutdown of unused resources at the architectural level and multiple threshold voltages and gate oxides at the circuit level. At 65nm and 45nm, new techniques will need to target not only power mitigation but also process variation in power and timing and its impact on yield and manufacturability. Then, we will focus on power-reduction computer-aided design (CAD) techniques for standard FPGAs (with no architectural enhancements for power). We will start by discussing a typical FPGA CAD flow that is not power-aware. Then, we will study each of the four CAD steps (technology mapping, clustering, placement, and routing) and show how each step can be enhanced to optimize the resulting implementation for power. For each step, we will describe the techniques employed and present experimental results that show how well they optimize for power. We will first consider each stage in isolation, and then present combined results that indicate what power savings are possible by optimizing the entire CAD flow. Finally, we will describe new architecture research and related CAD enhancements in power control, and discuss the similar and unique aspects of power reduction in FPGAs compared to ASICs. We will first introduce power modeling (both simulation and measurement) and architecture evaluation for FPGAs. We will then discuss a n…
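Why each CAD step can affect power is easiest to see from the standard first-order dynamic power model, P = 0.5 · Σ αᵢCᵢ · Vdd² · f, where αᵢ is the average number of transitions per clock cycle on net i and Cᵢ its capacitance. The sketch below assumes this generic model (not the specific power model used in the session) and a hypothetical helper name `dynamic_power`; it ignores static leakage and glitching.

```python
def dynamic_power(nets, vdd, f_clk):
    """First-order dynamic power estimate in watts.

    nets:  list of (switching_activity, capacitance_farads) pairs, where the
           activity is the average number of transitions per clock cycle.
    vdd:   supply voltage in volts.
    f_clk: clock frequency in hertz.
    Model: P = 0.5 * sum(a_i * C_i) * vdd**2 * f_clk (leakage and glitches ignored).
    """
    switched_cap = sum(a * c for a, c in nets)
    return 0.5 * switched_cap * vdd ** 2 * f_clk


# Two hypothetical nets: a quiet 10 fF net and a busy 20 fF net.
before = dynamic_power([(0.2, 10e-15), (0.5, 20e-15)], vdd=1.0, f_clk=100e6)
# A power-aware placer/router might shorten the busy net, halving its capacitance.
after = dynamic_power([(0.2, 10e-15), (0.5, 10e-15)], vdd=1.0, f_clk=100e6)
```

Because the busy net dominates the switched capacitance, shrinking it lowers the estimate far more than shrinking the quiet net would. This is the intuition behind making clustering, placement, and routing activity-aware.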
In this paper we evaluate the trade-offs between various low-leakage design techniques for field-programmable gate arrays (FPGAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing look-up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass-transistor-based multiplexers and routing switches are proposed, and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and the integration of multi-Vt technology are ideal for FPGAs, and they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For the potential low-leakage design techniques evaluated in our study, the impact on chip area ranges from very minimal to an increase of 15%-30%.
Moore's Law states that the number of transistors on a device doubles every two years; however, it is often (mis)quoted in terms of its impact on CPU performance. This important corollary of Moore's Law states that improved clock frequency plus improved architecture yields a doubling of CPU performance every 18 months. This paper examines the impact of Moore's Law on the peak floating-point performance of FPGAs. Performance trends for individual operations are analyzed, as well as the performance trend of a common instruction mix (multiply-accumulate). The important result is that peak FPGA floating-point performance is growing significantly faster than peak CPU floating-point performance.
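The gap between the two doubling periods quoted above compounds quickly. A quantity that doubles every T years grows by a factor of 2^(t/T) after t years. The helper below is a hypothetical illustration of that arithmetic only; it does not reproduce the paper's FPGA trend data.

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_period_years` years: 2 ** (years / period)."""
    return 2 ** (years / doubling_period_years)


# Over six years:
transistors = growth_factor(6, 2.0)   # two-year doubling -> 8x
cpu_perf = growth_factor(6, 1.5)      # 18-month doubling -> 16x
```

Equivalently, a two-year doubling is about 41% growth per year (2^0.5 ≈ 1.414), while an 18-month doubling is about 59% per year (2^(1/1.5) ≈ 1.587).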
Field-programmable gate arrays (FPGAs) are an increasingly popular platform for implementing cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source of truly random bits. In this paper we extend a technique that uses on-chip jitter and PLLs to a much larger class of FPGAs that do not contain PLLs. Our design uses only the Configurable Logic Blocks (CLBs) common to all FPGAs and has a self-testing capability. Using the intrinsic jitter of digital circuits, we produce random bits at speeds of up to 0.5 Mbit/s with good statistical characteristics. We discuss the engineering challenges of extracting random bits from digital circuits, and we report the results of running the standard NIST statistical tests on the output generated by our system.
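The simplest of the NIST statistical tests mentioned above is the frequency (monobit) test from NIST SP 800-22. It checks whether ones and zeros appear in roughly equal proportion. The sketch below implements that one test. The function name is hypothetical, and the abstract does not say which subset of the NIST suite was run or with what parameters.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test.

    bits: sequence of 0/1 values from the generator under test.
    Returns the p-value; by the usual convention the sequence is judged
    non-random if the p-value is below 0.01.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 1 -> +1 and 0 -> -1, then sum.
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))
```

For the 10-bit example sequence in SP 800-22, `1011010101`, the function returns a p-value of about 0.527, so that sequence passes. A stuck-at-1 output of 100 ones gives a p-value far below 0.01. In practice the test is applied to much longer sequences (the specification recommends at least 100 bits).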
In this work, we parameterize and explore the interconnect structure of pipelined FPGAs. Specifically, we explore the effects of interconnect register population, length of registered routing track segments, registered I/O terminals of logic units, and the flexibility of the interconnect structure on the performance of a pipelined FPGA. Our experiments with the RaPiD architecture identify trade-offs that must be made when designing the interconnect structure of a pipelined FPGA. The post-exploration architecture that we found shows a 19% improvement over RaPiD, while the area overhead incurred in placing and routing benchmark netlists on the post-exploration architecture is 18%.
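One reason interconnect register population matters is that a register inside a routing track cuts a long wire into shorter register-to-register stages, and the slowest stage sets the clock period. The sketch below computes that bound for a single routed path. It is a hypothetical model, not the RaPiD timing model: register setup and clock-to-Q times are ignored, and `min_clock_period` is an illustrative name.

```python
def min_clock_period(segment_delays_ns, registered_after):
    """Longest register-to-register delay along one routed path, in ns.

    segment_delays_ns: delay of each routing segment, in path order.
    registered_after:  set of segment indices whose output passes through
                       an interconnect register.
    Assumption: register setup and clock-to-Q times are ignored.
    """
    worst = current = 0.0
    for i, d in enumerate(segment_delays_ns):
        current += d
        if i in registered_after:
            # A register ends the current pipeline stage.
            worst = max(worst, current)
            current = 0.0
    # The trailing stage (after the last register) also counts.
    return max(worst, current)


path = [1.0, 2.0, 1.5, 0.5]
unregistered = min_clock_period(path, set())  # one 5.0 ns stage
registered = min_clock_period(path, {1})      # stages of 3.0 ns and 2.0 ns
```

The trade-off the abstract mentions shows up even here. Each added register shortens the critical stage but adds a cycle of latency and costs area, so where registers are placed, and on how many tracks, is a design choice rather than a free win.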
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our FPGA, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our FPGA ideal for throughput-intensive applications, and it requires minimal place-and-route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25μm process and 500-700 MHz in a TSMC 0.18μm process.
In this paper we study the technology mapping problem of FPGA architectures with dual supply voltages (Vdds) for power optimization. This is done with the guarantee that the mapping depth of the circuit will not increase compared to the circuit with a single Vdd. We first design a single-Vdd mapping algorithm that achieves better power results than the latest published low-power mapping algorithms. We then show that our dual-Vdd mapping algorithm can further improve power savings by up to 11.6% over the single-Vdd mapper. In addition, we investigate the best low-Vdd/high-Vdd ratio for the largest power reduction among several dual-Vdd combinations. To our knowledge, this is the first work on dual-Vdd mapping for FPGA architectures.
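The benefit of dual-Vdd mapping comes from the quadratic dependence of dynamic power on supply voltage. Every LUT that can run at the low Vdd without lengthening the critical path saves a factor of (V_low/V_high)². The sketch below only evaluates that ratio for a given assignment. The assignment rule (keep critical nodes at high Vdd, move all others to low Vdd) is a hypothetical simplification, not the paper's mapping algorithm. It also ignores level-converter overhead and the added delay of low-Vdd LUTs, both of which a real mapper must account for.

```python
def dual_vdd_power_ratio(nodes, v_high, v_low):
    """Dynamic power of a dual-Vdd assignment relative to all-high-Vdd.

    nodes: non-empty list of (switched_capacitance, is_critical) pairs.
    Hypothetical rule: critical nodes stay at v_high, all others use v_low.
    Activity and clock frequency are assumed unchanged, so they cancel out.
    Level converters and low-Vdd delay penalties are ignored.
    """
    single = sum(c for c, _ in nodes) * v_high ** 2
    dual = sum(c * (v_high if crit else v_low) ** 2 for c, crit in nodes)
    return dual / single


# Half the switched capacitance is non-critical; Vdd pair of 1.2 V / 0.9 V.
ratio = dual_vdd_power_ratio([(1.0, True), (1.0, False)], v_high=1.2, v_low=0.9)
```

Here `ratio` evaluates to about 0.78. A lower V_low would save more per node in this model, but in a real mapper it also slows those LUTs, so fewer nodes can tolerate it. That tension is why the paper searches for the best low-Vdd/high-Vdd ratio instead of simply minimizing V_low.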
This paper shows a method for verifying the thermal status of complex FPGA-based circuits such as microprocessors, so that the designer can evaluate whether a particular block is working beyond specifications. The idea is to extract the output frequencies of an array of ring oscillators previously distributed over the die, taking full advantage of the configuration port capabilities in Xilinx technology. As a result, it is shown that FPGA technology offers designers of embedded systems the possibility of viewing a detailed thermal map of a circuit at minimal cost. The verification can be done in actual working conditions, for example with heat sinks and fans attached to the chip, inside the system case, or even in an on-board satellite application. The main results of the work could not be obtained with alternatives such as IR cameras, external sensors, or embedded diodes.
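Turning the extracted ring-oscillator readings into a thermal map is essentially a per-sensor calibration. A counter value over a known gate time gives a frequency, and because ring-oscillator frequency falls as temperature rises, a calibration curve maps frequency back to temperature. The sketch below assumes a hypothetical linear calibration. The reference frequency, reference temperature, and slope are made-up parameters, and the readout through the configuration port is not modeled.

```python
def ro_temperature(count, gate_time_s, f_ref_hz, t_ref_c, slope_hz_per_c):
    """Estimate temperature (deg C) at one ring oscillator.

    count:          oscillator edges counted during the gate window.
    gate_time_s:    length of the counting window in seconds.
    Hypothetical linear calibration: the oscillator runs at f_ref_hz at
    t_ref_c and slows by slope_hz_per_c for each degree of heating.
    """
    freq = count / gate_time_s
    return t_ref_c + (f_ref_hz - freq) / slope_hz_per_c


def thermal_map(counts, gate_time_s, f_ref_hz, t_ref_c, slope_hz_per_c):
    """Convert a 2-D grid of counter values (one per oscillator) to temperatures."""
    return [
        [ro_temperature(c, gate_time_s, f_ref_hz, t_ref_c, slope_hz_per_c) for c in row]
        for row in counts
    ]


# Two oscillators, 1 ms gate, assumed 200 MHz at 25 C and a 20 kHz/C slope.
temps = thermal_map([[200_000, 199_000]], 1e-3, 200e6, 25.0, 20e3)
```

With these assumed numbers the first oscillator reads 25 °C and the second, running 1 MHz slower, reads 75 °C. Because of process variation, a practical setup would likely calibrate each oscillator individually, not share one reference frequency across the die.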