ISBN (print): 9781581131932
It has become clear that on-chip storage is an essential component of high-density FPGAs. These memory arrays were originally intended to implement storage, but recent work has shown that they can also be used to implement logic very efficiently. That work, however, considered only single-port arrays, whereas many current FPGAs contain dual-port arrays. In this paper we present an algorithm that maps logic to these dual-port arrays. Our algorithm can either optimize area with no regard for circuit speed, or optimize area under the constraint that the combinational depth of the circuit does not increase. Experimental results show that, on average, our algorithm packs between 29% and 35% more logic than an algorithm that targets single-port arrays. We also show, however, that even with this algorithm, dual-port arrays are still not as area-efficient as single-port arrays when implementing logic.
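To illustrate the underlying idea that a memory array can implement logic, the sketch below treats a 2^k-word by w-bit array as a ROM: each address is one combination of k inputs, and each output bit holds one function of those inputs, so a single read evaluates all w functions. This is a generic single-port illustration under that assumption; the function names and the two example functions are hypothetical, and it does not reproduce the paper's dual-port mapping algorithm.

```python
from typing import Callable, List, Sequence

# A logic function takes a list of 0/1 inputs and returns 0 or 1.
LogicFn = Callable[[Sequence[int]], int]


def build_rom(k: int, funcs: Sequence[LogicFn]) -> List[int]:
    """Fill a 2**k-word memory so that bit j of word `addr` is funcs[j] of addr's bits."""
    rom: List[int] = []
    for addr in range(2 ** k):
        # Input i is driven onto address bit i.
        inputs = [(addr >> i) & 1 for i in range(k)]
        word = 0
        for j, f in enumerate(funcs):
            word |= (f(inputs) & 1) << j
        rom.append(word)
    return rom


def read(rom: List[int], inputs: Sequence[int], width: int) -> List[int]:
    """Evaluate every stored function with a single memory read."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    word = rom[addr]
    return [(word >> j) & 1 for j in range(width)]


# Example: two 3-input functions packed into one 8-word x 2-bit array.
def majority(x: Sequence[int]) -> int:
    return int(sum(x) >= 2)


def parity(x: Sequence[int]) -> int:
    return sum(x) % 2


rom = build_rom(3, [majority, parity])
print(read(rom, [1, 1, 0], 2))  # [1, 0]
```

A dual-port array offers two address ports over the same stored contents; how the paper distributes logic across the two ports is not modelled here.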
Resource sharing is a key area-reduction approach in high-level synthesis (HLS) in which a single hardware functional unit is used to implement multiple operations in the high-level circuit specification. We show that...
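The resource-sharing idea in this truncated abstract can be illustrated with a simple binding step: operations of the same type that are scheduled in different clock cycles may share one functional unit. The sketch below greedily assigns each operation to the first unit of its type that is idle in its cycle. The schedule, the operation names, and the greedy policy are hypothetical choices for illustration; this is not the paper's method.

```python
from typing import Dict, List, Set, Tuple
from collections import defaultdict

# (operation name, operation type, scheduled clock cycle)
Op = Tuple[str, str, int]


def bind(ops: List[Op]) -> Dict[str, str]:
    """Greedily bind each operation to a functional unit of its type.

    Two operations can share a unit only if they are scheduled in
    different clock cycles.
    """
    units_of_type: Dict[str, List[str]] = defaultdict(list)
    busy: Dict[str, Set[int]] = {}  # unit name -> cycles in which it is used
    binding: Dict[str, str] = {}
    for name, kind, cycle in ops:
        # Reuse the first existing unit of this type that is idle in `cycle`.
        free = [u for u in units_of_type[kind] if cycle not in busy[u]]
        if free:
            unit = free[0]
        else:
            # Every unit of this type is busy in this cycle: allocate a new one.
            unit = f"{kind}{len(units_of_type[kind])}"
            units_of_type[kind].append(unit)
            busy[unit] = set()
        busy[unit].add(cycle)
        binding[name] = unit
    return binding


schedule: List[Op] = [
    ("m1", "mul", 0), ("m2", "mul", 0), ("m3", "mul", 1),
    ("a1", "add", 1), ("a2", "add", 2),
]
print(bind(schedule))
# {'m1': 'mul0', 'm2': 'mul1', 'm3': 'mul0', 'a1': 'add0', 'a2': 'add0'}
```

Here three multiplications share two multipliers and two additions share one adder. Real HLS binding must also account for the multiplexers that sharing introduces.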
ISBN (print): 9781581133417
The partial reconfiguration feature of some current-generation field-programmable gate arrays (FPGAs) can improve dependability by detecting and correcting errors in on-chip configuration data. Such an error recovery process can be executed online with minimal interference with user applications. However, because look-up tables (LUTs) in the configurable logic blocks (CLBs) of FPGAs can also implement memory modules for user applications, a memory coherence issue arises: memory contents in user applications may be altered by the online configuration data recovery process. In this paper, we investigate this memory coherence problem and propose a memory coherence technique that does not impose extra constraints on the placement of memory-configured LUTs. Theoretical analyses and simulation results show that the proposed technique guarantees memory coherence with a very small execution time overhead (on the order of 0.1%) in user applications.
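The coherence hazard described above can be reproduced with a toy model. It assumes that recovery rewrites a configuration frame from a stored golden copy, and that a LUT configured as RAM keeps its user data in the same configuration bits. Under those assumptions, a naive rewrite silently discards a user write. The class and method names are hypothetical, and the sketch shows only the problem, not the proposed coherence technique.

```python
from typing import Dict, List


class ConfigMemory:
    """Toy model of configuration frames, some of which hold LUT-RAM user data."""

    def __init__(self, golden: Dict[int, List[int]]):
        # Assumed: the recovery process keeps a golden copy of every frame.
        self.golden = {f: list(bits) for f, bits in golden.items()}
        self.frames = {f: list(bits) for f, bits in golden.items()}

    def user_write(self, frame: int, addr: int, value: int) -> None:
        # A LUT configured as RAM: a user write changes configuration bits.
        self.frames[frame][addr] = value

    def scrub(self, frame: int) -> None:
        # Naive recovery: overwrite the whole frame with the golden copy.
        self.frames[frame] = list(self.golden[frame])


mem = ConfigMemory({0: [0, 0, 0, 0]})
mem.user_write(0, 2, 1)
mem.scrub(0)
print(mem.frames[0])  # [0, 0, 0, 0]: the user's write was lost
```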
ISBN (print): 9780897919784
Current reconfigurable systems suffer from a significant overhead due to the time it takes to reconfigure their hardware. In order to deal with this overhead, and increase the power of reconfigurable systems, it is important to develop hardware and software systems to reduce or eliminate this delay. In this paper we propose one technique for significantly reducing the reconfiguration latency: the prefetching of configurations. By loading a configuration into the reconfigurable logic in advance of when it is needed, we can overlap the reconfiguration with useful computation. We demonstrate the power of this technique, and propose an algorithm for automatically adding prefetch operations into reconfigurable applications. This results in a significant decrease in the reconfiguration overhead for these applications.
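The benefit of prefetching can be seen with a simple timing model. Suppose a configuration takes L cycles to load, and the prefetch is issued D cycles of useful computation before the configuration is needed. Then only max(0, L - D) cycles remain exposed as a stall. The sketch below applies this model to a sequence of reconfigurations. The latency and lead-time values are illustrative assumptions, not measurements from the paper, and the code does not implement the paper's prefetch-insertion algorithm.

```python
from typing import List


def exposed_stall(latency: int, lead_time: int) -> int:
    """Cycles still spent waiting when the load starts `lead_time` cycles early."""
    return max(0, latency - lead_time)


def total_overhead(latency: int, lead_times: List[int]) -> int:
    """Sum of exposed reconfiguration stalls over a sequence of reconfigurations.

    A lead time of 0 models loading on demand (no prefetch).
    """
    return sum(exposed_stall(latency, d) for d in lead_times)


latency = 1000  # hypothetical cycles to load one configuration
no_prefetch = total_overhead(latency, [0, 0, 0])
with_prefetch = total_overhead(latency, [1200, 600, 0])
print(no_prefetch, with_prefetch)  # 3000 1400
```

The model shows why prefetch placement matters: a prefetch issued far enough ahead hides the whole load, while one issued too late hides only part of it.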
ISBN (print): 9781581134520
This paper presents abstract layout techniques for a variety of FPGA switch block architectures. We evaluate the relative density of subset, universal, and Wilton switch block architectures. For small subset switch blocks, we find the optimal implementations using a simple metric. We also develop a tractable heuristic that returns optimal results for small switch blocks and good results for large ones. For switch blocks with general connectivity, we develop a representation and a layout evaluation technique. We use these techniques to compare a variety of small switch blocks and find that the traditional Xilinx-style subset switch block is superior to the other proposed architectures. Finally, we hand-designed some small switch blocks to confirm our results.
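For reference, the subset (disjoint, Xilinx-style) switch block mentioned above connects track i on each side only to track i on the other three sides, so the W tracks form W independent routing domains. The sketch below enumerates those connections and counts switches. It assumes connectivity Fs = 3 and one switch per connected pair of track ends. It only shows the connection pattern and says nothing about the paper's layout-density metric.

```python
from itertools import combinations
from typing import List, Tuple

SIDES = ("left", "top", "right", "bottom")

# A track end is identified by (side, track index).
End = Tuple[str, int]


def subset_switch_block(width: int) -> List[Tuple[End, End]]:
    """List every switch in a subset switch block with `width` tracks per side.

    Track i on one side connects to track i on each of the other three
    sides, so each track index contributes C(4, 2) = 6 switches.
    """
    switches: List[Tuple[End, End]] = []
    for track in range(width):
        for a, b in combinations(SIDES, 2):
            switches.append(((a, track), (b, track)))
    return switches


sb = subset_switch_block(4)
print(len(sb))  # 24 switches, i.e. 6 * W for W = 4
```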
ISBN (print): 9781605584102
The performance of field-programmable gate arrays (FPGAs) in floating-point applications is poor due to the complexity of floating-point arithmetic. Implementing floating-point units on FPGAs consumes a large amount of resources, which makes FPGAs less attractive for floating-point-intensive applications. Therefore, there is a need for embedded floating-point units (FPUs) in FPGAs. However, if unused, embedded FPUs waste space on the FPGA die. To overcome this issue, we propose a flexible multi-mode embedded FPU for FPGAs that can be configured to perform a wide range of operations. The floating-point adder and multiplier in our embedded FPU can each be configured to perform one double-precision operation or two single-precision operations in parallel. To increase flexibility further, access to the large integer multiplier, adder, and shifters in the FPU is provided. Benchmark circuits were implemented both on a standard Xilinx Virtex-II FPGA and on our FPGA with embedded FPU blocks. With our embedded FPUs, the results showed a mean area improvement of 5.2 times and a mean delay improvement of 5.8 times for the double-precision benchmarks, and a mean area improvement of 4.4 times and a mean delay improvement of 4.2 times for the single-precision benchmarks. Copyright 2009 ACM.
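The adder's two modes can be mimicked in software. The same 64-bit operand word is read either as one double-precision value or as two packed single-precision lanes. The sketch below uses Python's `struct` module to model only this operand interpretation. The lane layout (lane 0 in the low 32 bits) and the helper names are assumptions, and nothing here describes the proposed hardware datapath.

```python
import struct
from typing import Tuple


def pack_double(x: float) -> int:
    """64-bit word holding one IEEE 754 binary64 value."""
    return int.from_bytes(struct.pack("<d", x), "little")


def pack_singles(lane0: float, lane1: float) -> int:
    """64-bit word holding two binary32 values (lane 0 in bits 0-31, an assumed layout)."""
    return int.from_bytes(struct.pack("<2f", lane0, lane1), "little")


def unpack_singles(word: int) -> Tuple[float, float]:
    return struct.unpack("<2f", word.to_bytes(8, "little"))


def fp_add(a_word: int, b_word: int, dual_single: bool) -> int:
    """Add two 64-bit operand words in double mode or dual-single mode."""
    a_raw, b_raw = a_word.to_bytes(8, "little"), b_word.to_bytes(8, "little")
    if not dual_single:
        (a,) = struct.unpack("<d", a_raw)
        (b,) = struct.unpack("<d", b_raw)
        return pack_double(a + b)
    a0, a1 = struct.unpack("<2f", a_raw)
    b0, b1 = struct.unpack("<2f", b_raw)
    # Each lane sum is computed in Python's double precision and rounded back
    # to binary32 when packed. Note: struct raises OverflowError if a lane sum
    # exceeds the binary32 range, whereas an IEEE 754 unit in
    # round-to-nearest mode would return infinity.
    return pack_singles(a0 + b0, a1 + b1)


two_lanes = fp_add(pack_singles(1.5, 2.25), pack_singles(0.5, 4.0), dual_single=True)
print(unpack_singles(two_lanes))  # (2.0, 6.25)
one_lane = fp_add(pack_double(1.25), pack_double(2.5), dual_single=False)
print(one_lane == pack_double(3.75))  # True
```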
Leakage power has been overshadowed by dynamic power minimization techniques in FPGAs, yet it is a growing concern in programmable logic. This paper proposes a dual-threshold-voltage implementation of the FPGA architecture for leakage power reduction. A CAD flow is developed for assigning high threshold voltage to the logic elements within the logic blocks of the FPGA. The CAD flow ensures that all logic blocks remain identical with respect to the number of high- and low-threshold-voltage logic elements that each block contains, which leads to a dual-threshold-voltage implementation of the FPGA architecture. Results indicate that over 95% of the logic elements in the FPGA can be assigned high threshold voltage. On average, leakage savings of 60% can be achieved, and up to 70% for some benchmarks. The proposed CAD flow forms a basis on which other dual-threshold-voltage FPGA implementations can be evaluated. We investigate the design trade-offs between the ratio of high-Vt to low-Vt logic elements in a cluster and the leakage savings. We also investigate the impact of cluster size on leakage savings for the dual-threshold-voltage implementation.
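The requirement that every logic block keep the same mix of high- and low-Vt elements can be illustrated with a deliberately simplified model. Assume each timing-critical logic element must be low-Vt. If every cluster must provide the same number k of low-Vt slots, then k must cover the cluster with the most critical elements, and all remaining slots can be high-Vt. The sketch below assumes one boolean criticality flag per logic element and hypothetical relative leakage values. It is not the paper's CAD flow, whose details the abstract does not give.

```python
from typing import List


def low_vt_slots_per_cluster(critical: List[List[bool]]) -> int:
    """Number k of low-Vt elements every cluster must provide.

    critical[c][i] is True if element i of cluster c is timing-critical
    (assumed here to require low Vt). Because all clusters must be
    identical, k is the largest count of critical elements in any cluster.
    """
    return max(sum(cluster) for cluster in critical)


def leakage_savings(n_per_cluster: int, k: int, leak_low: float, leak_high: float) -> float:
    """Fractional leakage saving versus an all-low-Vt cluster of the same size."""
    baseline = n_per_cluster * leak_low
    dual = k * leak_low + (n_per_cluster - k) * leak_high
    return 1.0 - dual / baseline


clusters = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, False],
]
k = low_vt_slots_per_cluster(clusters)
# Hypothetical relative leakage: a high-Vt element leaks 20% of a low-Vt one.
print(k, leakage_savings(4, k, leak_low=1.0, leak_high=0.2))
```

The model also shows the trade-off the abstract mentions: one very critical cluster raises k for every cluster and so reduces the savings everywhere.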
The purpose of this paper is to introduce a modified packing and placement algorithm for FPGAs that uses logic duplication to improve performance. The modified packing algorithm was designed to leave unused basic logic elements (BLEs) in timing-critical clusters to serve as potential targets for logic duplication. The modified placement algorithm adds a new stage after placement in which logic duplication is performed to shorten the critical path. In this paper, we show that in a representative FPGA architecture using 0.18 μm technology, the length of the final critical path can be reduced by an average of 14.1%. Approximately half of this gain comes directly from the changes to the packing algorithm, while the other half comes from the logic duplication performed during placement.
We consider active leakage power dissipation in FPGAs and present a "no cost" approach for active leakage reduction. It is well known that the leakage power consumed by a digital CMOS circuit depends strongly on the state of its inputs. Our leakage reduction technique leverages a fundamental property of basic FPGA logic elements (look-up tables) that allows a logic signal in an FPGA design to be interchanged with its complemented form without any area or delay penalty. We apply this property to select polarities for logic signals so that FPGA hardware structures spend the majority of time in low-leakage states. In an experimental study, we optimize active leakage power in circuits mapped into a state-of-the-art 90 nm commercial FPGA. Results show that the proposed approach reduces active leakage by 25% on average.
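The zero-cost polarity property can be checked directly on LUT truth tables. To invert the signal driven by a LUT, every bit of that LUT's truth table is complemented. Each fanout LUT then absorbs the inversion by swapping the truth-table entries that differ only in the corresponding input bit. The sketch below represents a k-input LUT as a list of 2^k output bits (input i on address bit i) and checks that the circuit function is unchanged. The two-LUT example circuit is hypothetical, and the leakage-driven choice of which signals to invert is not modelled.

```python
from itertools import product
from typing import List, Sequence

# A k-input LUT is a list of 2**k output bits; input i selects address bit i.
Lut = List[int]


def eval_lut(lut: Lut, inputs: Sequence[int]) -> int:
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return lut[addr]


def invert_output(lut: Lut) -> Lut:
    """Driver side: the LUT now produces the complement of its old output."""
    return [1 - bit for bit in lut]


def absorb_inverted_input(lut: Lut, pin: int) -> Lut:
    """Fanout side: re-index the truth table so input `pin` is read complemented."""
    return [lut[addr ^ (1 << pin)] for addr in range(len(lut))]


AND2: Lut = [0, 0, 0, 1]  # driver: g = a AND b
XOR2: Lut = [0, 1, 1, 0]  # fanout: f = g XOR c, with g on pin 0


def circuit(driver: Lut, fanout: Lut, a: int, b: int, c: int) -> int:
    g = eval_lut(driver, [a, b])
    return eval_lut(fanout, [g, c])


new_driver = invert_output(AND2)                 # becomes NAND
new_fanout = absorb_inverted_input(XOR2, pin=0)  # becomes XNOR
# The signal g now carries the opposite polarity, but f is unchanged.
assert all(
    circuit(AND2, XOR2, a, b, c) == circuit(new_driver, new_fanout, a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

Because only truth-table contents change, the same LUTs and routing are used, which is why the transformation has no area or delay cost.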
With increased logic density due to the shift towards deep-submicron (DSM) technologies, FPGAs have become a viable option for implementing large designs. However, most commercial FPGAs, because of their general-purpose architecture, cannot handle designs that require very high throughput. In this paper, we propose a novel high-throughput FPGA architecture that tries to combine the high performance of application-specific integrated circuits (ASICs) with the flexibility afforded by the reconfigurability of FPGAs. This architecture uses the concept of "Wave-Steering" and works best for designs that are highly regular and have almost equal delays along all paths. It has enormous potential in digital signal and image processing applications, since a good portion of these applications are regular in nature. Preliminary results for some commonly used DSP designs are encouraging and yield throughputs in the neighborhood of 770 MHz in 0.5 μm CMOS technology.
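The importance of near-equal path delays can be seen from the standard wave-pipelining clocking condition. This assumes that Wave-Steering relies on wave pipelining, which the name suggests but the abstract does not state. In a wave-pipelined circuit, several data waves are in flight in the same combinational logic at once, so the clock period is bounded by the spread between the longest and shortest path delays rather than by the longest delay alone. In a simplified form, with register setup, hold, and clock-skew terms lumped into one overhead term:

$$
T_{\mathrm{clk}} \;\ge\; \left(D_{\max} - D_{\min}\right) + T_{\mathrm{ovh}}
\qquad\text{versus}\qquad
T_{\mathrm{clk}} \;\ge\; D_{\max} + T_{\mathrm{ovh}} \;\text{(conventional clocking)}
$$

Take hypothetical values of D_max = 10 ns, D_min = 8 ns, and T_ovh = 0.5 ns. The wave-pipelined bound is then 2.5 ns (400 MHz), against 10.5 ns (about 95 MHz) for conventional clocking. Equalizing path delays shrinks D_max - D_min and raises the achievable throughput further.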