In this paper we evaluate the trade-offs between various low-leakage design techniques for field-programmable gate arrays (FPGAs) in deep sub-micron technologies. Since multiplexers are widely used in FPGAs for implementing look-up tables (LUTs) and connection and routing switches, several low-leakage implementations of pass-transistor-based multiplexers and routing switches are proposed, and their design trade-offs are presented based on transistor-level simulation, physical design, and impact on overall system performance. We find that gate biasing, the use of redundant SRAM cells, and integration of multi-Vt technology are ideal for FPGAs, and they can reduce leakage current by 2X-4X compared to an implementation without any leakage reduction technique. For some of the potential low-leakage design techniques evaluated in our study, the impact on chip area ranges from minimal to an increase of 15%-30%.
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our FPGA, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our FPGA well suited to throughput-intensive applications, and it requires only minimal place-and-route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25 μm process and 500-700 MHz in a TSMC 0.18 μm process.
This paper presents a method for verifying the thermal status of complex FPGA-based circuits such as microprocessors, so that the designer can evaluate whether a particular block is working beyond specifications. The idea is to extract the output frequencies of an array of ring oscillators previously distributed across the die, taking full advantage of the configuration port capabilities in Xilinx technology. As a result, it is shown that FPGA technology offers designers of embedded systems the possibility of viewing a detailed thermal map of a circuit at minimum cost. The verification can be done in actual working conditions, for example with heat sinks and fans attached to the chip, inside the system case, or even in an on-board satellite application. The main results of the work could not be obtained with alternatives such as IR cameras, external sensors, or embedded diodes.
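The step from ring-oscillator readings to a thermal map can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each oscillator is read out as a cycle count over a fixed gate window, and that frequency varies roughly linearly with temperature over the range of interest, so a two-point calibration suffices. All function names and calibration numbers are hypothetical.

```python
def count_to_frequency_hz(count, window_s):
    """Convert a cycle count captured over a gate window (seconds) to Hz."""
    return count / window_s


def make_linear_calibration(f_a, t_a, f_b, t_b):
    """Build a frequency -> temperature map from two calibration points.

    Assumption (not from the abstract): the oscillator frequency is close to
    linear in temperature between the two calibration points.
    """
    slope = (t_b - t_a) / (f_b - f_a)  # degrees C per Hz, usually negative

    def to_temperature(freq_hz):
        return t_a + slope * (freq_hz - f_a)

    return to_temperature


def thermal_map(counts, window_s, to_temperature):
    """Turn a 2D grid of per-oscillator counts into a grid of temperatures."""
    return [
        [to_temperature(count_to_frequency_hz(c, window_s)) for c in row]
        for row in counts
    ]


if __name__ == "__main__":
    # Hypothetical calibration: 100 MHz at 25 C, 95 MHz at 85 C.
    cal = make_linear_calibration(100e6, 25.0, 95e6, 85.0)
    grid = [[100000, 99000], [97500, 95000]]  # counts over a 1 ms window
    for row in thermal_map(grid, 1e-3, cal):
        print(["%.1f" % t for t in row])
```

In practice each oscillator would need its own calibration, since process variation makes nominally identical oscillators run at different frequencies.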
Field-programmable gate arrays (FPGAs) are an increasingly popular choice of platform for the implementation of cryptographic systems. Until recently, designers using FPGAs had less than optimal choices for a source of truly random bits. In this paper we extend a technique that uses on-chip jitter and PLLs to a much larger class of FPGAs that do not contain PLLs. Our design uses only the Configurable Logic Blocks (CLBs) common to all FPGAs, and has a self-testing capability. Using the intrinsic jitter contained in digital circuits, we produce random bits at speeds of up to 0.5 Mbits/second with good statistical characteristics. We discuss the engineering challenges of extracting random bits from digital circuits, and we report the results of running standard statistical tests (NIST) on the output generated by our system.
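As an illustration of the kind of check the NIST suite performs, the sketch below implements the frequency (monobit) test from NIST SP 800-22, the simplest test in that suite. The abstract does not say which tests were run or how; the function name is our own, and the 0.01 significance level is the default suggested by the NIST document.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test; returns the p-value.

    `bits` is a sequence of 0/1 values. A p-value below the chosen
    significance level (0.01 in the NIST document) means the sequence
    fails the test.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 0 -> -1 and 1 -> +1 and sum; a fair source keeps this near zero.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    sample = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # worked example from SP 800-22
    print("p-value: %.6f" % monobit_p_value(sample))
```

A sequence that passes this test is not thereby shown to be random; the monobit test only detects a bias between zeros and ones, which is why the full suite combines many tests.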
We consider active leakage power dissipation in FPGAs and present a "no cost" approach for active leakage reduction. It is well known that the leakage power consumed by a digital CMOS circuit depends strongly on the state of its inputs. Our leakage reduction technique leverages a fundamental property of basic FPGA logic elements (look-up tables) that allows a logic signal in an FPGA design to be interchanged with its complemented form without any area or delay penalty. We apply this property to select polarities for logic signals so that FPGA hardware structures spend the majority of time in low-leakage states. In an experimental study, we optimize active leakage power in circuits mapped into a state-of-the-art 90nm commercial FPGA. Results show that the proposed approach reduces active leakage by 25%, on average.
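The look-up-table property that makes polarity changes free can be sketched as follows. This is an illustrative model, not the authors' tool: a k-input LUT is represented as a list of 2^k bits, and all function names are hypothetical. Complementing the signal on input i is compensated by permuting the table contents, so the LUT still computes the same function with no extra logic.

```python
def lut_eval(table, inputs):
    """Evaluate a LUT: inputs[0] is the least significant address bit."""
    addr = sum(bit << pos for pos, bit in enumerate(inputs))
    return table[addr]


def invert_lut_input(table, i):
    """Table for the same function when input i arrives complemented."""
    return [table[addr ^ (1 << i)] for addr in range(len(table))]


def invert_lut_output(table):
    """Table that produces the complemented output signal."""
    return [1 - bit for bit in table]


if __name__ == "__main__":
    and2 = [0, 0, 0, 1]  # 2-input AND, address = a + 2*b
    and2_a_inverted = invert_lut_input(and2, 0)
    for a in (0, 1):
        for b in (0, 1):
            # The driver now sends NOT a; the reprogrammed LUT is unchanged
            # in behavior as seen from the original signals a and b.
            assert lut_eval(and2_a_inverted, [1 - a, b]) == lut_eval(and2, [a, b])
    print(and2_a_inverted)
```

Flipping a signal's polarity therefore means applying `invert_lut_output` to the driving LUT and `invert_lut_input` to every LUT it feeds; only configuration bits change.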
This paper presents a new approach to timing optimization for FPGA designs, namely incremental physical resynthesis, to answer the challenge of effectively integrating logic and physical optimizations without incurring unmanageable runtime complexity. Unlike previous approaches to this problem, which limit the types of operations and/or architectural features, we take advantage of many architectural characteristics of modern FPGA devices and utilize many types of optimizations, including cell repacking, signal rerouting, resource retargeting, and logic restructuring, accompanied by efficient incremental placement, to gradually transform a design via a series of localized logic and physical optimizations that verifiably improve overall compliance with timing constraints. This procedure works well on small and large designs, and can be administered through either an automatic optimizer or an interactive user interface. Our preliminary experiments showed that this approach is very effective in fixing or reducing timing violations that cannot be reduced by other optimization techniques: for a set of test cases to which this is applicable, the worst timing violation is reduced by an average of 42.8%.
FPGAs normally operate at whatever clock rate is appropriate for the loaded configuration. When FPGAs are used as computational devices in a larger system, however, it is better to employ fixed-frequency FPGAs operating at a high clock frequency. Such fixed-frequency arrays require pipelined interconnect structures, which are difficult to support in a traditional FPGA architecture. We have developed a novel approach, called a "corner-turn" interconnect, based on a Manhattan array of logically depopulated S-boxes with full connectivity but limited routability. This interconnect supports new polynomial-time routing techniques while maintaining conventional placement and other upstream toolflow. We have used the corner-turn interconnect to define a fixed-frequency FPGA architecture, the SFRA, that is largely compatible with the Xilinx Virtex while providing higher-speed, pipelined operation. Our tools automatically repipeline designs to operate at the SFRA's intrinsic clock frequency. Since the arrays are largely compatible, we directly compare the SFRA with the Virtex on four benchmark designs. On these benchmarks, the SFRA offers higher throughput and competitive throughput per area. The SFRA routing and retiming tools also run one to two orders of magnitude faster than their Xilinx counterparts.
Understanding and predicting electromagnetic behavior is needed more and more in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetic technique for modelling the electromagnetic space. The 3D FDTD buried object detection forward model is emerging as a useful application in mine detection and other subsurface sensing areas. However, the computation of this model is complex and time consuming. Implementing this algorithm in hardware will greatly increase its computational speed and widen its use in many other areas. We present an FPGA implementation to speed up the pseudo-2D FDTD algorithm, which is a simplified version of the 3D FDTD model. The pseudo-2D model can be upgraded to 3D with limited modification of structure. We implement the pseudo-2D FDTD model for layered media and complete boundary conditions on an FPGA. The computational speed of the reconfigurable hardware design is about 24 times faster than a software implementation on a 3.0GHz PC. The speedup is due to pipelining, parallelism, use of fixed-point arithmetic, and careful memory architecture design.
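To show the kind of update loop that FDTD hardware pipelines, here is a minimal 1D free-space sketch in normalized units. It is far simpler than the pseudo-2D layered-media model described above and is not the authors' design: it uses floating point rather than fixed point, reflecting boundaries instead of complete boundary conditions, and a Courant number of 0.5. Function and parameter names are hypothetical.

```python
import math


def fdtd_1d(n_cells=201, n_steps=100, source=100, t0=40.0, spread=12.0):
    """Run a 1D leapfrog FDTD simulation and return the final E field.

    ez[k] and hy[k] are staggered by half a cell; hy[k] sits between
    ez[k] and ez[k + 1]. The end cells of ez are never updated, so they
    act as perfectly conducting (reflecting) boundaries.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for step in range(n_steps):
        # Electric field update from the spatial difference of H.
        for k in range(1, n_cells - 1):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Additive Gaussian pulse injected at the source cell.
        ez[source] += math.exp(-0.5 * ((t0 - step) / spread) ** 2)
        # Magnetic field update from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    peak = max(range(len(field)), key=lambda k: abs(field[k]))
    print("largest |Ez| at cell", peak)
```

Each cell update reads only its immediate neighbors, which is what makes the inner loops amenable to the pipelining and parallelism that the abstract credits for the speedup.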
In a clustered programmable-reconfigurable processor, multiple programmable processors and blocks of reconfigurable logic communicate through a register-based communication mechanism, which reduces the impact of wire delay on clock cycle time. In this paper, we present a circuit-level design for the reconfigurable clusters used on the Amalgam programmable-reconfigurable processor. We outline our interleaved reconfigurable array design, which provides high bandwidth to and from the register file without requiring large amounts of register control logic. We characterize the latency of operations in our array, and present results that show the impact that this latency has on overall system performance in a range of fabrication processes. Finally, we present a pipelining scheme that enables the array to operate at clock rates closer to those of programmable processors and allows for better scaling in future technologies.
How can programmable logic arrays (PLAs) be built without relying on lithography to pattern their smallest features? In this paper, we detail designs which exploit emerging, bottom-up material synthesis techniques to build PLAs using molecular-scale nanowires. Our new designs accommodate technologies where the only post-fabrication programmable element is a non-restoring diode. We introduce stochastic techniques which allow us to restore the diode logic at the nanoscale so that it can be cascaded and interconnected for general logic evaluation. Under conservative assumptions using 10nm nanowires and 90nm lithographic support, we project yielded logic density around 500,000 nm²/OR term for a 60 OR-term array; a complete 60-term, two-level PLA is roughly the same size as a single 4-LUT logic block in 22nm lithography. Each OR term is comparable in area to a 4-transistor hardwired gate at 22nm. Mapping sample datapaths and conventional programmable logic benchmarks, we estimate that each 60-OR-term PLA plane will provide equivalent logic to 5-10 4-input LUTs.