C-slow retiming is a process of automatically increasing the throughput of a design by enabling fine grained pipelining of problems with feedback loops. This transformation is especially appropriate when applied to FP...
详细信息
C-slow retiming is a process of automatically increasing the throughput of a design by enabling fine grained pipelining of problems with feedback loops. This transformation is especially appropriate when applied to FPGA designs because of the large number of available registers. To demonstrate and evaluate the benefits of C-slow retiming, we constructed an automatic tool which modifies designs targeting the Xilinx Virtex family of FPGAs. Applying our tool to three benchmarks: AES encryption. Smith/Waterman sequence matching, and the LEON 1 synthesized microprocessor core, we were able to substantially increase the total throughput. For some parameters, throughput is effectively doubled.
RapidSmith is an open-source framework that allows for the exploration of novel approaches to the FPGA CAD flow for Xilinx devices. However, RapidSmith has poor sup- port for manipulating designs below the slice level...
详细信息
The paper presents several improvements to state-of-the-art in FPGA technology mapping exemplified by a recent advanced technology mapper DAOmap [Chen and Cong, ICCAD '04]. Improved cut enumeration computes all K-...
详细信息
ISBN:
(纸本)1595932925
The paper presents several improvements to state-of-the-art in FPGA technology mapping exemplified by a recent advanced technology mapper DAOmap [Chen and Cong, ICCAD '04]. Improved cut enumeration computes all K-feasible cuts without pruning for up to 7 inputs for the largest MCNC benchmarks, A new technique for on-the-fly cut dropping reduces by orders of magnitude memory needed to represent cuts for large designs. Improved area recovery leads to mappings with area on average 7% smaller than DAOmap, while preserving delay optimality when starting from the same optimized netlists. Applying mapping with structural choices derived by a synthesis flow on average reduces delay by 7% and area by 14%, compared to DAOmap. Copyright 2006 acm.
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and ...
详细信息
ISBN:
(纸本)9780897919784
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and weight. We partition the physical design into a set of tiles. In response to a component failure, we capitalize on the unique reconfiguration capabilities of FPGAs and replace the affected tile with a functionally equivalent tile that does not rely on the faulty component. Unlike fixed structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components. Experimental results conducted on a subset of the MCNC benchmarks demonstrate a high level of reliability with low timing and hardware overhead.
This paper presents an analysis of the potential yield loss in FPGA due to random defects in metal layers. A proven yield model is adapted to target the FPGA interconnect layers in order to predict the manufacturing y...
详细信息
ISBN:
(纸本)9781595930293
This paper presents an analysis of the potential yield loss in FPGA due to random defects in metal layers. A proven yield model is adapted to target the FPGA interconnect layers in order to predict the manufacturing yield. Defect parameters from the 2003 SIA roadmap are used to investigate the trend in yield loss due to defects in interconnect layers in the future. It is shown that the low yield predicted for the 45nm technology node and beyond is a cause for concern. The potential impact on yield using two different approaches, namely redundant circuits and fault tolerant design, is also presented. Copyright 2005 acm.
fieldprogrammablegatearrays (FPGAs) are being used to provide fast Internet Protocol (IP) packet routing and advanced queuing in a highly scalable network switch. A new module, called the field-programmable Port Ex...
详细信息
ISBN:
(纸本)9781581131932
fieldprogrammablegatearrays (FPGAs) are being used to provide fast Internet Protocol (IP) packet routing and advanced queuing in a highly scalable network switch. A new module, called the field-programmable Port Extender (FPX), is being built to augment the Washington University Gigabit Switch (WUGS) with reprogrammable logic. FPX modules reside at the edge of the WUGS switching fabric. Physically, the module is inserted between an optical line card and the WUGS gigabit switch back-plane. The hardware used for this project allows ports of the switch populated with an FPX to operate at rates up to 2.4 Gigabits/second. The aggregate throughput of the system scales with the number of switch ports. Logic on the FPX module is implemented with two FPGA devices. The first device is used to interface between the switch and the line card, while the second is used to prototype new networking functions and protocols. The logic on the second FPGA can be re-programmed dynamically via control cells sent over the network.
Advanced Microelectronics Department at Sandia National Laboratories. We present an automatic logic synthesis method targeted for high-performance asynchronous FPGA (AFPGA) architectures. Our method transforms sequent...
详细信息
ISBN:
(纸本)9781595930293
Advanced Microelectronics Department at Sandia National Laboratories. We present an automatic logic synthesis method targeted for high-performance asynchronous FPGA (AFPGA) architectures. Our method transforms sequential programs as well as high-level descriptions of asynchronous circuits into fine-grain asynchronous process netlists suitable for an AFPGA. The resulting circuits are inherently pipelined, and can be physically mapped onto our AFPGA with standard partitioning and place-and-route algorithms. For a wide variety of benchmarks, our automatic synthesis method not only yields comparable logic densities and performance to those achieved by hand placement, but also attains a throughput close to the peak performance of the FPGA. Copyright 2005 acm.
Although runtime dynamic reconfiguration of the FPGA devices has been an issue of the last decade, it has yet to achieve general recognition by the design community. The reasons for this are clear;there exists no stra...
详细信息
Although runtime dynamic reconfiguration of the FPGA devices has been an issue of the last decade, it has yet to achieve general recognition by the design community. The reasons for this are clear;there exists no straightforward design methodology, and the partitioning and CAD tool support is poor. This paper presents general concepts implemented in a placement and routing tool that provides an environment where designs that are partially and dynamically reconfigurable can be processed in order to be implemented on FPGAs that support this technology, such as the Armel AT40K and AT94K series. The function of the tool is demonstrated on a simple real-world example.
This paper presents an FPGA-specific implementation of the floating-point tangent function. The implementation inputs values in the interval [-π/2,π/2], targets the IEEE-754 single-precision format and has an accura...
详细信息
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficient...
详细信息
We present the design of a high-performance, highly pipelined asynchronous FPGA. We describe a very fine-grain pipelined logic block and routing interconnect architecture, and show how asynchronous logic can efficiently take advantage of this large amount of pipelining. Our FPGA, which does not use a clock to sequence computations, automatically "self-pipelines" its logic without the designer needing to be explicitly aware of all pipelining details. This property makes our FPGA ideal for throughput-intensive applications and we require minimal place and route support to achieve good performance. Benchmark circuits taken from both the asynchronous and clocked design communities yield throughputs in the neighborhood of 300-400 MHz in a TSMC 0.25μm process and 500-700 MHz in a TSMC 0.18μm process.
暂无评论