Moore's Law states that the number of transistors on a device doubles every two years; however, it is often (mis)quoted in terms of its impact on CPU performance. This important corollary of Moore's Law states that improved clock frequency plus improved architecture yields a doubling of CPU performance every 18 months. This paper examines the impact of Moore's Law on the peak floating-point performance of FPGAs. Performance trends for individual operations are analyzed, as well as the performance trend of a common instruction mix (multiply accumulate). The important result is that peak FPGA floating-point performance is growing significantly faster than peak floating-point performance for a CPU.
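To make the two doubling periods in this abstract concrete, the following minimal sketch compares the compound growth implied by a 24-month and an 18-month doubling period. The function name `growth_factor` and the six-year horizon are illustrative assumptions, not taken from the paper.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years`, assuming a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon_years = 6  # illustrative horizon, not from the abstract
    trends = [
        ("transistor count (doubling every 24 months)", 2.0),
        ("CPU performance (doubling every 18 months)", 1.5),
    ]
    for label, period in trends:
        factor = growth_factor(horizon_years, period)
        print(f"{label}: x{factor:.0f} after {horizon_years} years")
```

Over six years, the 24-month period gives three doublings (a factor of 8) and the 18-month period gives four (a factor of 16), which shows how quickly a modest difference in doubling period compounds.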
ISBN: 1595936009 (print)
The proceedings contain 25 papers. The topics discussed include: a routing fabric for monolithically stacked 3D-FPGA; design of a logic element for implementing an asynchronous FPGA; designing efficient input interconnect blocks for LUT clusters using counting and entropy; a synthesizable datapath-oriented embedded FPGA fabric; a versatile, low latency HyperTransport core; an FPGA-based Pentium in a complete desktop system; a 1000-word vocabulary, speaker-independent, continuous live-mode speech recognizer implemented in a single FPGA; variation-aware routing for FPGAs; stochastic physical synthesis for FPGAs with pre-routing interconnect uncertainty and process variation; post-route LUT output polarity selection for timing optimization; synthesis of an application-specific soft multiprocessor system; FPGA-friendly code compression for horizontal microcoded custom IPs; and a practical FPGA-based framework for novel CMP research.
ISBN: 9781605584102 (print)
Packet classification is an important operation for applications such as routers, firewalls, or intrusion detection systems. Many algorithms and hardware architectures for packet classification have been created, but none of them can compete with the speed of TCAMs in the worst case. We propose a new hardware-based algorithm for packet classification. The solution is based on problem decomposition and is aimed at the highest network speeds. A unique property of the algorithm is its constant time complexity in terms of external memory accesses: the algorithm performs exactly two external memory accesses to classify a packet. Using an FPGA and one commodity SRAM chip, a throughput of 150 million packets per second can be achieved. This corresponds to a throughput of 100 Gbps for the shortest packets. Further performance scaling is possible with more or faster SRAM chips. Copyright 2009 ACM.
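The relationship between 150 million packets per second and 100 Gbps can be checked with a short calculation. The abstract does not say how the figure was derived; this sketch assumes "shortest packets" means minimum-size 64-byte Ethernet frames plus the standard 8-byte preamble/start-of-frame delimiter and 12-byte interframe gap. The function names are illustrative.

```python
# Assumption: the abstract's "shortest packets" are minimum-size Ethernet frames.
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def line_rate_gbps(packets_per_second: float,
                   frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Wire-level bit rate needed to carry the given packet rate."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return packets_per_second * wire_bits / 1e9


def memory_accesses_per_second(packets_per_second: float,
                               accesses_per_packet: int = 2) -> float:
    """External memory access rate implied by a fixed per-packet access count."""
    return packets_per_second * accesses_per_packet


if __name__ == "__main__":
    pps = 150e6  # packet rate stated in the abstract
    print(f"line rate: {line_rate_gbps(pps):.1f} Gbps")
    print(f"SRAM accesses: {memory_accesses_per_second(pps) / 1e6:.0f} M/s")
```

Under these assumptions each minimum-size frame occupies 672 bit times on the wire, so 150 million packets per second is slightly above 100 Gbps, and two accesses per packet imply that the external SRAM must sustain 300 million accesses per second.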
ISBN: 9781450326711 (print)
A novel Digital to Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling/enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, etc. The short-circuit-based DAC consumes 1/3 the area of an alternative shift-register-based DAC that is presented for the sake of comparison.
ISBN: 9781605584102 (print)
FPGA user clocks are slow enough that only a fraction of the interconnect's bandwidth is actually used. There may be an opportunity to use throughput-oriented interconnect to decrease routing and wire area using on-chip serial signaling, especially for datapath designs which operate on words instead of bits. To do so, these links must operate reliably at very high bit rates. We compare wave pipelining and surfing source-synchronous schemes in the presence of power supply and crosstalk noise. In particular, noise is a critical modeling challenge; better models are needed for FPGA power grids. Our results show that wave pipelining can operate at rates as high as 5 Gbps for short links, but it is sensitive to noise in longer links and must run much slower to be reliable. In contrast, surfing achieves a stable operating bit rate of 3 Gbps and is relatively insensitive to noise. Copyright 2009 ACM.
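The wire-area argument can be illustrated with a rough estimate of how many serial wires are needed to move one datapath word per user clock cycle. This is only a sketch: the 200 MHz user clock and 32-bit word are hypothetical values chosen for illustration, the function names are not from the paper, and serialization, framing, and synchronization overheads are ignored.

```python
import math


def bits_per_user_cycle(link_mbps: int, user_clock_mhz: int) -> int:
    """Whole bits one serial wire can carry during one user clock cycle."""
    return link_mbps // user_clock_mhz


def serial_wires_needed(word_bits: int, link_mbps: int, user_clock_mhz: int) -> int:
    """Serial wires needed to transfer one word per user clock cycle."""
    per_wire = bits_per_user_cycle(link_mbps, user_clock_mhz)
    if per_wire == 0:
        raise ValueError("link is slower than the user clock")
    return math.ceil(word_bits / per_wire)


if __name__ == "__main__":
    user_clock_mhz = 200  # hypothetical user clock, not from the abstract
    word_bits = 32        # hypothetical datapath width
    # Bit rates reported in the abstract: 5 Gbps and 3 Gbps.
    for scheme, mbps in [("wave pipelining, short link", 5000), ("surfing", 3000)]:
        wires = serial_wires_needed(word_bits, mbps, user_clock_mhz)
        print(f"{scheme}: {wires} serial wires instead of {word_bits} parallel wires")
```

With these assumed values, a 3 Gbps link carries 15 bits per user cycle, so a 32-bit word needs three serial wires rather than 32 parallel ones; the actual savings depend on overheads this estimate leaves out.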
In designing FPGAs, it is important to achieve a good balance between the number of logic blocks, such as Look-Up Tables (LUTs), and wiring resources. It is difficult to find an optimal solution. In this paper, we present an FPGA design methodology to efficiently find well-balanced FPGA architectures. The method covers all aspects of FPGA development from the architecture-decision process to physical implementation. It has been used to develop a new FPGA that can implement circuits that are twice as large as those implementable with the previous version but with half the number of logic blocks. This indicates that the methodology is effective in developing well-balanced FPGAs.
ISBN: 1595932925 (print)
Due to their generic and highly programmable nature, FPGAs provide the ability to implement a wide range of applications. However, it is this nonspecific nature that has limited the use of FPGAs in scientific applications that require floating-point arithmetic. Even simple floating-point operations consume a large amount of computational resources. In this paper, we introduce embedding floating-point multiply-add units in an island-style FPGA. This has been shown to yield an average area savings of 55.0% and an average increase of 40.7% in clock rate over existing architectures. Copyright 2006 ACM.
RapidSmith is an open-source framework that allows for the exploration of novel approaches to the FPGA CAD flow for Xilinx devices. However, RapidSmith has poor support for manipulating designs below the slice level...
We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of recon...
This paper discusses architectural issues arising from the use of dynamic reconfiguration and shows a possible use of dynamic reconfiguration to extend and accelerate computation in system-on-a-chip designs built around microprocessors with fixed instruction sets. Further, a sample application is discussed that uses a dynamically reconfigurable FPGA to implement different floating-point calculations in hardware, reconfigured as required by the execution of the user code. The implementation data for two dynamically reconfigurable platforms available on the market, the Xilinx Virtex2 family FPGAs and the Atmel FPSLIC family FPGAs, are compared in terms of resource requirements, operating frequency, and power consumption.