ISBN: (print) 9781605584102
This paper presents a new architecture for time-to-digital conversion enabling a time resolution of 17ps over a range of 50ns with a conversion rate of 20MS/s. The proposed architecture, implemented in a 65nm FPGA system, consists of a pipelined interpolating time-to-digital converter (TDC). The TDC comprises a coarse time discriminator and a fine delay line, capable of sustained operation at a clock of 300MHz. A Turbo version of the circuit implements a pipelined interpolating TDC with suppressed dead time to reach a conversion rate of 300MS/s, at the expense of a systematic asymmetry that requires fast error correction. The TDCs proposed in this paper can be compensated for process, voltage, and temperature (PVT) variations using either conventional charge-pump-based feedback or a digital technique. Results demonstrate the suitability of the approach for a variety of applications involving precise, ultra-fast time discrimination, such as optical sensing, time-of-flight cameras, high-throughput communication links, RADARs, etc. Copyright 2009 ACM.
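To make the coarse/fine split concrete, here is a minimal sketch of how a coarse clock counter and a fine delay line are commonly combined (Nutt-style interpolation: T = N·Tclk + T1 − T2). The 300MHz clock and 17ps bin come from the abstract; the start/stop tap convention and all function names are assumptions for illustration, not the paper's circuit, and the Turbo version's asymmetry correction is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TdcParams:
    clock_hz: float = 300e6  # coarse clock; the 300MHz figure from the abstract
    bin_s: float = 17e-12    # fine delay-line bin; the 17ps figure from the abstract


def interval_seconds(coarse_cycles: int, start_taps: int, stop_taps: int,
                     p: TdcParams = TdcParams()) -> float:
    """Coarse/fine interpolation: T = N*Tclk + T1 - T2.

    coarse_cycles: clock periods between the first clock edge after the start
        hit and the first clock edge after the stop hit (N).
    start_taps / stop_taps: fine bins between each hit and the next clock
        edge (T1 and T2 expressed in units of bin_s).
    """
    t_clk = 1.0 / p.clock_hz
    return coarse_cycles * t_clk + (start_taps - stop_taps) * p.bin_s


def taps_per_period(p: TdcParams = TdcParams()) -> int:
    """Minimum number of fine bins needed to span one coarse clock period."""
    return math.ceil((1.0 / p.clock_hz) / p.bin_s)


def bits_for_range(range_s: float, p: TdcParams = TdcParams()) -> int:
    """Output word width needed to cover range_s at a resolution of bin_s."""
    return math.ceil(math.log2(range_s / p.bin_s))


print(taps_per_period())                             # 197
print(bits_for_range(50e-9))                         # 12
print(round(interval_seconds(3, 100, 40) * 1e9, 3))  # 11.02 (ns)
```

Under these assumptions the fine line needs about 197 bins to cover one 3.33ns coarse period, and a 12-bit result covers the 50ns range at 17ps.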
ISBN: (print) 9781605584102
The performance of field-programmable gate arrays (FPGAs) used for floating-point applications is poor due to the complexity of floating-point arithmetic. Implementing floating-point units on FPGAs consumes a large amount of resources. This makes FPGAs less attractive for use in floating-point-intensive applications. Therefore, there is a need for embedded floating-point units (FPUs) in FPGAs. However, if unutilized, embedded FPUs waste space on the FPGA die. To overcome this issue, we propose a flexible multi-mode embedded FPU for FPGAs that can be configured to perform a wide range of operations. The floating-point adder and multiplier in our embedded FPU can each be configured to perform one double-precision operation or two single-precision operations in parallel. To increase flexibility further, access to the large integer multiplier, adder, and shifters in the FPU is provided. Benchmark circuits were implemented both on a standard Xilinx Virtex-II FPGA and on our FPGA with embedded FPU blocks. The results using our embedded FPUs showed a mean area improvement of 5.2 times and a mean delay improvement of 5.8 times for the double-precision benchmarks, and a mean area improvement of 4.4 times and a mean delay improvement of 4.2 times for the single-precision benchmarks. Copyright 2009 ACM.
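A behavioral sketch of the dual-mode idea: the same 64-bit operand path is read either as one IEEE-754 double or as two packed singles added in independent lanes. The lane layout (upper/lower 32 bits), the mode names, and all functions are hypothetical; the sketch models only the arithmetic semantics, not the paper's hardware datapath.

```python
import struct


def _f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pack_singles(hi: float, lo: float) -> int:
    """Place two single-precision values in the upper and lower halves of a 64-bit word."""
    return (_f32_bits(hi) << 32) | _f32_bits(lo)


def unpack_singles(word: int) -> tuple[float, float]:
    return _bits_f32((word >> 32) & 0xFFFFFFFF), _bits_f32(word & 0xFFFFFFFF)


def pack_double(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unpack_double(word: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def multimode_add(a: int, b: int, mode: str) -> int:
    """Add two 64-bit operands as one double or as two independent single lanes."""
    if mode == "double":
        return pack_double(unpack_double(a) + unpack_double(b))
    if mode == "dual_single":
        a_hi, a_lo = unpack_singles(a)
        b_hi, b_lo = unpack_singles(b)
        # Python adds in double precision; packing with "<f" rounds each lane
        # back to single precision. Overflow to infinity is not modeled:
        # struct raises OverflowError instead.
        return pack_singles(a_hi + b_hi, a_lo + b_lo)
    raise ValueError(f"unknown mode: {mode!r}")


word = multimode_add(pack_singles(1.5, 2.25), pack_singles(0.5, -0.25), "dual_single")
print(unpack_singles(word))  # (2.0, 2.0)
print(unpack_double(multimode_add(pack_double(1.5), pack_double(2.25), "double")))  # 3.75
```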
ISBN: (print) 9781605584102
Packet classification is an important operation for applications such as routers, firewalls, and intrusion detection systems. Many algorithms and hardware architectures for packet classification have been created, but none of them can compete with the speed of TCAMs in the worst case. We propose a new hardware-based algorithm for packet classification. The solution is based on problem decomposition and is aimed at the highest network speeds. A unique property of the algorithm is its constant time complexity in terms of external memory accesses: the algorithm performs exactly two external memory accesses to classify a packet. Using an FPGA and one commodity SRAM chip, a throughput of 150 million packets per second can be achieved. This corresponds to a throughput of 100 Gbps for the shortest packets. Further performance scaling is possible with more or faster SRAM chips. Copyright 2009 ACM.
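The step from 150 million packets per second to 100 Gbps is consistent with minimum-size Ethernet framing. A small worked check, assuming 64-byte frames plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap (84 bytes, or 672 bits, per packet on the wire); these framing constants are our assumption and are not stated in the abstract. The two external accesses per packet come from the abstract.

```python
# Assumed Ethernet framing; the abstract does not state these constants.
MIN_FRAME_BYTES = 64        # shortest Ethernet frame
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12  # minimum inter-frame gap


def wire_bits(frame_bytes: int = MIN_FRAME_BYTES) -> int:
    """Bits one frame occupies on the line, including preamble and gap."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8


def line_rate_bps(packets_per_second: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    return packets_per_second * wire_bits(frame_bytes)


def required_pps(line_rate: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    return line_rate / wire_bits(frame_bytes)


def sram_accesses_per_second(packets_per_second: float, accesses_per_packet: int = 2) -> float:
    # Two external memory accesses per packet, as stated in the abstract.
    return packets_per_second * accesses_per_packet


print(line_rate_bps(150e6) / 1e9)       # 100.8 (Gbps)
print(required_pps(100e9) / 1e6)        # about 148.81 (Mpps needed for 100 Gbps)
print(sram_accesses_per_second(150e6))  # 300000000.0
```

So, under these framing assumptions, 150 Mpps slightly exceeds the roughly 148.8 Mpps needed for 100 Gbps of minimum-size packets, and the SRAM must sustain about 300 million accesses per second.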
ISBN: (print) 9780897919784
Architects of programmable logic devices (PLDs) face several challenges when optimizing a new device family for low manufacturing cost. Given an aggressive die-size goal, functional blocks that otherwise seem insignificant become targets for area reduction. Once low die cost is achieved, testing and packaging costs must also be considered. Interactions among these three cost contributors pose trade-offs that prevent their independent optimization. This paper discusses solutions discovered by the architects while optimizing the Altera FLEX 6000 architecture.
ISBN: (print) 9781605584102
FPGA user clocks are slow enough that only a fraction of the interconnect's bandwidth is actually used. There may be an opportunity to use throughput-oriented interconnect to decrease routing and wire area using on-chip serial signaling, especially for datapath designs that operate on words instead of bits. To do so, these links must operate reliably at very high bit rates. We study wave pipelining and surfing source-synchronous schemes in the presence of power supply and crosstalk noise. In particular, noise is a critical modeling challenge; better models are needed for FPGA power grids. Our results show that wave pipelining can operate at rates as high as 5Gbps for short links, but it is sensitive to noise in longer links and must run much slower to be reliable. In contrast, surfing achieves a stable operating bit rate of 3Gbps and is relatively insensitive to noise. Copyright 2009 ACM.
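The wire-area argument reduces to simple arithmetic: moving a w-bit word every user clock at f_user needs ceil(w·f_user / R) serial links of rate R instead of w parallel wires. The 32-bit width and 200MHz user clock below are illustrative assumptions; the 5Gbps and 3Gbps rates are the figures reported above (the 5Gbps wave-pipelining rate holds only for short links). Serializer latency and any framing overhead are ignored.

```python
import math


def serial_links_needed(bus_width_bits: int, user_clock_hz: float,
                        link_rate_bps: float) -> int:
    """Serial links needed to move one bus_width_bits word per user clock cycle.

    Ignores serializer/deserializer latency and any framing or encoding overhead.
    """
    required_bps = bus_width_bits * user_clock_hz
    return math.ceil(required_bps / link_rate_bps)


# Hypothetical 32-bit datapath at a 200MHz user clock (6.4Gbps of payload).
print(serial_links_needed(32, 200e6, 5e9))  # 2 wave-pipelined links at 5Gbps
print(serial_links_needed(32, 200e6, 3e9))  # 3 surfing links at 3Gbps
```

Under these assumptions, two or three serial links replace 32 parallel wires, which is where the potential routing-area saving comes from.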
ISBN: (print) 9780897919784
While reconfigurable computing promises to deliver incomparable performance, it is still a marginal technology due to the high cost of developing and upgrading applications. Hardware virtualization can be used to significantly reduce both of these costs. In this paper we describe the benefits of hardware virtualization and show how it can be achieved using a combination of pipeline reconfiguration and run-time scheduling of both configuration streams and data streams. The result is PipeRench, an architecture that supports robust compilation and provides forward compatibility. Our preliminary performance analysis predicts that PipeRench will outperform commercial FPGAs and DSPs in both overall performance and performance per mm².
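Pipeline reconfiguration is what lets a virtual pipeline with more stages than the fabric has physical stripes still run. The sketch below is a deliberately simplified model: one stripe is reconfigured per cycle in round-robin order while the others compute. The function names and the round-robin rule are assumptions for illustration; this is not PipeRench's configuration controller and ignores the state handling and stream scheduling the paper addresses.

```python
def reconfiguration_schedule(virtual_stages: int, physical_stripes: int,
                             cycles: int) -> list[tuple[int, int]]:
    """(physical stripe reconfigured, virtual stage loaded) for each cycle.

    Simplified round-robin model: on each cycle one physical stripe is
    overwritten with the next virtual stage while the other stripes keep
    computing. Consecutive virtual stages land in consecutive stripes
    (mod physical_stripes), so in this model data moves from one stripe to
    the next, wrapping from the last stripe back to the first.
    """
    if virtual_stages <= physical_stripes:
        raise ValueError("this sketch only models virtualization (V > P)")
    return [(t % physical_stripes, t % virtual_stages) for t in range(cycles)]


def resident_stages(virtual_stages: int, physical_stripes: int,
                    cycles: int) -> list[tuple]:
    """Virtual stage held by each stripe after each cycle (None = unconfigured)."""
    stripes: list = [None] * physical_stripes
    history = []
    for stripe, stage in reconfiguration_schedule(virtual_stages, physical_stripes, cycles):
        stripes[stripe] = stage
        history.append(tuple(stripes))
    return history


# A 5-stage virtual pipeline on a 3-stripe fabric.
for cycle, state in enumerate(resident_stages(5, 3, 7)):
    print(cycle, state)
```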
ISBN: (print) 9780897919784
Carry chains are an important consideration for most computations, including those implemented on FPGAs. Current FPGAs dedicate a portion of their logic to supporting these demands via a simple ripple-carry scheme. In this paper we demonstrate how more advanced carry constructs can be embedded into FPGAs, providing significantly higher-performance carry computations. We redesign the standard ripple-carry chain to reduce the number of logic levels in each cell. We also develop entirely new carry structures based on high-performance adders such as Carry Select, Carry Lookahead, and Brent-Kung. Overall, these optimizations achieve a speedup in carry performance of 3.8 times over current architectures.
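The prefix-carry idea behind one of the structures named above (Brent-Kung) can be sketched as a bit-level software model: generate/propagate pairs are merged with the associative carry operator in an up-sweep and a down-sweep, giving 2·log2(n) − 1 combining levels instead of n ripple stages. This is a generic textbook Brent-Kung network, assuming power-of-two widths; it is not the paper's FPGA cell design.

```python
def _combine(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Carry operator: (g_hi, p_hi) o (g_lo, p_lo) for adjacent bit groups."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Return (sum mod 2**width, carry_out) using a Brent-Kung prefix network."""
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a power of two in this sketch")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    gp = list(zip(g, p))
    gp[0] = (g[0] | (p[0] & cin), p[0])  # fold the carry-in into bit 0

    # Up-sweep: build group (g, p) over spans of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            gp[i] = _combine(gp[i], gp[i - d])
        d *= 2
    # Down-sweep: fill in the remaining prefixes.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            gp[i] = _combine(gp[i], gp[i - d])
        d //= 2

    # gp[i][0] is now the carry out of bit i (carry into bit i + 1).
    carries = [cin] + [gp[i][0] for i in range(width - 1)]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, gp[width - 1][0]


print(brent_kung_add(200, 100, 8))  # (44, 1)
```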
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect are time-multiplexed. Thus, for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each subcircuit can be executed at a different time. In this paper, the partitioning of sequential circuits for execution on a DRFPGA is studied. To determine how to correctly partition a sequential circuit, and what the costs of doing so are, we propose a new gate-level model that handles time-multiplexed computation. We also introduce an enhanced force-directed scheduling (FDS) algorithm for partitioning sequential circuits that finds a correct partition with low logic and communication costs, under the assumption that maximum performance is desired. We use our algorithm to partition seven large ISCAS'89 sequential benchmark circuits. The experimental results show that, compared to traditional FDS, the enhanced FDS reduces communication costs by 27.5% with only a 1.1% increase in gate cost.
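For reference, the traditional FDS baseline (Paulin and Knight's force-directed scheduling) can be sketched as follows, assuming a combinational DAG of unit-delay gates where each time step stands for one DRFPGA micro-cycle. It balances the expected gate count per step; it does not model the paper's sequential-circuit handling or communication cost, and all names are hypothetical. The force used here sums the effect on every node whose time frame changes, a common variant of the self plus predecessor/successor forces.

```python
def _topological_order(nodes, preds):
    order, seen = [], set()

    def visit(n):
        if n not in seen:
            seen.add(n)
            for p in preds[n]:
                visit(p)
            order.append(n)

    for n in nodes:
        visit(n)
    return order


def _time_frames(order, preds, succs, steps, fixed):
    """ASAP/ALAP frame of every node, honouring already-fixed assignments."""
    asap, alap = {}, {}
    for n in order:
        asap[n] = fixed[n] if n in fixed else max((asap[p] + 1 for p in preds[n]), default=0)
    for n in reversed(order):
        alap[n] = fixed[n] if n in fixed else min((alap[s] - 1 for s in succs[n]), default=steps - 1)
    return {n: (asap[n], alap[n]) for n in order}


def _distribution(frames, steps):
    """Distribution graph: expected number of gates in each time step."""
    dg = [0.0] * steps
    for lo, hi in frames.values():
        for t in range(lo, hi + 1):
            dg[t] += 1.0 / (hi - lo + 1)
    return dg


def _mean_dg(dg, lo, hi):
    return sum(dg[lo:hi + 1]) / (hi - lo + 1)


def force_directed_schedule(nodes, edges, steps):
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    order = _topological_order(nodes, preds)
    fixed = {}
    while len(fixed) < len(nodes):
        frames = _time_frames(order, preds, succs, steps, fixed)
        if any(lo > hi for lo, hi in frames.values()):
            raise ValueError("latency bound too tight for this graph")
        dg = _distribution(frames, steps)
        best = None
        for n in order:
            if n in fixed:
                continue
            lo, hi = frames[n]
            for t in range(lo, hi + 1):
                trial = _time_frames(order, preds, succs, steps, {**fixed, n: t})
                # Force of fixing n at t: for each node whose frame changes,
                # mean DG over its new frame minus mean DG over its old frame.
                force = sum(_mean_dg(dg, *trial[m]) - _mean_dg(dg, *frames[m])
                            for m in order if trial[m] != frames[m])
                if best is None or force < best[0]:
                    best = (force, n, t)
        _, n, t = best
        fixed[n] = t
    return fixed


def gates_per_step(schedule, steps):
    counts = [0] * steps
    for t in schedule.values():
        counts[t] += 1
    return counts


sched = force_directed_schedule(["a", "b", "c", "d"], [("a", "b")], steps=2)
print(sched, gates_per_step(sched, 2))
```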
ISBN: (print) 9781605584102
Carbon nanotubes (CNTs), with their unique electronic properties, are promising materials for building nanoscale circuits. In this paper, we present a new CNT-based FPGA architecture known as FPCNA. We define novel CNT- and nanoswitch-based components and characterize these components considering nano-specific process variations, including the variation caused by the random mixture of metallic and semiconducting CNTs. To evaluate the architecture, we develop a variation-aware physical design flow that can handle both Gaussian and non-Gaussian random variables using variation-aware placement and routing. When FPCNA is evaluated with this CAD flow, we see a 2.67× performance gain over a baseline CMOS FPGA at the same technology node (at a 95% performance yield). In addition, FPCNA offers a 4.5× footprint reduction compared to the baseline FPGA. These results demonstrate the potential of using CNTs and nanoswitches to build high-performance FPGA circuits. Copyright 2009 ACM.
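A figure quoted "at a 95% performance yield" compares designs at the delay that 95% of manufactured chips meet. A minimal sketch, assuming the variation-aware flow produces Monte Carlo samples of critical-path delay (one per chip); the nearest-rank percentile and the speedup-as-delay-ratio definition are our assumptions, not necessarily the paper's exact metric.

```python
import math


def yield_delay(delays, target_yield=0.95):
    """Critical-path delay met by at least target_yield of the sampled chips.

    Nearest-rank percentile of Monte Carlo delay samples (one sample per chip).
    """
    if not delays or not 0.0 < target_yield <= 1.0:
        raise ValueError("need samples and a yield in (0, 1]")
    ordered = sorted(delays)
    # round() guards against float noise in target_yield * n before the ceiling.
    rank = max(1, math.ceil(round(target_yield * len(ordered), 9)))
    return ordered[rank - 1]


def performance_gain(baseline_delays, new_delays, target_yield=0.95):
    """Speedup of the new design at equal performance yield (ratio of yield delays)."""
    return yield_delay(baseline_delays, target_yield) / yield_delay(new_delays, target_yield)


baseline = [float(x) for x in range(1, 101)]  # hypothetical delay samples
faster = [d / 2 for d in baseline]
print(yield_delay(baseline), performance_gain(baseline, faster))  # 95.0 2.0
```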
ISBN: (print) 9780897919784
This paper presents a vector generation approach for testing interconnects in configurable (SRAM-based) field-programmable gate arrays (FPGAs). The proposed approach detects bridging faults and is based on quiescent current (IDDQ) monitoring. Compared with previous voltage-based methods, IDDQ testing has the advantage of requiring only a small number of programming phases for configuring the FPGA during the test process, with negligible observability requirements, even under multiple faults. Algorithms for test generation that exploit the homogeneous nature of the FPGA array are described. An example using the XC4000 is described in detail. Testing the XC4000 series interconnect requires a total of 20 phases and 11 vectors: 11 phases for S (switch) block testing and 9 phases for C (connection) block testing.
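Why IDDQ needs few vectors can be shown with a generic model: a bridge between two nets is excited whenever they are driven to opposite values, and the elevated supply current is measured instead of observing outputs. Assuming any pair of n nets may be bridged and each net can be driven independently, assigning each net its binary index yields ceil(log2 n) vectors. This is a standard counting-sequence construction used only as an illustration; it is not the paper's phase-based algorithm for XC4000 switch and connection blocks, and the function names are hypothetical.

```python
from itertools import combinations


def iddq_bridge_vectors(num_nets: int) -> list[list[int]]:
    """Counting-sequence vectors: vectors[k][net] is bit k of the net's index.

    Distinct indices differ in at least one bit, so every pair of nets is
    driven to opposite values in at least one vector; a bridge between them
    then conducts and raises the quiescent supply current (IDDQ).
    """
    if num_nets < 2:
        return []
    width = (num_nets - 1).bit_length()  # ceil(log2(num_nets)) vectors
    return [[(net >> k) & 1 for net in range(num_nets)] for k in range(width)]


def detects_all_bridges(vectors: list[list[int]], num_nets: int) -> bool:
    """True if every pair of nets takes opposite values in some vector."""
    return all(any(v[a] != v[b] for v in vectors)
               for a, b in combinations(range(num_nets), 2))


vecs = iddq_bridge_vectors(100)
print(len(vecs), detects_all_bridges(vecs, 100))  # 7 True
```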