Pattern matching for network security and intrusion detection demands exceptionally high performance. Much work has been done in this field, yet there is still significant room for improvement in efficiency, flexibility, and throughput. We develop a novel linear-array string matching architecture using a buffered, two-comparator variation on the Knuth-Morris-Pratt (KMP) algorithm. For small patterns (16 or fewer characters), it compares favorably with the state of the art while providing better scalability, easier reconfiguration, and more efficient hardware utilization. KMP is a well-known, efficient string matching technique that uses a single comparator and a precomputed transition table. We add a second comparator and an input buffer, allowing the system to accept at least one character in each cycle and to terminate within a number of clock cycles no greater than the length of the input string plus the size of the buffer. The system also provides a clean, modular route to reconfiguring the patterns on the fly and to scaling the system to support more units, using several rows of linear array elements. In this paper, we prove the bound on the buffer size and running time, and provide performance comparisons against other approaches.
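For reference, the single-comparator KMP baseline can be sketched in software. This is a minimal sketch of the textbook algorithm, not the paper's buffered two-comparator hardware. The failure function plays the role of the precomputed transition table, and the names `build_failure` and `kmp_search` are illustrative.

```python
def build_failure(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the KMP failure function)."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # convention for this sketch: an empty pattern matches nothing
    fail = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back through the table; the same input
        # character may be compared several times before moving on.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return matches


print(kmp_search("abababc", "abab"))  # [0, 2]
```

The inner `while` loop is where a single comparator stalls, because one input character can require several comparisons. This matches the abstract's motivation for adding a second comparator and an input buffer so that the hardware can accept at least one character per cycle.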
Understanding and predicting electromagnetic behavior is increasingly important in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetics technique for modelling electromagnetic fields in space. The 3D FDTD buried-object detection forward model is emerging as a useful tool in mine detection and other subsurface sensing areas. However, computing this model is complex and time-consuming. Implementing the algorithm in hardware greatly increases its computational speed and can widen its use in many other areas. We present an FPGA implementation that speeds up the pseudo-2D FDTD algorithm, a simplified version of the 3D FDTD model that can be upgraded to 3D with limited structural modification. We implement the pseudo-2D FDTD model for layered media, with complete boundary conditions, on an FPGA. The reconfigurable hardware design runs about 24 times faster than a software implementation on a 3.0 GHz PC. The speedup is due to pipelining, parallelism, the use of fixed-point arithmetic, and careful memory architecture design.
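The leapfrog (Yee) update at the core of FDTD can be illustrated with a minimal one-dimensional sketch. It assumes free space in normalized units, a Courant number of 0.5, simple reflecting boundaries, and a Gaussian soft source. It is not the paper's pseudo-2D layered-media model, and it uses floating point rather than fixed point. The function `fdtd_1d` and its parameters are hypothetical.

```python
import math


def fdtd_1d(n_cells: int = 200, n_steps: int = 300, src: int = 100,
            courant: float = 0.5) -> list[float]:
    """Minimal 1D FDTD (Yee leapfrog) in normalized units.

    Ez and Hy live on staggered grids, and each update reads only
    nearest neighbors. Boundaries are simple reflectors (Ez[0] and
    Hy[-1] are never updated and stay 0); no absorbing boundary
    condition is modelled.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # Half step 1: update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Half step 2: update E from the spatial difference of H.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Additive ("soft") Gaussian pulse source at cell `src`.
        ez[src] += math.exp(-0.5 * ((t - 30) / 8) ** 2)
    return ez


field = fdtd_1d()
```

Within each half step, every cell update reads only the other field. The loop iterations are therefore independent of one another, which is the kind of structure that pipelined, parallel hardware can exploit.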
The cost functions used to evaluate logic synthesis transformations for FPGAs are far removed from the final speed and routability determined after placement, routing, and timing analysis. This gap has given rise to the field of physical synthesis, which attempts to improve logic synthesis by employing cost functions that contain placement, routing, and/or timing analysis information. In this work we take this notion to an extreme that we call omniscience: post-routing timing analysis is provided within a manual editor in which the user selects logical and physical transformations. After each incremental circuit modification, the user is informed of the circuit's performance after routing and timing analysis. Because the computation required to provide this level of information is large, we restrict the application to relatively small circuits of no more than 1000 logic elements. Using this approach on a commercial FPGA, we propose a set of logic transformations specific to the logic and routing architecture of the Xilinx Virtex-E device. On a set of 10 circuits, we achieved an average performance improvement of 10% when both logical and physical changes are used. The editor is also valuable because it reveals new types of automatable physical-synthesis transformations and optimization strategies that arise from architectural properties of the target device.
FPGAs are increasingly used in a wide variety of applications. While power optimization has been of only secondary importance in many FPGA applications, the growing importance of leakage in FPGAs designed at 90 nm and below makes it imperative to treat power optimization as a first-class concern. In this paper, we propose a leakage-saving technique for FPGAs that divides the FPGA fabric into small regions and uses a sleep transistor to switch the power supply to each region on or off in order to conserve leakage energy. Specifically, the regions not used by the placed design are supply gated. Next, we present a new placement strategy that increases the number of regions that can be supply gated. Finally, we extend the supply gating technique to exploit idleness in different parts of the same design during different time periods. Our experiments with different region sizes, using various commercial and academic designs, indicate that the proposed optimization outperforms conventional placement and significantly reduces leakage power consumption.
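The region-level decision can be sketched as follows. The sketch assumes a rectangular grid of logic blocks tiled into square regions, each with its own sleep transistor. The function and grid model are hypothetical. The example only illustrates why a placement that packs the design into fewer regions leaves more regions that can be supply gated.

```python
import math


def gateable_regions(used_cells: set[tuple[int, int]], grid_w: int,
                     grid_h: int, region_size: int) -> set[tuple[int, int]]:
    """Return the (rx, ry) indices of regions that contain no used cell.

    The fabric is tiled into region_size x region_size blocks; any block
    untouched by the placement could have its supply switched off.
    """
    n_rx = math.ceil(grid_w / region_size)
    n_ry = math.ceil(grid_h / region_size)
    all_regions = {(rx, ry) for rx in range(n_rx) for ry in range(n_ry)}
    used_regions = {(x // region_size, y // region_size) for x, y in used_cells}
    return all_regions - used_regions


# The same four logic blocks on an 8x8 fabric with 4x4 regions:
scattered = {(0, 0), (5, 0), (0, 5), (5, 5)}  # one block in every region
compact = {(0, 0), (1, 0), (0, 1), (1, 1)}    # all blocks in one region
print(len(gateable_regions(scattered, 8, 8, 4)))  # 0
print(len(gateable_regions(compact, 8, 8, 4)))    # 3
```

A gating-aware placer would aim for the compact case, and region size sets the trade-off: smaller regions fit the design more tightly but need more sleep transistors.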
Traditional FPGAs use a uniform supply voltage (Vdd) and a uniform threshold voltage (Vt). We propose using predefined dual-Vdd and dual-Vt fabrics to reduce FPGA power. We design FPGA circuits with dual Vdd and dual Vt to effectively reduce both dynamic and leakage power, and we define dual-Vdd/dual-Vt FPGA fabrics based on profiling of benchmark circuits. We further develop CAD algorithms, including power-sensitivity-based voltage assignment and simulated-annealing-based placement, to leverage such fabrics. Compared to the conventional fabric using uniform Vdd/Vt at the same target clock frequency, our new fabric using dual Vt achieves 9% to 20% power reduction. However, the predefined FPGA fabric using both dual Vdd and dual Vt achieves on average only 2% extra power reduction, because the predesigned dual-Vdd layout pattern introduces a non-negligible performance penalty. Therefore, programmability of the supply voltage is needed to achieve significant power savings in dual-Vdd FPGAs. To the best of our knowledge, this is the first in-depth study of applying both dual Vdd and dual Vt to FPGAs considering circuits, fabrics, and CAD algorithms.
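The abstract does not define its power-sensitivity metric, so the following is only a sketch under simplifying assumptions. Sensitivity is taken as power saved per unit of added delay. Timing is reduced to a single path with a fixed delay budget. Each element has one low-power option, such as high Vt. `Element` and `sensitivity_assign` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    power_saving: float   # power saved by moving to the low-power option
    delay_penalty: float  # delay added by that move (same units as budget)


def sensitivity_assign(elements: list[Element], delay_budget: float) -> list[str]:
    """Greedy voltage assignment on a single path (hypothetical model).

    Elements are visited in decreasing order of power sensitivity
    (power saved per unit of added delay) and moved to the low-power
    option while the accumulated delay stays within the budget.
    """
    def sensitivity(e: Element) -> float:
        # A move with no delay cost is always worth taking first.
        return math.inf if e.delay_penalty == 0 else e.power_saving / e.delay_penalty

    chosen: list[str] = []
    used = 0.0
    for e in sorted(elements, key=sensitivity, reverse=True):
        if used + e.delay_penalty <= delay_budget:
            chosen.append(e.name)
            used += e.delay_penalty
    return chosen


cells = [Element("A", 4.0, 1.0), Element("B", 3.0, 2.0), Element("C", 5.0, 5.0)]
print(sensitivity_assign(cells, 3.0))  # ['A', 'B']
```

Moving one element changes the slack of every path through it, so the single-path budget here is a deliberate simplification of full timing analysis.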
The arithmetic logic unit (ALU) is at the heart of a modern microprocessor, and its size and speed are often significant contributors to the processor's overall cost and performance. This paper presents the design of the ALU used in Altera's NIOS 2.0 soft processor, implemented on Altera's Apex 20KE FPGA architecture. This ALU enabled the 32-bit NIOS 2.0 to consume only 1200 LEs and run at 85 MHz, a 50% size reduction and a 70% speed improvement over its predecessor, NIOS 1.1. The logic element (LE) is the basic building block of the Apex architecture, and making full use of its advanced features produced this novel ALU design. A functional representation of the logic describes how the ALU performs the core set of NIOS instructions, and an LE representation shows the amount of logic resources needed for the implementation. The cost of additional features, such as a barrel shifter and custom instructions, is also described. Likely worst-case delays for different routing and logic elements are used to estimate the ALU's speed. Further speed and size optimizations are also presented, from which it is possible to create ALUs ranging in speed from 87 MHz to over 100 MHz.
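To make the size cost of a barrel shifter concrete, here is a behavioral sketch of a 32-bit logarithmic shifter. It is a generic structure, not the NIOS 2.0 implementation. Masking the shift amount to its low 5 bits is an assumption.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1
STAGES = WIDTH.bit_length() - 1  # log2(32) = 5


def barrel_shift_left(value: int, amount: int) -> int:
    """Logical left shift built as STAGES cascaded stages.

    Stage i either passes its input through or shifts it by 2**i,
    selected by bit i of the shift amount; in hardware each stage is a
    row of WIDTH 2:1 multiplexers.
    """
    value &= MASK
    amount &= WIDTH - 1  # only the low 5 bits select a shift (assumption)
    for stage in range(STAGES):
        if (amount >> stage) & 1:
            value = (value << (1 << stage)) & MASK
    return value


print(hex(barrel_shift_left(0xFFFFFFFF, 4)))  # 0xfffffff0
```

With 5 stages of 32 multiplexers each, the structure needs on the order of 160 two-input selections before mapping to LEs. That gives a feel for why the shifter is costed separately from the core ALU.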
ISBN (print): 0769520936
Reconfigurable logic devices based on an FPGA substrate are gaining widespread acceptance. Because such devices are used in many different configurations, manufacturers need to ensure that no potential configuration will fail due to device defects. This flexibility severely increases test time. We show how to use reconfigurability to speed up the test and diagnosis of individual FPGA blocks, and we present a scheme for incorporating our test architecture that reduces their diagnostic and test times. The test architecture adds Feedback Shift Registers (FSRs) that change the circuit configuration during test. We present algorithms that produce test and diagnosis sets with a minimized number of test configurations, along with a method for constructing an FSR that generates these sets by dynamically reconfiguring the device.
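As a generic illustration of how a feedback shift register steps through a fixed sequence of patterns, the sketch below implements a linear FSR (LFSR). The abstract does not say whether its FSRs are linear or how their states map to test configurations. The tap choice, the width, and the idea that each state would select one configuration are all assumptions for illustration.

```python
def lfsr_states(seed: int, taps: tuple[int, ...], width: int, count: int) -> list[int]:
    """Return `count` successive states of a Fibonacci LFSR, starting at seed.

    Each step XORs the tapped bits (bit 0 = LSB), shifts the register
    right by one, and feeds the XOR result into the most significant bit.
    """
    mask = (1 << width) - 1
    state = seed & mask
    states = []
    for _ in range(count):
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))
    return states


# 4-bit register with taps on bits 0 and 1: cycles through all 15 nonzero states.
seq = lfsr_states(seed=1, taps=(0, 1), width=4, count=16)
print(seq[:5])  # [1, 8, 4, 2, 9]
```

The next state depends only on the current one, so the entire sequence is fixed by the seed and taps. This is presumably what makes a small on-chip FSR attractive for replaying a series of configurations without external storage.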
The recent past has seen a tremendous increase in the size of designs that can be implemented in a single FPGA. The size and complexity of modern FPGAs have far outpaced innovation in FPGA physical design. The problems faced by FPGA designers are similar in nature to those that preoccupy ASIC designers, namely interconnect delay and design management. However, this paper shows that simply retargeting ASIC physical design methodologies and algorithms to the FPGA domain will not suffice. We show that several well-researched problems in the ASIC world need new problem formulations and new algorithms to be useful for today's FPGAs. Partitioning, floorplanning, placement, and delay estimation are only some of the topics that need a complete overhaul. Motivated by experimental results, we give problem formulations for some of these topics as they apply to the FPGA domain.
Placement and routing are the most time-consuming processes in automatically synthesizing and configuring circuits for field-programmable gate arrays (FPGAs). In this paper, we use the negotiation-based paradigm to parallelize placement. Our new FPGA placer, NAP (Negotiated Analytical Placement), uses an analytical technique for coarse placement and the negotiation paradigm for detailed placement. We describe the serial algorithm and report results. We also report findings on parallelizing NAP in a multicast networking and multithreaded operating system environment; the parallel placer tolerates multicast packet loss as well as out-of-order packet delivery. Our parallel placer exhibits little performance degradation while attaining a speedup of 2 using 3 processors.
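The abstract does not give NAP's detailed-placement cost. The sketch below therefore borrows the negotiated-congestion cost of the PathFinder router, (base + history) × present, and applies it to a toy one-dimensional legalization. Cells start at the slots nearest their coarse (analytical) positions, with overlaps allowed at first. Repeated rip-up and re-placement with rising present and history penalties then pushes cells apart until each slot holds at most one cell. The function name, cost terms, and penalty schedule are assumptions, not NAP's actual algorithm.

```python
def negotiate_placement(preferred: list[float], n_slots: int,
                        max_iters: int = 50) -> list[int]:
    """Toy negotiated legalization of a 1-D coarse placement.

    preferred[c] is cell c's (fractional) position from coarse placement.
    Each iteration rips up and re-places every cell at the slot with the
    lowest cost (base + history) * present, where
      base    = 1 + distance from the preferred position
      history = accumulated past overuse of the slot
      present = 1 + pres_fac * (other cells currently in the slot)
    If max_iters runs out, the returned assignment may still overlap.
    """
    if len(preferred) > n_slots:
        raise ValueError("more cells than slots")
    history = [0.0] * n_slots
    pres_fac = 0.5
    # Start from the nearest slot, ignoring overlaps.
    assign = [min(n_slots - 1, max(0, round(p))) for p in preferred]
    occupancy = [0] * n_slots
    for s in assign:
        occupancy[s] += 1
    for _ in range(max_iters):
        for c, pref in enumerate(preferred):
            occupancy[assign[c]] -= 1  # rip up cell c
            assign[c] = min(
                range(n_slots),
                key=lambda s: (1 + abs(s - pref) + history[s])
                * (1 + pres_fac * occupancy[s]),
            )
            occupancy[assign[c]] += 1
        overused = [s for s in range(n_slots) if occupancy[s] > 1]
        if not overused:
            break  # legal: at most one cell per slot
        for s in overused:
            history[s] += occupancy[s] - 1  # remember persistent congestion
        pres_fac *= 2  # make present overlaps progressively more expensive
    return assign


print(sorted(negotiate_placement([3.0, 3.0, 3.0, 3.0], 4)))  # [0, 1, 2, 3]
```

Letting cells temporarily share a slot and only gradually pricing overlaps out is the core of negotiation. Cells with cheap alternatives move away, while contested slots go to the cells that are most expensive to displace.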