In this work, the design of an energy-efficient FPGA interconnect architecture has been investigated. It is a dual-supply solution in which the logic blocks are powered by a nominal supply voltage and the interconnect is powered by a reduced supply voltage. The behaviour of a fully buffered architecture, a fully pass-transistor-based architecture and a hybrid buffer/pass-transistor architecture has been investigated over a range of supply voltages. There is an optimal ratio between the number of pass-transistor switches and tri-state buffer switches, and this ratio depends on the load and the supply voltage. Reducing the signal voltage swing on the interconnect removes the need for a fully tri-state-buffer-based interconnect, which saves valuable area and power. Benchmark studies confirm that an optimal mix of pass-transistor and tri-state buffer switches at a reduced supply voltage can match the speed of the full-swing scenario at much lower power consumption. With buffer receivers, the power-delay product is reduced on average by 4.4x for low-load critical paths and by 2.7x for high-load critical paths. With level-shifter receivers, the average reductions are 4.7x for low-load critical paths and 2.8x for high-load critical paths. Because tri-state buffers are partially replaced by pass-transistor switches, interconnect area can be reduced by up to 4x compared with fully buffered architectures, in spite of the added level shifters. The results have been validated over various benchmarks in a 0.1 μm CMOS technology.
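A back-of-the-envelope model helps explain why a reduced interconnect supply pays off. Charging a wire capacitance C rail to rail from a supply V draws an energy of C·V² from that supply, so switching energy scales quadratically with the interconnect supply. The sketch below assumes only that simple CV² model. The capacitance and voltages are hypothetical values, not figures from this work, and the model ignores delay, leakage and receiver (buffer or level-shifter) overhead.

```python
def switching_energy(c_farads: float, v_supply: float) -> float:
    """Energy drawn from the supply for one 0->1 transition of a wire
    driven rail to rail from v_supply (simple C*V^2 model, an assumption)."""
    return c_farads * v_supply ** 2


def energy_ratio(v_nominal: float, v_reduced: float, c_farads: float = 1e-12) -> float:
    """Switching energy at the reduced interconnect supply relative to the
    nominal supply. c_farads is a hypothetical wire load; it cancels out."""
    return switching_energy(c_farads, v_reduced) / switching_energy(c_farads, v_nominal)


# Hypothetical supplies: 1.2 V nominal logic supply, 0.6 V interconnect supply.
print(f"{energy_ratio(1.2, 0.6):.2f}")  # 0.25
```

In this model, halving the swing quarters the switching energy. The measured power-delay figures also include the slower switching at reduced supply, which this model omits.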
A high-speed, low-power field-programmable gate array (FPGA) is the dream of digital designers. The availability of silicon-germanium (SiGe) heterojunction bipolar transistor (HBT) devices has opened the door to GHz FPGAs [3, 4]. In the past, high static power consumption discouraged any significant scale-up of bipolar FPGAs. This paper details new ideas for reducing power and layout area in high-speed SiGe BiCMOS FPGAs. It explains new methods to reduce circuitry and to use novel serial dual configuration planes for efficient programmability. In addition, new layout techniques are developed to reduce bipolar area. Several SiGe FPGA test chips based on Xilinx 6200 and Virtex configurable logic blocks (CLBs) have been fabricated for demonstration. Copyright 2005 ACM.
ISBN: 0769522645 (print)
We present a simple but useful method that detects all multiple stuck-at faults in the application and configuration inputs of LUTs. A novel method for testing stuck-at faults on the control bits of flip-flops is also proposed. The aim is to integrate the testing of LUTs, flip-flops and multiplexers, which reduces the number of configurations and hence minimizes testing time.
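The abstract does not describe the method itself. One classic way to cover stuck-at faults on a LUT's inputs and configuration cells works as follows:

- Program the LUT in turn with the k-input XOR (parity) function and the XNOR function, and apply all 2^k input patterns.
- The XOR configuration catches any combination of stuck inputs. For some applied pattern, the faulty address differs from the intended one in exactly one bit, which flips the parity.
- The XOR/XNOR pair drives every configuration cell to both 0 and 1, so a stuck cell is caught in one of the two configurations.

The sketch below simulates a k-input LUT under these assumptions. It illustrates a generic technique, not necessarily the authors' method. The fault model (stuck-at on the address inputs and on individual configuration cells, exercised separately) and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional


def parity(x: int) -> int:
    """1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def lut_output(config: List[int], addr: int,
               input_faults: Optional[Dict[int, int]] = None,
               cell_faults: Optional[Dict[int, int]] = None) -> int:
    """Read a k-input LUT (hypothetical fault model). input_faults maps an
    input index to its stuck value; cell_faults maps a configuration-cell
    index to its stuck value."""
    for bit, value in (input_faults or {}).items():
        addr = (addr | (1 << bit)) if value else (addr & ~(1 << bit))
    return (cell_faults or {}).get(addr, config[addr])


def detected(k: int, input_faults=None, cell_faults=None) -> bool:
    """Apply all 2**k patterns under the XOR and XNOR configurations and
    report whether any output differs from the fault-free value."""
    xor_cfg = [parity(a) for a in range(2 ** k)]
    xnor_cfg = [1 - b for b in xor_cfg]
    return any(
        lut_output(cfg, a, input_faults, cell_faults) != cfg[a]
        for cfg in (xor_cfg, xnor_cfg)
        for a in range(2 ** k)
    )


k = 3
# Every nonempty combination of stuck-at-0/1 faults on the LUT inputs.
input_cases = [
    {i: v for i, v in enumerate(combo) if v is not None}
    for combo in product([None, 0, 1], repeat=k)
    if any(v is not None for v in combo)
]
# Every nonempty combination of stuck-at faults on the 2**k configuration cells.
cell_cases = [
    {c: v for c, v in enumerate(combo) if v is not None}
    for combo in product([None, 0, 1], repeat=2 ** k)
    if any(v is not None for v in combo)
]
print(all(detected(k, input_faults=f) for f in input_cases))  # True
print(all(detected(k, cell_faults=f) for f in cell_cases))    # True
print(detected(k))                                            # False
```

Only two configurations are needed per LUT, which is consistent with the goal of keeping the number of test configurations low.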
We develop a novel on-line built-in self-test (BIST) technique for FPGAs that achieves very high diagnosability even in the presence of clustered faults, a fault pattern for which previous BIST methods proved ineffective. Using an iterative bootstrapping process, our method first finds a fault-free test circuit in each BISTer tile. It then tests the PLBs functionally, using a fault-detection-and-gross-diagnosis phase followed by a time-efficient adaptive diagnosis phase. We establish the correctness of the deterministic phases of our BIST technique, and we analyze the probability that our BISTer diagnoses correctly in the presence of multiple random faults. Simulation results support the theoretical analysis. They show very high fault coverage (98.7% at 25% fault density for random faults and 98.9% at 8.8% fault density for clustered faults) and low fault latency. Copyright 2005 ACM.
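The abstract names the adaptive diagnosis phase without describing it. As a generic illustration of adaptive diagnosis, the sketch below uses binary-splitting group testing. A group of PLBs that fails a test is split in half, and each half is retested until single faulty PLBs are isolated. Several elements are assumptions: the test oracle `group_fails`, error-free group tests, and a flat list of PLB indices. The paper's BISTer-based procedure may work quite differently.

```python
from typing import Callable, List, Set


def diagnose(plbs: List[int], group_fails: Callable[[List[int]], bool]) -> Set[int]:
    """Isolate faulty PLBs by recursive halving. group_fails(group) is a
    hypothetical test that is True iff at least one PLB in group is faulty."""
    if not plbs or not group_fails(plbs):
        return set()
    if len(plbs) == 1:
        return set(plbs)
    mid = len(plbs) // 2
    return diagnose(plbs[:mid], group_fails) | diagnose(plbs[mid:], group_fails)


# Usage: 64 simulated PLBs, two of them faulty, with a counter of tests applied.
faulty = {3, 41}
tests_run = 0


def group_fails(group: List[int]) -> bool:
    global tests_run
    tests_run += 1
    return any(p in faulty for p in group)


found = diagnose(list(range(64)), group_fails)
print(sorted(found), tests_run)  # [3, 41] 23
```

When faults are sparse, the number of tests grows roughly with the number of faults times log2 of the array size, rather than with the array size itself.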
The proceedings contain 41 papers. The topics discussed include: the diagnostic method of detecting and assessing the impact of physical design optimizations on routing; routing of analog busses with parasitic symmetry; coupling-aware timing optimization and antenna avoidance in layer assignment; a fast algorithm for power grid design; an efficient surface-based low-power buffer insertion algorithm; multi-bend bus-driven floorplanning; a mapping algorithm for large-scale field-programmable analog arrays; thermal via placement in 3D ICs; a unified quadratic programming approach for mixed-mode placement; a semi-persistent clustering technique for VLSI circuit placement; and evaluation of placer suboptimality via zero-change netlist transformations.
In this paper we propose an FPGA implementation of a multi-protocol weighted fair (WF) queuing algorithm. It handles variable-length packets, targets Packet over SONET (POS) interfaces, and is well suited to the design of hybrid IP/ATM switches. An existing 4-channel scheduler architecture combines the Highest Value First and Round Robin schemes; our contribution extends it into a modular multi-channel scheduler design. The improvement over the previous implementation is that we reuse the existing 4-channel core module to build a higher-order WF queuing system without degrading its overall performance. As a result, our scheduler is general enough to accommodate ATM (UTOPIA Level 3/4), POS-PHY Level 3 (PL3, for OC-48) and POS-PHY Level 4 (PL4, for OC-192) interfaces. Copyright 2005 ACM.
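The abstract does not spell out how the 4-channel core is replicated. The sketch below is a hypothetical illustration of the general idea. A 4-input selector grants the channel with the highest value and breaks ties round-robin, and five such selectors form a two-level tree that serves 16 channels. Several elements are assumptions: the per-channel "value" (a stand-in for whatever weighted priority the Highest Value First scheme uses), the class names, and the tree arrangement. Packet lengths and the weight-update rule are not modeled.

```python
from typing import List, Optional


class Core4:
    """Hypothetical 4-channel selector: highest value wins, ties go round-robin."""

    def __init__(self) -> None:
        self.rr = 0  # round-robin pointer: where the tie search starts

    def pick(self, values: List[Optional[int]]) -> Optional[int]:
        """Return the winning input (0-3) without changing state; None = all idle."""
        active = [v for v in values if v is not None]
        if not active:
            return None
        best = max(active)
        for offset in range(4):
            i = (self.rr + offset) % 4
            if values[i] == best:
                return i
        return None  # unreachable: some active input equals best

    def grant(self, i: int) -> None:
        """Advance the tie-break pointer past the input that was actually served."""
        self.rr = (i + 1) % 4


class Tree16:
    """Two-level tree of Core4 modules: four leaves feed one root."""

    def __init__(self) -> None:
        self.leaves = [Core4() for _ in range(4)]
        self.root = Core4()

    def schedule(self, values: List[Optional[int]]) -> Optional[int]:
        assert len(values) == 16
        local = [leaf.pick(values[4 * g:4 * g + 4]) for g, leaf in enumerate(self.leaves)]
        # Each leaf forwards its local winner's value to the root.
        forwarded = [None if w is None else values[4 * g + w] for g, w in enumerate(local)]
        g = self.root.pick(forwarded)
        if g is None:
            return None
        # Only the modules on the granted path advance their pointers.
        self.root.grant(g)
        self.leaves[g].grant(local[g])
        return 4 * g + local[g]


tree = Tree16()
print(tree.schedule([1] * 10 + [5] + [1] * 5))  # 10
tree = Tree16()
print([tree.schedule([1] * 16) for _ in range(16)])
```

Because the root compares only leaf winners, the tree still grants the globally highest value. Advancing pointers only along the granted path makes equal-valued channels share service evenly: in 16 grants, each channel is served once.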
ISBN: 0769522645 (print)
This paper proposes a new CLB architecture for FPGAs and an associated testing technique that detects routing errors caused by single-event upsets (SEUs) in the FPGA's SRAM configuration memory. The proposed technique detects all possible routing errors, including bridging faults, and requires only a single configuration of the FPGA's LUTs. Any routing error that affects the logic of the circuit is detected within at most 8 clock cycles. Notably, the error-detection time is independent of both the number of switch matrices and the number of logic blocks in the FPGA.
ISBN: 0769522645 (print)
An important application of dynamically and partially reconfigurable computing platforms is dynamic task allocation and execution. On-line synthesis, on-line placement and on-line routing are the three essential steps in implementing an incoming task on the FPGA at run time. There has been some research on on-line placement, but on-line synthesis has received relatively little attention. In this paper we present what is believed to be the first on-line synthesis methodology for partially reconfigurable FPGAs. On-line synthesis must keep the synthesis time low. It must also ensure that the synthesized design can be placed in the available empty area of the FPGA and that it meets the performance requirements. We ensure placeability by representing and maintaining the available area on the FPGA surface as a collection of maximal empty rectangles. The proposed synthesizer allocates FPGA resources adaptively and is incremental in nature. The algorithm is designed to be linear in the number of operations so that it is suitable for on-line use. Our experimental results demonstrate the advantages of the proposed approach.
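The abstract does not say how the maximal empty rectangles are computed or updated. The following is a minimal brute-force sketch. It assumes the FPGA surface is modeled as a grid of occupied/free cells. An empty rectangle is maximal exactly when it cannot grow by one row or column in any direction without covering an occupied cell or leaving the grid. A task of a given height and width is then placeable if it fits inside at least one maximal empty rectangle. The enumeration is far slower than an on-line synthesizer could afford, and the function names and grid model are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive bounds


def maximal_empty_rectangles(grid: List[List[int]]) -> List[Rect]:
    """Brute-force list of maximal empty rectangles (grid[r][c] = 1 means occupied)."""
    rows, cols = len(grid), len(grid[0])
    # 2D prefix sums of occupied cells for O(1) emptiness checks.
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ps[r + 1][c + 1] = ps[r][c + 1] + ps[r + 1][c] - ps[r][c] + (1 if grid[r][c] else 0)

    def empty(r0: int, c0: int, r1: int, c1: int) -> bool:
        if not (0 <= r0 <= r1 < rows and 0 <= c0 <= c1 < cols):
            return False  # outside the device
        return ps[r1 + 1][c1 + 1] - ps[r0][c1 + 1] - ps[r1 + 1][c0] + ps[r0][c0] == 0

    result = []
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    if not empty(r0, c0, r1, c1):
                        continue
                    # Maximal iff no one-step extension (up, down, left, right) stays empty.
                    grows = ((r0 - 1, c0, r1, c1), (r0, c0, r1 + 1, c1),
                             (r0, c0 - 1, r1, c1), (r0, c0, r1, c1 + 1))
                    if not any(empty(*e) for e in grows):
                        result.append((r0, c0, r1, c1))
    return result


def fits(mers: List[Rect], height: int, width: int) -> bool:
    """True if a height x width task fits inside some maximal empty rectangle."""
    return any(r1 - r0 + 1 >= height and c1 - c0 + 1 >= width for r0, c0, r1, c1 in mers)


grid = [[0] * 4 for _ in range(4)]
grid[1][1] = 1  # one occupied cell
mers = maximal_empty_rectangles(grid)
print(sorted(mers))  # [(0, 0, 0, 3), (0, 0, 3, 0), (0, 2, 3, 3), (2, 0, 3, 3)]
print(fits(mers, 2, 4), fits(mers, 3, 3))  # True False
```

Checking placeability only against maximal rectangles is sufficient, because every empty rectangle lies inside at least one maximal one.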
ISBN: 0769524400 (print)
We propose a speculative multithreading processor architecture called Pinot. Pinot exploits parallelism over a wide range of granularities without modifying program sources. Fine-grain parallelism is limited by the available parallelism and by parallelization overhead, so it is preferable to extract coarse-grain parallelism. However, coarse-grain parallelism is concentrated in some programs (mainly numerical ones) and in some program portions. Therefore, exploiting both coarse- and fine-grain parallelism is key to the performance of speculative multithreading. The features of Pinot are as follows: (1) A parallelizing tool extracts parallelism at any level of granularity (e.g., even ten thousand instructions) from any program substructure (e.g., loops, calls, or basic blocks). The tool uses a formulation in which the parallelization process is reduced to a combinatorial optimization problem. (2) A parallel execution model with extended thread control instructions is designed to minimize the increase in dynamic instruction count. The model employs implicit thread termination and cancellation, as well as register value transfer without synchronization. (3) A versioning cache called the Version Resolution Cache (VRC) supports both coarse- and fine-grained speculative multithreading. The VRC operates as a large buffer for coarse-grained multithreading, and it provides low-latency inter-thread communication through an update-based protocol for fine-grained multithreading. We performed cycle-accurate simulations of 38 programs from the SPEC and MiBench benchmarks. A 4-processor-element Pinot achieves a speedup of up to 3.7x over a conventional processor, and 2.2x in geometric mean. The speedup for one program (susan) drops from 3.7x to 1.6x when the speculative buffer size is limited to 256 bytes, which confirms that exploiting coarse-grain parallelism is essential to the performance gains. FPGA implementation shows 32% overhead of area and
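The VRC's internals are not described here. The toy model below is a generic illustration of what a versioning buffer for speculative threads has to do:

- It keeps one version of each address per thread.
- A thread reads the newest version written by itself or by an older (less speculative) thread.
- Versions are committed in program order.
- It conservatively reports a dependence violation when an older thread writes an address that a younger thread has already read without first writing it.

The class and its methods are hypothetical; this is not Pinot's actual protocol.

```python
from typing import Dict, Hashable, List, Set


class VersionedMemory:
    """Toy speculative versioning store (hypothetical; not Pinot's VRC).
    Thread ids follow program order: a lower id is older (less speculative)."""

    def __init__(self) -> None:
        self.committed: Dict[Hashable, int] = {}            # architectural state
        self.versions: Dict[Hashable, Dict[int, int]] = {}  # addr -> {thread id: value}
        self.exposed_reads: Dict[int, Set[Hashable]] = {}   # reads made before own write

    def read(self, tid: int, addr: Hashable) -> int:
        vers = self.versions.get(addr, {})
        if tid not in vers:
            # Value comes from an older thread or committed state, so a later
            # write by an older thread could make it stale.
            self.exposed_reads.setdefault(tid, set()).add(addr)
        visible = [t for t in vers if t <= tid]
        if visible:
            return vers[max(visible)]  # newest version not younger than tid
        return self.committed.get(addr, 0)

    def write(self, tid: int, addr: Hashable, value: int) -> List[int]:
        """Record tid's version; return younger threads that must be squashed.
        Conservative: flags every younger thread with an exposed read of addr."""
        self.versions.setdefault(addr, {})[tid] = value
        return sorted(t for t, reads in self.exposed_reads.items()
                      if t > tid and addr in reads)

    def commit(self, tid: int) -> None:
        """Make tid's versions architectural (must be called in program order)."""
        for addr, vers in self.versions.items():
            if tid in vers:
                self.committed[addr] = vers.pop(tid)
        self.exposed_reads.pop(tid, None)


mem = VersionedMemory()
mem.committed["x"] = 1
print(mem.read(1, "x"))      # 1: no older version yet, so committed state is read
print(mem.write(0, "x", 5))  # [1]: thread 1 consumed a stale value
print(mem.read(2, "x"))      # 5: newest version from an older thread
mem.commit(0)
print(mem.committed["x"])    # 5
```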