This paper presents a new universal test approach for FPGA logic resources. It includes a new greedy configuration-generating algorithm and a new FPGA Configurable Logic Block (CLB) test model. The model is based on two directed graphs, a structure graph and a configuration graph, which convey the important information from the CLB gate-level circuit to the greedy configuration-generating algorithm, so that the algorithm can generate the minimum number of test configurations needed to achieve a given fault coverage. With this new approach, researchers can easily obtain test patterns optimized in both test time and fault coverage for different FPGA architectures. Finally, we compare our experimental results with other test approaches; the results show that test patterns from the new approach are even more efficient than manually optimized patterns. They also show that the approach handles different types of FPGAs very well.
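The abstract does not spell out the greedy algorithm or the two graphs, so the following is only a minimal sketch of the general idea, written as a generic greedy set cover. The function name `greedy_configurations`, the configuration names, and the fault identifiers are all hypothetical, and the sketch assumes the set of faults each configuration detects is already known (in the paper that information would come from the structure and configuration graphs). A plain greedy cover is a heuristic and does not by itself guarantee a minimum.

```python
def greedy_configurations(coverage, target_faults):
    """Pick test configurations greedily until the target faults are covered.

    coverage: dict mapping a (hypothetical) configuration name to the set
              of faults that configuration detects.
    target_faults: iterable of faults that must be detected.
    Returns (chosen configurations in order, faults left undetected).
    """
    uncovered = set(target_faults)
    chosen = []
    while uncovered:
        # Sorting the names makes tie-breaking deterministic:
        # max() keeps the first of several equally good candidates.
        best = max(sorted(coverage), key=lambda c: len(coverage[c] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # no configuration detects any of the remaining faults
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Illustrative data only; not taken from the paper.
example_coverage = {
    "cfg_a": {1, 2, 3},
    "cfg_b": {3, 4},
    "cfg_c": {4, 5},
    "cfg_d": {1, 5},
}
chosen, missed = greedy_configurations(example_coverage, {1, 2, 3, 4, 5})
print(chosen, missed)
```

Each pass selects the configuration that detects the most still-uncovered faults, which is the usual way a greedy method trades optimality for speed.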
The general computing world settled on radix-2 floating-point representations over three decades ago. The analyses that led to this choice were all based on the underlying premise that the goal of a floating-point representation is to maximize numerical accuracy per bit of data. However, the unique nature of FPGA-based computations makes numerical accuracy per unit of FPGA resources a more important measure by which to judge the usefulness of a given floating-point representation. Due to the high cost of shifters as implemented on FPGAs, higher-radix floating-point representations are uniquely suited to FPGA-based computations, especially high-precision calculations that require support for denormalized numbers. Higher-radix representations use FPGA resources more efficiently. For example, a radix-16 adder requires 20% fewer LUTs than its radix-2 counterpart, while delivering equal worst-case and better average-case numerical accuracy.
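One way to see why shifters get cheaper at higher radix is to count the stages of a logarithmic (barrel) shifter. The sketch below is an illustration under stated assumptions, not the paper's cost model: it assumes one multiplexer stage per bit of the shift amount, assumes the radix is a power of two, and compares both radices at the same significand width for simplicity (a radix-16 format can have up to three leading zero bits in its top digit, so matching worst-case accuracy may call for a wider significand). The function name `shifter_stages` is hypothetical.

```python
def shifter_stages(significand_bits: int, radix: int) -> int:
    """Count mux stages in a logarithmic shifter for alignment/normalization.

    Assumes `radix` is a power of two. Shifts happen in whole digits, so a
    radix-16 significand only ever shifts by multiples of 4 bits.
    """
    bits_per_digit = radix.bit_length() - 1          # 1 for radix 2, 4 for radix 16
    # Number of distinct digit positions (ceiling division).
    positions = -(-significand_bits // bits_per_digit)
    # Stages needed to encode shift amounts 0 .. positions - 1.
    return (positions - 1).bit_length()


for width in (24, 53):
    print(width, shifter_stages(width, 2), shifter_stages(width, 16))
```

Under these assumptions, moving from radix 2 to radix 16 removes the two stages that would otherwise shift by 1 and 2 bits, which is consistent in spirit with the LUT savings the abstract reports.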
Current FPGA placement algorithms estimate the routability of a placement using architecture-specific metrics. The shortcoming of architecture-specific routability estimates is limited adaptability: a placement algorithm that is targeted to a class of architecturally similar FPGAs may not be easily adapted to other architectures. The subject of this paper is the development of a routability-driven, architecture-adaptive FPGA placement algorithm called Independence. The core of the Independence algorithm is a simultaneous place-and-route approach that tightly couples a simulated annealing placement algorithm with an architecture-adaptive FPGA router (Pathfinder). The results of our experiments demonstrate Independence's adaptability to island-style and hierarchical FPGA architectures. The quality of the placements produced by Independence is within 5% of the quality of VPR's placements and 17% better than the placements produced by HSRA's place-and-route tool. Further, our results show that Independence produces clearly superior placements on routing-poor island-style FPGA architectures.
ISBN: 9781595930293 (print)
Reconfigurable architectures are well suited for wireless applications since they provide high-performance computation together with the capability to adapt to changing communication protocols. Moving to 90nm technology and below, FPGAs could suffer from leakage energy consumption due to the large number of inactive transistors. We propose to combine super cut-off, body biasing and multi-threshold techniques to reduce the leakage current of programmable interconnections, which contribute by far the largest share of static power dissipation. The super cut-off (gate biasing) technique is well suited for high-speed pass-transistors, while body biasing can be adopted for large buffers. High-threshold transistors, on the other hand, can be used where delays are not critical. We show the design of SRAM cells generating the required high-swing signals for gate-biased and body-biased transistors without affecting transistor reliability. Compared to a standard design, the adoption of a mixed technique reduces routing leakage by more than one order of magnitude with only a 5% increase in switch delay and a 9% increase in tile area, while with respect to a full dual-threshold approach, delay is reduced by 23%. Copyright 2005 ACM.
This paper introduces a methodology for prototyping Globally Asynchronous Locally Synchronous (GALS) circuits on synchronous commercial FPGAs. A library of the elements required for implementing GALS circuits is proposed, and general design considerations for successfully implementing a GALS circuit on an FPGA are discussed. The library includes clock generators, arbiters, and different port controllers. Different implementations of these circuits and their advantages and disadvantages are explored. Finally, we present a GALS Reed-Solomon decoder as a practical example. The results show that the GALS approach improves the performance of the circuit by 11% and reduces power consumption by 18.7% to 19.6%, depending on the error rate. On the other hand, the area of the circuit increases by 51%, which is acceptable considering that a purely synchronous circuit with a central controller is decomposed to generate the GALS system, and 29% of this overhead is due to distributing the controller across different modules. Deploying better decomposition methods can reduce this overhead substantially.
Today, high-end video and multimedia processing applications require huge amounts of memory. For cost reasons, the use of conventional dynamic RAM (SDRAM) is preferred. However, SDRAM access optimization is a complex task, especially if multi-stream access with different QoS (Quality of Service) requirements is involved. At the SIPS 2003 conference, we presented a multi-stream DDR-SDRAM controller IP covering combinations of low-latency requirements for processor cache access, hard real-time constraints for periodic video signals, and hard real-time bursty accesses for video coprocessors. To handle these contradictory QoS requirements at high system performance, a combination of a 2-stage scheduling algorithm and static priorities was used. This poster describes an additional flow control that greatly enhances overall performance and controllability. The efficient but simple design makes the controller well suited for FPGA-based designs. Experiments with our FPGA-based high-end video platform demonstrate the superiority of this architecture.
ISBN: 9781595930293 (print)
Protein sequences with unknown functionality are often compared to a set of known sequences to detect functional similarities. Efficient dynamic-programming algorithms exist for solving this problem; however, current solutions still require significant scan times. These scan time requirements are likely to become even more severe due to exponential database growth. In this paper we present a new approach to bio-sequence database scanning that uses reconfigurable FPGA-based hardware platforms to gain high performance at low cost. We have designed efficient mappings of the Smith-Waterman algorithm using fine-grained parallel processing elements (PEs) that are tailored towards the parameters of a query. We use customization opportunities available at run-time to dynamically hyper-customize the systolic array and make better use of available resources. Our FPGA implementation achieves a speedup of approximately 170 for linear gap penalties and 125 for affine gap penalties as compared to a standard desktop computing platform. We show how hyper-customization at run-time can be used to further improve the performance. Copyright 2005 ACM.
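For readers unfamiliar with the underlying recurrence, here is a minimal software sketch of Smith-Waterman local alignment scoring with a linear gap penalty. It is a plain sequential reference, not the paper's systolic-array mapping. The function name `smith_waterman_score` and the scoring values (`match`, `mismatch`, `gap`) are placeholder assumptions; a real protein scan would normally use a substitution matrix such as BLOSUM62 instead of a flat match/mismatch score.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, since only the score is needed.
    """
    prev = [0] * (len(subject) + 1)   # row for the previous query residue
    best = 0
    for q in query:
        curr = [0]                    # column 0 is always zero
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Illustrative sequences only.
print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That independence is what makes the algorithm a natural fit for a linear array of processing elements.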
For several years now, modern FPGAs have included on-chip network-related hard cores. These cores include Xilinx's RocketIO and Altera's RapidIO serial transceivers. However, using these cores in a complete networking application may be a daunting task for a non-networking expert. In addition to the complicated use of these components, the high performance needs of modern networking applications require designs that are optimized for low latency and a moderately high clock rate. Therefore, to meet these challenges, we present CUSP (Click Utilizing Speculation and Parallelism) for reconfigurable hardware platforms. Click is an accepted software network router framework that is similar to CUSP, but specifically built for a Linux platform and software network routers. CUSP, while also having a modular design of reusable components, additionally provides automated speculation and parallelism to gain better performance on FPGAs. An accompanying scripting language allows quick creation of these routers from a body of existing components. We have implemented an example network application through the CUSP design flow, and its performance will be compared against alternative network design methods. Copyright 2005 ACM.
This paper proposes a new CLB architecture for FPGAs and associated testing and reconfiguration techniques that detect single routing/interconnect errors and correct them using partial reconfiguration. The results of error detection are propagated to a single output port by a chain-like shift register and are used to narrow down the segment of the routing architecture that has to be reconfigured. The error is corrected by partially reconfiguring this minimal segment alone, thereby reducing the time for reconfiguration. The proposed testing technique detects all possible routing errors that affect the logic of the circuit, including bridging faults. It is noteworthy that the time required for error detection is independent of both the number of switch matrices and the number of logic blocks in the FPGA. Empirically, our technique detected all single interconnect errors in benchmark circuits. In addition, for the majority of errors, our correction technique required less than 10% of the switch matrices to be reconfigured to correct the errors.
This paper introduces a novel 3-Dimensional (3D) vertically integrated adaptive computing system. This 3D-SoftChip is a combination of state-of-the-art processing and interconnection technology. It comprises the vertical integration of two chips (a Configurable Array Processor and an Intelligent Configurable Switch) through indium bump 3D interconnections. The Configurable Array Processor (CAP) is an array of heterogeneous processing elements (PEs), while the Intelligent Configurable Switch (ICS) comprises a switch block, a dedicated 32-bit RISC processor for control, on-chip program/data memory, and a data frame buffer along with a Direct Memory Access (DMA) controller. This paper introduces the 3D-SoftChip architecture for real-time communication and multimedia signal processing as a next-generation computing system. The paper further describes the up-to-date HW/SW co-design and verification methodology, including high-level system modeling and architecture exploration of 3D-SoftChip using SystemC, in order to determine the optimum hardware specification in the early design stage.