The Arithmetic-Logic-Unit (ALU) is at the heart of a modern microprocessor, and its size and speed are often significant contributors to the overall processor's cost and performance. This paper presents the design...
详细信息
The Arithmetic-Logic-Unit (ALU) is at the heart of a modern microprocessor, and its size and speed are often significant contributors to the overall processor's cost and performance. This paper presents the design of the ALU used in Altera's NIOS 2.0 soft processor implemented on Altera's Apex 20KE FPGA architecture. This ALU enabled the 32-bit NIOS 2.0 to consume only 1200 LEs and run at 85MHz. This is a 50% size reduction and 70% speed improvement over its predecessor, NIOS 1.1. The Logic-element (LE) is the basic building block within the Apex architecture. Making full use of the advanced features of the LE has resulted in this novel ALU design. A functional representation of the logic is used to describe how the ALU performs the core set of NIOS instructions, and an LE representation shows the amount of logic-resources needed for the implementation. The cost of additional features such as a barrel-shifter and custom instructions is also described. Likely worst-case delays for different routing and logic elements are used to estimate the ALU's speed. Further speed and size optimizations are also presented from which it is possible to create ALU ranging in speed from 87 MHz to over 100 MHz.
Reconfigurable logic devices that are based on an FPGA substrate are gaining widespread acceptance. As such devices are used in many different configurations, manufacturers need to ensure that each potential configura...
详细信息
ISBN:
(纸本)0769520936
Reconfigurable logic devices that are based on an FPGA substrate are gaining widespread acceptance. As such devices are used in many different configurations, manufacturers need to ensure that each potential configuration will not fail due to device defects. This flexibility leads to severely increased test time. We show how to use reconfigurability to speed up test and diagnosis times of individual FPGA blocks. We present a scheme to incorporate our test architecture, reducing diagnostic and test times of individual FPGA blocks. The test architecture includes added Feedback Shift Registers (FSRs) that change the circuit configuration during test. Algorithms are presented to produce test and diagnosis test sets with a minimized number of test configurations, along with the creation of an FSR that produces the test and diagnosis sets by dynamic reconfiguration of the device.
The recent past has seen a tremendous increase in the size of design circuits that can be implemented in a single FPGA. The size and complexity of modern FPGAs has far outpaced the innovations in FPGA physical design....
详细信息
The recent past has seen a tremendous increase in the size of design circuits that can be implemented in a single FPGA. The size and complexity of modern FPGAs has far outpaced the innovations in FPGA physical design. The problems faced by FPGA designers are similar in nature to those that preoccupy ASIC designers, namely, interconnect delays and design management. However, this paper will show that a simple retargeting of ASIC physical design methodologies and algorithms to the FPGA domain will not suffice. We will show that several well researched problems in the ASIC world need new problem formulations and algorithms research to be useful for today 's FPGAs. Partitioning, floorplanning, placement, delay estimation schemes are only some of the topics that need complete overhaul. We will give problem formulations, motivated by experimental results, for some of these topics as applicable in the FPGA domain.
Placement and routing are the most time-consuming processes in automatically synthesizing and configuring circuits for field-programmablegatearrays (FPGAs). In this paper, we use the negotiation-based paradigm to pa...
详细信息
Placement and routing are the most time-consuming processes in automatically synthesizing and configuring circuits for field-programmablegatearrays (FPGAs). In this paper, we use the negotiation-based paradigm to parallelize placement. Our new FPGA placer, NAP (Negotiated Analytical Placement), uses an analytical technique for coarse placement and the negotiation paradigm for detailed placement. We describe the serial algorithm and report results. We also report findings related to parallelizing NAP under a multicast networking and multi-threaded operating system environment;the parallel placer is tolerant to multicast packet loss as well as out-of-order packet delivery. Our parallel placer exhibits little performance degradation while attaining speedups of 2 using 3 processors.
In recent years the challenge of high performance, low power retargettable embedded system has been faced with different technological and architectural solutions. In this paper we present a new configurable unit expl...
详细信息
In recent years the challenge of high performance, low power retargettable embedded system has been faced with different technological and architectural solutions. In this paper we present a new configurable unit explicitly designed to implement additional reconfigurable pipelined datapaths, suitable for the design of reconfigurable processors. A VLIW reconfigurable processor has been implemented on silicon in a standard 0.18 μm CMOS technology to prove the effectiveness of the proposed unit. Testing on a signal processing algorithms benchmark showed speedups from 4.3x to 13.5x and energy consumption reduction up to 92%.
C-slow retiming is a process of automatically increasing the throughput of a design by enabling fine grained pipelining of problems with feedback loops. This transformation is especially appropriate when applied to FP...
详细信息
C-slow retiming is a process of automatically increasing the throughput of a design by enabling fine grained pipelining of problems with feedback loops. This transformation is especially appropriate when applied to FPGA designs because of the large number of available registers. To demonstrate and evaluate the benefits of C-slow retiming, we constructed an automatic tool which modifies designs targeting the Xilinx Virtex family of FPGAs. Applying our tool to three benchmarks: AES encryption. Smith/Waterman sequence matching, and the LEON 1 synthesized microprocessor core, we were able to substantially increase the total throughput. For some parameters, throughput is effectively doubled.
In this paper, we present the first exact algorithm to solve the constrained I/O placement problem for FPGAs that support multiple I/O standards. We derive a compact integer linear programming formulation for the cons...
详细信息
In this paper, we present the first exact algorithm to solve the constrained I/O placement problem for FPGAs that support multiple I/O standards. We derive a compact integer linear programming formulation for the constrained I/O placement problem. The size of the integer linear program derived is independent of the number of I/O objects to be placed and hence is scalable to very large design instances. For example, for a Xilinx Virtex-E FPGA, the number of integer variables required is never more than 32 and is much smaller for practical design instances. Extensive experimental results using a non-commercial integer linear program solver shows that it only takes seconds to solve the resultant integer linear program in practice.
Though verification is significantly easier for FPGA-based digital systems than for ASIC or full-custom hardware, there are nonetheless many places for errors to occur. In this paper we discuss the verification proble...
详细信息
Though verification is significantly easier for FPGA-based digital systems than for ASIC or full-custom hardware, there are nonetheless many places for errors to occur. In this paper we discuss the verification problem for FPGAs and describe several methods for verifying end-to-end correctness of synthesis algorithms, a particularly complex portion of the CAD flow. Though the primary contribution of this paper is the analysis of the overall problem, we also give an algorithm for the automatic generation of test-vectors for simulation using information from the synthesis tool, and describe a second testing method that generates purposefully difficult designs in combination with input vectors to test them. We will show the validity of these methods by standard metrics such as simulation node-coverage and through the ability for the method to locate forced errors introduced by the synthesis tool.
This paper proposes a new high-level technique for designing fault tolerant systems in SRAM-based FPGAs, without modifications in the FPGA architecture. Traditionally, TMR has been successfully applied in FPGAs to mit...
详细信息
This paper proposes a new high-level technique for designing fault tolerant systems in SRAM-based FPGAs, without modifications in the FPGA architecture. Traditionally, TMR has been successfully applied in FPGAs to mitigate transient faults, which are likely to occur in space applications. However. TMR comes with high area and power dissipation penalties. The proposed technique was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments in an emulation board. We present some fault coverage results and a comparison with the TMR approach.
暂无评论