A FPGA configuration method named configuration cloning is developed to exploit spatial and temporal regularity and locality in algorithms and architectures by copying and operating on the configuration bit-stream alr...
详细信息
A FPGA configuration method named configuration cloning is developed to exploit spatial and temporal regularity and locality in algorithms and architectures by copying and operating on the configuration bit-stream already resident in a FPGA. The method resulted in speed and power improvement over off-chip partial reconfiguration techniques, while not requiring additional interconnects and control hardware. Cloning requires only a small amount of hardware overhead. Digital signal processing applications are discussed to demonstrate the order of magnitude reductions in configuration time and power.
A reconfigurable architecture optimized for media processing, and based on 4-bit arithmetic logic unit (ALU) and interconnect is described. Together, these allow the area devoted to configuration bits and routing swit...
详细信息
A reconfigurable architecture optimized for media processing, and based on 4-bit arithmetic logic unit (ALU) and interconnect is described. Together, these allow the area devoted to configuration bits and routing switches to be about 50% of the area of the basic CHESS array, leaving the rest available for user-visible functional units. CHESS flexibility in application mapping is largely due to the ability to feed ALU with instruction streams generated within the array, generous provision of embedded block random access memory, and the ability to trade routing switches for small memories.
The Embedded System Block (ESB) of the APEX20K programmable logic device family from Altera Corporation includes the capability of implementing product term macrocells in addition to flexibly configurable ROM and dual...
详细信息
The Embedded System Block (ESB) of the APEX20K programmable logic device family from Altera Corporation includes the capability of implementing product term macrocells in addition to flexibly configurable ROM and dual port RAM. In product term mode, each ESB has 16 macrocells built out of 32 product terms with 32 literal inputs. The ability to reconfigure memory blocks in this way represents a new and innovative use of resources in a programmable logic device, requiring creative solutions in both the hardware and software domains. The architecture and features of this Embedded System Block are described.
One of the major overheads in reconfigurable computing is the time it takes to reconfigure the devices in the system. The configuration compression algorithm presented in our previous paper [Hauck98c] is one efficient...
详细信息
One of the major overheads in reconfigurable computing is the time it takes to reconfigure the devices in the system. The configuration compression algorithm presented in our previous paper [Hauck98c] is one efficient technique for reducing this overhead. In this paper, we develop an algorithm for finding Don't Care bits in configurations to improve the compatibility of the configuration data. With the help of the Don't Cares, higher configuration compression ratios can be achieved by using our modified configuration compression algorithm. This improves compression ratios of a factor of 7, where our original algorithm only achieved a factor of 4.
FPGA users often view the ability of an FPGA to route designs with high LUT (gate) utilization as a feature, leading them to demand high gate utilization from vendors. We present initial evidence from a hierarchical a...
详细信息
FPGA users often view the ability of an FPGA to route designs with high LUT (gate) utilization as a feature, leading them to demand high gate utilization from vendors. We present initial evidence from a hierarchical array design showing that high LUT utilization is not directly correlated with efficient silicon usage. Rather, since interconnect resources consume most of the area on these devices (often 80-90%), we can achieve more area efficient designs by allowing some LUTs to go unused - allowing us to use the dominant resource, interconnect, more efficiently. This extends the `Sea-of-gates' philosophy, familiar to mask programmablegatearrays, to FPGAs. Also introduced in this work is an algorithm for `depopulating' the gates in a hierarchical network to match the limited wiring resources.
Striped FPGA, or pipeline-reconfigurable FPGA provides hardware virtualization by supporting fast run-time reconfiguration. In this paper we show that the performance of striped FPGA depends on the reconfiguration pat...
详细信息
Striped FPGA, or pipeline-reconfigurable FPGA provides hardware virtualization by supporting fast run-time reconfiguration. In this paper we show that the performance of striped FPGA depends on the reconfiguration pattern, the run time scheduling of configurations through the FPGA. We study two main configuration scheduling approaches- Configuration Caching and Data Caching. We present the quantitative analysis of these scheduling techniques to compute their total execution cycles taking into account the overhead caused by the IO with the external memory. Based on the analysis we can determine which scheduling technique works better for the given application and for the given hardware parameters.
A new reprogrammable FPGA architecture is described which is specifically designed to be of very low cost. It covers a range of 35 K to a million usable gates. In addition, it delivers high performance and it is synth...
详细信息
A new reprogrammable FPGA architecture is described which is specifically designed to be of very low cost. It covers a range of 35 K to a million usable gates. In addition, it delivers high performance and it is synthesis efficient. This architecture is loosely based on an earlier reprogrammable Actel architecture named ES. By changing the structure of the interconnect and by making other improvements, we achieved an average cost reduction by a factor of three per usable gate. The first member of the family based on this architecture is fabricated on a 2.5 V standard 0.25μ CMOS technology with a gate count of up to 130 K which also includes 36 K bits of two port RAM. The gate count of this part is verified in a fully automatic design flow starting from a high level description followed by synthesis, technology mapping, place and route, and timing extraction.
This paper presents the emulation of an embedded system with hard real time constraints and response times of about 220 μs. We show that for such fast reactive systems, the software overhead of a Real Time Operating ...
详细信息
This paper presents the emulation of an embedded system with hard real time constraints and response times of about 220 μs. We show that for such fast reactive systems, the software overhead of a Real Time Operating System (RTOS) becomes a limiting factor, consuming up to 77% of the total execution performance. We analyze features of different FPGA architectures in order to solve the system performance bottleneck. We show that moving functionality from software to hardware through exploiting the fine grained on-chip SRAM capability of the Xilinx XC4000 architecture, that feature eliminates the RTOS overhead by only a slight increase of about 28% of the used FPGA CLB resources. These investigations have been conducted using our own emulation environment called SPYDER-CORE-P1.
This paper describes the Vantis VF1 FPGA architecture, an innovative architecture based on 0.25 u (drawn) (0.18 u Leff)/4-metal technology. It was designed from scratch for high performance, routability and ease-of-us...
详细信息
This paper describes the Vantis VF1 FPGA architecture, an innovative architecture based on 0.25 u (drawn) (0.18 u Leff)/4-metal technology. It was designed from scratch for high performance, routability and ease-of-use. It supports system level functions (including wide gating functions, dual-port SRAMs, high speed carry chains, and high speed IO blocks) with a symmetrical structure. Additionally, the architecture of each of the critical elements including: variable-grain logic blocks, variable-length-interconnects, dual-port embedded SRAM blocks, I/O blocks and on-chip PLL functions will be described.
Floorplanning is an important problem in FPGA circuit mapping. As FPGA capacity grows, new innovative approaches will be required for efficiently mapping circuits to FPGAs. In this paper we present a macro based floor...
详细信息
Floorplanning is an important problem in FPGA circuit mapping. As FPGA capacity grows, new innovative approaches will be required for efficiently mapping circuits to FPGAs. In this paper we present a macro based floorplanning methodology suitable for mapping large circuits to large, high density FPGAs. Our method uses clustering techniques to combine macros into clusters, and then uses a tabu search based approach to place clusters while enhancing both circuit routability and performance. Our method is capable of handling both hard (fixed size and shape) macros and sob (fixed size and variable shape) macros. We demonstrate our methodology on several macro based circuit designs and compare the execution speed and quality of results with commercially available CAE tools. Our approach shows a dramatic speedup in execution time without any negative impact on quality.
暂无评论