This paper describes how the massive parallelism of the rapidly reconfigurable Xilinx XC6216 FPGA (in conjunction with Virtual Computing's H.O.T. Works board) can be exploited to accelerate the time-consuming fitn...
详细信息
ISBN:
(纸本)9780897919784
This paper describes how the massive parallelism of the rapidly reconfigurable Xilinx XC6216 FPGA (in conjunction with Virtual Computing's H.O.T. Works board) can be exploited to accelerate the time-consuming fitness measurement task of genetic algorithms and genetic programming. This acceleration is accomplished by embodying each individual of the evolving population into hardware in order to perform the fitness measurement task. A 16-step sorting network for seven items was evolved that has two fewer steps than the sorting network described in the 1962 O'Connor and Nelson patent on sorting networks (and the same number of steps as a 7-sorter that was devised by Floyd and Knuth subsequent to the patent and that is now known to be minimal). Other minimal sorters have been evolved.
This paper presents a vector generation approach for testing interconnects in configurable (SRAM-based) fieldprogrammablegatearrays (FPGAs). The proposed approach detects bridging faults and is based on quiescent c...
详细信息
ISBN:
(纸本)9780897919784
This paper presents a vector generation approach for testing interconnects in configurable (SRAM-based) fieldprogrammablegatearrays (FPGAs). The proposed approach detects bridging faults and is based on quiescent current (IDDQ) monitoring. Compared with previous voltage-based methods, IDDQ testing has the advantage of utilizing a small number of programming phases for configuring the FPGA during the test process with negligible observability requirements, even under multiple faults. Algorithms for test generation which exploit the homogeneous nature of the FPGA array, are described. An example using the XC4000 is described in detail. For testing the XC4000 series interconnect, a total of 20 phases and 11 vectors are required: 11 phases for S (switch) block testing, and 9 phases for C (connection) block testing.
Architects of programmable logic devices (PLDs) face several challenges when optimizing a new device family for low manufacturing cost. When given an aggressive die-size goal, functional blocks that seem otherwise ins...
详细信息
ISBN:
(纸本)9780897919784
Architects of programmable logic devices (PLDs) face several challenges when optimizing a new device family for low manufacturing cost. When given an aggressive die-size goal, functional blocks that seem otherwise insignificant become targets for area reduction. Once low die cost is achieved, it is seen that testing and packaging costs must be considered. Interactions among these three cost contributors pose trade-offs that prevent independent optimization. This paper discusses solutions discovered by the architects optimizing the Altera FLEX 6000 architecture.
In designing FPGAs, it is important to achieve a good balance between the number of logic blocks, such as Look-Up Tables (LUTs), and wiring resources. It is difficult to find an optimal solution. In this paper, we pre...
详细信息
In designing FPGAs, it is important to achieve a good balance between the number of logic blocks, such as Look-Up Tables (LUTs), and wiring resources. It is difficult to find an optimal solution. In this paper, we present an FPGA design methodology to efficiently find well-balanced FPGA architectures. The method covers all aspects of FPGA development from the architecture-decision process to physical implementation. It has been used to develop a new FPGA that can implement circuits that are twice as large as those implementable with the previous version but with half the number of logic blocks. This indicates that the methodology is effective in developing well-balanced FPGAs.
Carry chains are an important consideration for most computations, including FPGAs. Current FPGAs dedicate a portion of their logic to support these demands via a simple ripple carry scheme. In this paper we demonstra...
详细信息
ISBN:
(纸本)9780897919784
Carry chains are an important consideration for most computations, including FPGAs. Current FPGAs dedicate a portion of their logic to support these demands via a simple ripple carry scheme. In this paper we demonstrate how more advanced carry constructs can be embedded into FPGAs, providing significantly higher performance carry computations. We redesign the standard ripple carry chain to reduce the number of logic levels in each cell. We also develop entirely new carry structures based on high performance adders such as Carry Select, Carry Lookahead, and Brent-Kung. Overall, these optimizations achieve a speedup in carry performance of 3.8 times over current architectures.
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect is time-multiplexed. Thus for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each s...
详细信息
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect is time-multiplexed. Thus for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each subcircuit can be executed at a different time. In this paper, the partitioning of sequential circuits for execution on a DRFPGA is studied. To determine how to correctly partition a sequential circuit, and what are the costs in doing so, we propose a new gate-level model that handles time-multiplexed computation. We also introduce an enhanced force directed scheduling (FDS) algorithm to partition sequential circuits that finds a correct partition with low logic and communication costs, under the assumption that maximum performance is desired. We use our algorithm to partition seven large ISC AS'89 sequential benchmark circuits. The experimental results show that the enhanced FDS reduces communication costs by 27.5% with only a 1.1% increase in the gate cost compared to traditional FDS.
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and ...
详细信息
ISBN:
(纸本)9780897919784
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and weight. We partition the physical design into a set of tiles. In response to a component failure, we capitalize on the unique reconfiguration capabilities of FPGAs and replace the affected tile with a functionally equivalent tile that does not rely on the faulty component. Unlike fixed structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components. Experimental results conducted on a subset of the MCNC benchmarks demonstrate a high level of reliability with low timing and hardware overhead.
The goal of this paper is to perform a timing optimization of a circuit described by a network of cells on a target structure whose connection delays have discrete values following its hierarchy. The circuits is model...
详细信息
ISBN:
(纸本)9780897919784
The goal of this paper is to perform a timing optimization of a circuit described by a network of cells on a target structure whose connection delays have discrete values following its hierarchy. The circuits is modelled by a set of timed cones whose delay histograms allow their classification into critical, potential critical and neutral cones according to predicted delays. The floorplanning is then guided by this cone structuring and has two innovative features: first, it is shown that the placement of the elements of the neutral cones has no impact on timing results, thus a significant reduction is obtained;second, despite a greedy approach, a near optimal floorplan is achieved in a large number of examples.
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, th...
详细信息
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, they require the FPGA vendor to partition the device into memory and logic resources at manufacture-time. This leads to a waste of chip area for customers that do not use all of the storage provided. This chip area need not be wasted, and can in fact be used very efficiently, if the arrays are configured as large multi-output ROMs, and used to implement logic. In order to efficiently use the embedded arrays in this way, a technology mapping algorithm that identifies parts of circuits that can be efficiently mapped to an embedded array is required. In this paper, we describe such an algorithm. The new tool, called SMAP, packs as much circuit information as possible into the available memory arrays, and maps the rest of the circuit into four-input lookup-tables. On a set of 29 sequential and combinational benchmarks, the tool is able to map, on average, 60 4-LUTs into a single 2-Kbit memory array. If there are 16 arrays available, it can map, on average, 358 4-LUTs to the 16 arrays.
In the development of new FPGA architectures, a designer must balance speed, density and routing flexibility. In this paper, we discuss a new FPGA architecture based on a patented, novel, segmented routing fabric that...
详细信息
In the development of new FPGA architectures, a designer must balance speed, density and routing flexibility. In this paper, we discuss a new FPGA architecture based on a patented, novel, segmented routing fabric that is targeted to high performance and predictability but does not sacrifice routability or area efficiency. Current segmented architectures allow much flexibility in routing, but incur large delay penalties when a signal has high fanout or must traverse medium to long distances to reach its target. Reducing the number of programmable interconnect points (PIPs) that a signal must traverse to reach its target, while eliminating the RC delay buildup due to signal fanout, improves design performance and offers highly predictable signal delays.
暂无评论