Carry chains are an important consideration for most computations, including FPGAs. Current FPGAs dedicate a portion of their logic to support these demands via a simple ripple carry scheme. In this paper we demonstra...
详细信息
ISBN:
(纸本)9780897919784
Carry chains are an important consideration for most computations, including FPGAs. Current FPGAs dedicate a portion of their logic to support these demands via a simple ripple carry scheme. In this paper we demonstrate how more advanced carry constructs can be embedded into FPGAs, providing significantly higher performance carry computations. We redesign the standard ripple carry chain to reduce the number of logic levels in each cell. We also develop entirely new carry structures based on high performance adders such as Carry Select, Carry Lookahead, and Brent-Kung. Overall, these optimizations achieve a speedup in carry performance of 3.8 times over current architectures.
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect is time-multiplexed. Thus for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each s...
详细信息
A fundamental feature of Dynamically Reconfigurable FPGAs (DRFPGAs) is that the logic and interconnect is time-multiplexed. Thus for a circuit to be implemented on a DRFPGA, it needs to be partitioned such that each subcircuit can be executed at a different time. In this paper, the partitioning of sequential circuits for execution on a DRFPGA is studied. To determine how to correctly partition a sequential circuit, and what are the costs in doing so, we propose a new gate-level model that handles time-multiplexed computation. We also introduce an enhanced force directed scheduling (FDS) algorithm to partition sequential circuits that finds a correct partition with low logic and communication costs, under the assumption that maximum performance is desired. We use our algorithm to partition seven large ISC AS'89 sequential benchmark circuits. The experimental results show that the enhanced FDS reduces communication costs by 27.5% with only a 1.1% increase in the gate cost compared to traditional FDS.
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and ...
详细信息
ISBN:
(纸本)9780897919784
While system reliability is conventionally achieved through component replication, we have developed a fault-tolerance approach for FPGA-based systems that comes at a reduced cost in terms of design time, volume, and weight. We partition the physical design into a set of tiles. In response to a component failure, we capitalize on the unique reconfiguration capabilities of FPGAs and replace the affected tile with a functionally equivalent tile that does not rely on the faulty component. Unlike fixed structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components. Experimental results conducted on a subset of the MCNC benchmarks demonstrate a high level of reliability with low timing and hardware overhead.
The goal of this paper is to perform a timing optimization of a circuit described by a network of cells on a target structure whose connection delays have discrete values following its hierarchy. The circuits is model...
详细信息
ISBN:
(纸本)9780897919784
The goal of this paper is to perform a timing optimization of a circuit described by a network of cells on a target structure whose connection delays have discrete values following its hierarchy. The circuits is modelled by a set of timed cones whose delay histograms allow their classification into critical, potential critical and neutral cones according to predicted delays. The floorplanning is then guided by this cone structuring and has two innovative features: first, it is shown that the placement of the elements of the neutral cones has no impact on timing results, thus a significant reduction is obtained;second, despite a greedy approach, a near optimal floorplan is achieved in a large number of examples.
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, th...
详细信息
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, they require the FPGA vendor to partition the device into memory and logic resources at manufacture-time. This leads to a waste of chip area for customers that do not use all of the storage provided. This chip area need not be wasted, and can in fact be used very efficiently, if the arrays are configured as large multi-output ROMs, and used to implement logic. In order to efficiently use the embedded arrays in this way, a technology mapping algorithm that identifies parts of circuits that can be efficiently mapped to an embedded array is required. In this paper, we describe such an algorithm. The new tool, called SMAP, packs as much circuit information as possible into the available memory arrays, and maps the rest of the circuit into four-input lookup-tables. On a set of 29 sequential and combinational benchmarks, the tool is able to map, on average, 60 4-LUTs into a single 2-Kbit memory array. If there are 16 arrays available, it can map, on average, 358 4-LUTs to the 16 arrays.
In the development of new FPGA architectures, a designer must balance speed, density and routing flexibility. In this paper, we discuss a new FPGA architecture based on a patented, novel, segmented routing fabric that...
详细信息
In the development of new FPGA architectures, a designer must balance speed, density and routing flexibility. In this paper, we discuss a new FPGA architecture based on a patented, novel, segmented routing fabric that is targeted to high performance and predictability but does not sacrifice routability or area efficiency. Current segmented architectures allow much flexibility in routing, but incur large delay penalties when a signal has high fanout or must traverse medium to long distances to reach its target. Reducing the number of programmable interconnect points (PIPs) that a signal must traverse to reach its target, while eliminating the RC delay buildup due to signal fanout, improves design performance and offers highly predictable signal delays.
In this paper, we present an algorithm for circuit partitioning with complex resource constraints in large FPGAs. Traditional partitioning methods estimate the capacity of an FPGA device by counting the number of logi...
详细信息
ISBN:
(纸本)9780897919784
In this paper, we present an algorithm for circuit partitioning with complex resource constraints in large FPGAs. Traditional partitioning methods estimate the capacity of an FPGA device by counting the number of logic blocks, however this is not accurate with the increasing capacity and diverse resource types in the new FPGA architectures. We propose a network flow based method to optimally check whether a circuit or a sub-circuit is feasible for a set of available heterogeneous resources. The feasibility checking procedure is integrated in the FM-based algorithm for circuit partitioning. Incremental flow technique is employed for efficient implementation. Experimental results on the MCNC benchmark circuits show that our partitioning algorithm not only yields good results, but also is efficient. Our algorithm for partitioning with complex resource constraints is applicable for both multiple FPGA designs (e.g. logic emulation systems) and partitioning-based placement algorithms for a single large hierarchical FPGA (e.g. Actel's ES6500 FPGA family).
By tailoring a compiler tree-parsing tool for datapath module mapping, we produce good quality results for datapath synthesis in very fast run time. Rather than flattening the design to gates, we preserve the datapath...
详细信息
ISBN:
(纸本)9780897919784
By tailoring a compiler tree-parsing tool for datapath module mapping, we produce good quality results for datapath synthesis in very fast run time. Rather than flattening the design to gates, we preserve the datapath structure;this allows exploitation of specialized datapath features in FPGAs, retains regularity, and also results in a smaller problem size. To further achieve high mapping speed, we formulate the problem as tree covering and solve it efficiently with a linear-time dynamic programming algorithm. In a novel extension to the tree-covering algorithm, we perform module placement simultaneously with the mapping, still in linear time. Integrating placement has the potential to increase the quality of the result since we can optimize total delay including routing delays. To our knowledge this is the first effort to leverage a grammar-based tree covering tool for datapath module mapping. Further, it is the first work to integrate simultaneous placement with module mapping in a way that preserves linear time complexity.
Modern fieldprogrammablegatearrays (FPGAs) provide embedded memory blocks (EMBs) to be used as on-chip memories. In this paper, we explore the possibility of using EMBs to implement logic functions when they are no...
详细信息
ISBN:
(纸本)9780897919784
Modern fieldprogrammablegatearrays (FPGAs) provide embedded memory blocks (EMBs) to be used as on-chip memories. In this paper, we explore the possibility of using EMBs to implement logic functions when they are not used as on-chip memory. We propose a general technology mapping problem for FPGAs with EMBs for area and delay minimization and develop an efficient algorithm based on the concepts of Maximum Fanout Free Cone (MFFC) and Maximum Fanout Free Subgraph (MFFS), named EMB_Pack, which minimizes the area after or before technology mapping by using EMBs while maintaining the circuit delay. We have tested EMB_Pack on MCNC benchmarks on Altera's FLEX10K device family. The experimental results show that compared with the original mapped circuits generated from CutMap without using EMBs, EMB_Pack as postprocessing can further reduce up to 10% of the area on the mapped circuits while maintaining the layout delay by making efficient use of available EMB resources. Compared with CutMap-e without using EMBs, EMB_Pack as pre-mapping processing followed by CutMap-e can reduce 6% of the area while maintaining the circuit optimal delay.
In this paper, we developed Boolean matching techniques for complex programmable logic blocks (PLBs) in LUT-based FPGAs. A complex PLB can not only be used as a K-input LUT, but also can implement some wide functions ...
详细信息
ISBN:
(纸本)9780897919784
In this paper, we developed Boolean matching techniques for complex programmable logic blocks (PLBs) in LUT-based FPGAs. A complex PLB can not only be used as a K-input LUT, but also can implement some wide functions of more than K variables. We apply previous and develop new functional decomposition methods to match wide functions to PLBs. We can determine exactly whether a given wide function can be implemented with a XC4000 CLB or other three PLB architectures (including the XC5200 CLB). We evaluate functional capabilities of the four PLB architectures on implementing wide functions in MCNC benchmarks. Experiments show that the XC4000 CLB can be used to implement up to 98% of 6-cuts and 88% of 7-cuts in MCNC benchmarks, while two of the other three PLB architectures have a smaller cost in terms of logic capability per silicon area. Our results are useful for designing future logic unit architectures in LUT based FPGAs.
暂无评论