作者:
DeHon, AndréDept. of CS
256-80 California Institute of Technology Pasadena CA 91125 United States
Sublithographic programmable Logic arrays can be interconnected and restored using nanoscale wires. Building on a hybrid of bottom-up assembly techniques supported by conventional lithographic patterning, we show how ...
详细信息
ISBN:
(纸本)9781595930293
Sublithographic programmable Logic arrays can be interconnected and restored using nanoscale wires. Building on a hybrid of bottom-up assembly techniques supported by conventional lithographic patterning, we show how modestsized PLA logic blocks, which are efficient for implementing logic, can be organized into a segmented, Manhattan mesh interconnection scheme. The resulting programmable architecture has a macro-scale view which is reminiscent of lithographic FPGA and CPLD designs despite the fact that the low-level, sublithographic fabrication techniques used are much more highly constrained than conventional lithography and are prone to high defect rates. Using the Toronto 20 benchmark set, we begin to explore the design space for these sublithographic architectures and show that they may allow us to exploit nanowire building blocks to reach one to two orders of magnitude greater density than 22nm CMOS lithography. Copyright 2005 acm.
Although runtime dynamic reconfiguration of the FPGA devices has been an issue of the last decade, it has yet to achieve general recognition by the design community. The reasons for this are clear;there exists no stra...
详细信息
Although runtime dynamic reconfiguration of the FPGA devices has been an issue of the last decade, it has yet to achieve general recognition by the design community. The reasons for this are clear;there exists no straightforward design methodology, and the partitioning and CAD tool support is poor. This paper presents general concepts implemented in a placement and routing tool that provides an environment where designs that are partially and dynamically reconfigurable can be processed in order to be implemented on FPGAs that support this technology, such as the Armel AT40K and AT94K series. The function of the tool is demonstrated on a simple real-world example.
Area-IO provide a way to eliminate the IO bottleneck of fieldprogrammable logic devices (FPLDs) created the mismatch between the ability of perimeter bonds to provide IO and the propensity of logic to demand it. Whet...
详细信息
ISBN:
(纸本)9780897917438
Area-IO provide a way to eliminate the IO bottleneck of fieldprogrammable logic devices (FPLDs) created the mismatch between the ability of perimeter bonds to provide IO and the propensity of logic to demand it. Whether the incorporation of area IO into FPLD architectures has undesirable side effects is a question that has not yet been answered. In this paper, we examine the architectural impact of area-IO on FPLDs from a theoretical and experimental standpoint and show that the introduction of area IO generally improves the routability and delay of a set of benchmark circuits.
This paper shows a method to verifying the thermal status of complex FPGA-based circuits like microprocessors. Thus, the designer can evaluate if a particular block is working beyond specifications. The idea is to ext...
详细信息
This paper shows a method to verifying the thermal status of complex FPGA-based circuits like microprocessors. Thus, the designer can evaluate if a particular block is working beyond specifications. The idea is to extract the output frequencies of an array of ring-oscillators previously distributed in the die, taking full advantage of the configuration port capabilities in Xilinx technology. As a result, it is shown that the FPGA technology offers the designers of embedded systems the possibility of viewing a detailed thermal map of a circuit at a minimum cost. The verification can be done in actual working conditions;for example with heat sinks and fans attached to the chip, inside the system case, or even in an on-board satellite application. The main results of the work are unthinkable using other alternatives like IR cameras, external sensors, or embedded diodes.
In this work, we parameterize and explore the interconnect structure of pipelined FPGAs. Specifically, we explore the effects of interconnect register population, length of registered routing track segments, registere...
详细信息
In this work, we parameterize and explore the interconnect structure of pipelined FPGAs. Specifically, we explore the effects of interconnect register population, length of registered routing track segments, registered 10 terminals of logic units, and the flexibility of the interconnect structure on the performance of a pipelined FPGA. Our experiments with the RaPiD architecture identify tradeoffs that must be made while designing the interconnect structure of a pipelined FPGA. The post-exploration architecture that we found shows a 19% improvement over RaPiD, while the area overhead incurred in placing and routing benchmarks netlists on the post-exploration architecture is 18%.
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, th...
详细信息
It has become clear that large embedded configurable memory arrays will be essential in future FPGAs. Embedded arrays provide high-density high-speed implementations of the storage parts of circuits. Unfortunately, they require the FPGA vendor to partition the device into memory and logic resources at manufacture-time. This leads to a waste of chip area for customers that do not use all of the storage provided. This chip area need not be wasted, and can in fact be used very efficiently, if the arrays are configured as large multi-output ROMs, and used to implement logic. In order to efficiently use the embedded arrays in this way, a technology mapping algorithm that identifies parts of circuits that can be efficiently mapped to an embedded array is required. In this paper, we describe such an algorithm. The new tool, called SMAP, packs as much circuit information as possible into the available memory arrays, and maps the rest of the circuit into four-input lookup-tables. On a set of 29 sequential and combinational benchmarks, the tool is able to map, on average, 60 4-LUTs into a single 2-Kbit memory array. If there are 16 arrays available, it can map, on average, 358 4-LUTs to the 16 arrays.
In this paper, we present the first exact algorithm to solve the constrained I/O placement problem for FPGAs that support multiple I/O standards. We derive a compact integer linear programming formulation for the cons...
详细信息
In this paper, we present the first exact algorithm to solve the constrained I/O placement problem for FPGAs that support multiple I/O standards. We derive a compact integer linear programming formulation for the constrained I/O placement problem. The size of the integer linear program derived is independent of the number of I/O objects to be placed and hence is scalable to very large design instances. For example, for a Xilinx Virtex-E FPGA, the number of integer variables required is never more than 32 and is much smaller for practical design instances. Extensive experimental results using a non-commercial integer linear program solver shows that it only takes seconds to solve the resultant integer linear program in practice.
FPGAs provide a speed advantage in processing for embedded systems, especially when processing is moved close to the sensors. Perhaps the ultimate embedded system is a neural prosthetic, where probes are inserted into...
详细信息
FPGAs provide a speed advantage in processing for embedded systems, especially when processing is moved close to the sensors. Perhaps the ultimate embedded system is a neural prosthetic, where probes are inserted into the brain and recorded electrical activity is analyzed to determine which neurons have fired. In turn, this information can be used to manipulate an external device such as a robot arm or a computer mouse. To make the detection of these signals possible, some baseline data must be processed to correlate impulses to particular neurons. One method for processing this data uses a statistical clustering algorithm called Expectation Maximization, or EM. In this paper, we examine the EM clustering algorithm, determine the most computationally intensive portion, map it onto a reconfigurable device, and show several areas of performance gain.
To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry-chains that concatenate adja...
详细信息
ISBN:
(纸本)9781595939340
To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry-chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry-chains for binary and ternary addition, the carry chain used by the new cell only spans 2 logic blocks, which significantly improves the delay of multi-input addition operations mapped onto the FPGA. The delay and area overhead that arises from augmenting a traditional FPGA logic cell with the new compressor structure is minimal. Using this new cell, we observed an average speedup in combinational delay of 1.41 compared to adder trees synthesized using ternary adders. Copyright 2008 acm.
The paper presents several improvements to state-of-the-art in FPGA technology mapping exemplified by a recent advanced technology mapper DAOmap [Chen and Cong, ICCAD '04]. Improved cut enumeration computes all K-...
详细信息
ISBN:
(纸本)1595932925
The paper presents several improvements to state-of-the-art in FPGA technology mapping exemplified by a recent advanced technology mapper DAOmap [Chen and Cong, ICCAD '04]. Improved cut enumeration computes all K-feasible cuts without pruning for up to 7 inputs for the largest MCNC benchmarks, A new technique for on-the-fly cut dropping reduces by orders of magnitude memory needed to represent cuts for large designs. Improved area recovery leads to mappings with area on average 7% smaller than DAOmap, while preserving delay optimality when starting from the same optimized netlists. Applying mapping with structural choices derived by a synthesis flow on average reduces delay by 7% and area by 14%, compared to DAOmap. Copyright 2006 acm.
暂无评论