In this paper, we present a new retiming-based technology mapping algorithm for look-up table-based field-programmable gate arrays. The algorithm is based on a novel iterative procedure for computing all k-cuts of all nodes in a sequential circuit in the presence of retiming. The algorithm completely avoids flow computation, which is the bottleneck of previous algorithms. Because k is very small in practice, the procedure for computing all k-cuts is very fast. Experimental results indicate that the overall algorithm is very efficient in practice.
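As a rough illustration of what "computing all k-cuts" means, the sketch below enumerates k-feasible cuts on a purely combinational DAG by merging the cuts of each node's fanins. This is the standard bottom-up cut enumeration, not the paper's iterative procedure for sequential circuits with retiming; the function name and the netlist representation are assumptions made for this example.

```python
from itertools import product


def enumerate_k_cuts(fanins, k):
    """Return {node: set of cuts}, where each cut is a frozenset of nodes.

    `fanins` maps every node to the list of its fanin nodes (primary inputs
    map to an empty list). The dict must be in topological order, so that
    each node appears after all of its fanins. Hypothetical representation.
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}  # the trivial cut: the node itself
        if ins:
            # Pick one cut per fanin and merge them; keep the result if it
            # has at most k leaves (i.e. it fits in one k-input LUT).
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Tiny example netlist: x = f(a, b), y = g(x, c).
netlist = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"]}
for cut in sorted(enumerate_k_cuts(netlist, 3)["y"], key=sorted):
    print(sorted(cut))
```

Because the number of cuts per node is bounded by a function of k, small k keeps the enumeration cheap, which matches the abstract's remark about speed.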
Moore's Law states that the number of transistors on a device doubles every two years; however, it is often (mis)quoted in terms of its impact on CPU performance. This important corollary of Moore's Law states that improved clock frequency plus improved architecture yields a doubling of CPU performance every 18 months. This paper examines the impact of Moore's Law on the peak floating-point performance of FPGAs. Performance trends for individual operations are analyzed, as well as the performance trend of a common instruction mix (multiply-accumulate). The important result is that peak FPGA floating-point performance is growing significantly faster than peak floating-point performance for a CPU.
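The two doubling periods quoted above compound quite differently. The following small sketch makes the arithmetic explicit; the helper name and the six-year horizon are illustrative choices, not figures from the paper.

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` for a quantity that doubles
    once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Compare the 24-month and 18-month doubling periods over six years.
for label, period in [("transistor count, 24-month doubling", 2.0),
                      ("CPU performance, 18-month doubling", 1.5)]:
    print(f"{label}: x{growth_factor(6, period):.0f} over 6 years")
```

Over six years the 24-month rule gives a factor of 8, while the 18-month rule gives a factor of 16.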
ISBN: 9781450326711 (print)
Today's SRAM-based FPGAs provide a rich set of computing resources, which makes them attractive in demanding and critical application domains such as avionics and space. Unfortunately, their heavy reliance on SRAM configuration memory raises reliability issues due to single-event upsets (SEUs). Considering the criticality of these applications, the vulnerability analysis of FPGA designs to SEUs becomes an essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and estimate the soft error susceptibility of designs at early stages of the implementation flow for the latest Xilinx architectures.
ISBN: 9781595939340 (print)
Field-programmable gate arrays (FPGAs) are used in a wide range of markets that have differing cost, performance and power consumption requirements. It would be advantageous if a single device family could serve these varied needs, but the economics of catering to this wide distribution of market demands suggest that more than one family is appropriate. Consequently, FPGA vendors have moved to provide a more diverse set of families that sit at different points in the area-speed-power design space. In this work, our goal is to understand the circuit and architectural design attributes of an FPGA that enable trade-offs between area and speed, and to determine the magnitude of the possible trade-offs. This will be useful for architects seeking to determine the number of device families in a suite of offerings, as well as the changes to make between families. We have found that varying both the architecture and the transistor sizing of an FPGA allows the effective area to change by a factor of 3.6 from largest to smallest and the speed to change by a factor of 2.6 from fastest to slowest. It is interesting to observe that the range of area and delay trade-offs possible by varying only the transistor sizing of a single architecture is larger than the ranges observed in past architectural experiments. In addition to transistor size, we note that LUT size is one of the most useful parameters for trading off area and delay.
ISBN: 9781595930293 (print)
As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) that are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multi-bit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multi-bit routing architecture can achieve a 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture, including the most area-efficient granularity values and the most area-efficient proportion of bus-based connections. Copyright 2005 ACM.
ISBN: 9781450333153 (print)
Combining multi-processing with the high level of configurability possible with FPGA-based soft-processors, this paper presents a multiprocessing framework based on the MicroBlaze soft-processor that provides fully coherent, independently configurable Level 1 caches with Linux multicore support. This architecture offers fine-grain configurability of the system, so that FPGA resources can be better optimized for a specific embedded application. We use our framework to explore the L1 data cache configuration, developing a metric for efficiency based on resource usage and static application runtime. We find that a pseudo-random replacement policy is consistently the more efficient choice for FPGA systems.
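One reason a pseudo-random policy is cheap in FPGA logic is that the victim can be picked from a small shift register, with no per-line age or use bits. The sketch below models one cache set whose victim way is chosen by a 16-bit LFSR. The class names, the tap positions, and the seed are assumptions for illustration and are not taken from the paper's MicroBlaze framework.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR (a commonly used tap set; an assumption here)."""

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero")
        self.state = seed & 0xFFFF

    def step(self):
        s = self.state
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


class PseudoRandomSet:
    """One set of a set-associative cache with pseudo-random replacement."""

    def __init__(self, ways, lfsr):
        self.tags = [None] * ways  # None marks an invalid (empty) way
        self.lfsr = lfsr

    def access(self, tag):
        """Return True on a hit; on a miss, fill a way and return False."""
        if tag in self.tags:
            return True
        if None in self.tags:
            victim = self.tags.index(None)  # fill empty ways first
        else:
            victim = self.lfsr.step() % len(self.tags)  # pseudo-random victim
        self.tags[victim] = tag
        return False


cache_set = PseudoRandomSet(ways=2, lfsr=Lfsr16())
print([cache_set.access(t) for t in (1, 2, 1, 3)])
```

Unlike LRU, this policy keeps no history of accesses, so its storage cost does not grow with associativity.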
ISBN: 9781595930293 (print)
Vdd-programmable FPGAs have been proposed recently to reduce FPGA power: Vdd levels can be customized for different circuit elements, and unused circuit elements can be power-gated. In this paper, we first develop an accurate FPGA power model and then design novel Vdd-programmable interconnect switches with a minimum number of configuration SRAM cells. Applying our power model to placed and routed benchmark circuits, we evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, the best architecture reduces the minimal energy-delay product by 44.14% with 48% area overhead and a 3% increase in SRAM cells. Our evaluation results also show that LUT size 4 always gives the lowest energy consumption, while LUT size 7 always leads to the highest performance for all evaluated architectures. Copyright 2005 ACM.
This paper presents a method for verifying the thermal status of complex FPGA-based circuits such as microprocessors, so that the designer can evaluate whether a particular block is working beyond specifications. The idea is to extract the output frequencies of an array of ring oscillators previously distributed across the die, taking full advantage of the configuration port capabilities in Xilinx technology. As a result, it is shown that FPGA technology offers designers of embedded systems the possibility of viewing a detailed thermal map of a circuit at minimum cost. The verification can be done in actual working conditions; for example, with heat sinks and fans attached to the chip, inside the system case, or even in an on-board satellite application. The main results of the work would be impractical to obtain with alternatives such as IR cameras, external sensors, or embedded diodes.
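To make the idea of a thermal map concrete, the sketch below converts a grid of ring-oscillator frequencies into temperatures. It assumes that each oscillator's frequency varies approximately linearly with temperature over the range of interest and that a two-point calibration is available; both the linear model and all numbers are assumptions for illustration, not data from the paper.

```python
def calibrate(f_low, t_low, f_high, t_high):
    """Build a frequency-to-temperature function from two calibration points.

    (f_low, t_low) and (f_high, t_high) are measured (frequency, temperature)
    pairs for one oscillator. A linear relation is assumed between them.
    """
    slope = (t_high - t_low) / (f_high - f_low)
    return lambda freq: t_low + slope * (freq - f_low)


def thermal_map(freq_grid, to_temp):
    """Apply the calibration to every oscillator reading in a 2-D grid."""
    return [[to_temp(f) for f in row] for row in freq_grid]


# Hypothetical calibration: 100 MHz at 25 C, 90 MHz at 85 C
# (frequency falls as the die heats up).
to_temp = calibrate(100.0, 25.0, 90.0, 85.0)
readings_mhz = [[100.0, 97.5], [95.0, 92.5]]  # made-up readings
for row in thermal_map(readings_mhz, to_temp):
    print(row)
```

In practice each oscillator would likely need its own calibration, since placement and process variation shift the nominal frequency from one site to another.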
ISBN: 9781450305549 (print)
This paper analyses different hardware sorting architectures in order to implement a highly scalable sorter for solving huge problems, up to the GB range, at high performance and in linear time complexity. It will be proven that a combination of a FIFO-based merge sorter and a tree-based merge sorter results in the best performance at low cost. Moreover, we will demonstrate how partial run-time reconfiguration can be used to save almost half the FPGA resources or, alternatively, to improve the speed. Experiments show a sustainable sorting throughput of 2 GB/s for problems fitting into the on-chip FPGA memory and 1 GB/s when using external memory. These values surpass the best published results on large-problem sorting implementations on FPGAs, GPUs, and the Cell processor.
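The two building blocks named above can be modeled in software as follows: a FIFO merger that emits one element per step from two sorted input queues, and a merge tree that combines runs pairwise until one sorted run remains. This is a behavioral sketch only; the function names are hypothetical, and it says nothing about the paper's hardware structure, throughput, or use of reconfiguration.

```python
from collections import deque


def fifo_merge(run_a, run_b):
    """Merge two sorted runs, modeled as FIFOs read from the front."""
    a, b = deque(run_a), deque(run_b)
    out = []
    while a and b:
        # Compare the two FIFO heads and pop the smaller one.
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    out.extend(a)  # drain whichever FIFO still holds data
    out.extend(b)
    return out


def merge_tree(runs):
    """Merge sorted runs pairwise, level by level, like a tree of mergers."""
    runs = [list(r) for r in runs]
    while len(runs) > 1:
        merged = [fifo_merge(runs[i], runs[i + 1])
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an unpaired run passes through to the next level
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []


print(merge_tree([[3], [1], [2], [5], [4]]))
```

Each level of the tree touches every element once, so with a fixed number of levels the work per element is constant.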
ISBN: 9781595936004 (print)
A new method for improving the timing yield of field-programmable gate array (FPGA) devices affected by random within-die variation is proposed. By selecting an appropriate configuration from a set of functionally equivalent configurations whose critical paths do not share the same circuit resources on the FPGA, both the average critical path delay and its standard deviation are reduced substantially under conditions of large random variation. Large within-die variations of device parameters such as transistor threshold voltage are anticipated in future semiconductor technologies, resulting in degradation of parametric yields. Compared to the previous approach, which compensates for such within-die variation by designing the circuit placement for each chip using previously measured variation information, our method requires neither the measurement of process variations nor the execution of design tools for each chip. The average critical path delay is reduced by up to 5% assuming 30% (sigma/mu) variation in threshold voltage, with a corresponding 50% decrease in standard deviation.
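The statistical intuition can be illustrated with a toy Monte Carlo model: if the candidate configurations use disjoint resources, their critical path delays on a given chip are roughly independent, and keeping the fastest one lowers both the mean and the spread. The sketch below assumes independent Gaussian delays around a nominal value of 1.0 and assumes the fastest configuration can be identified per chip; it is not the paper's delay model and is not meant to reproduce its reported figures.

```python
import random
import statistics


def simulate(num_chips, num_configs, sigma, seed=0):
    """Return (mean, stdev) of the best critical path delay per chip.

    Each configuration's delay on a chip is drawn independently from a
    Gaussian with mean 1.0 and standard deviation `sigma` (toy assumption).
    """
    rng = random.Random(seed)
    best_delays = []
    for _ in range(num_chips):
        delays = [1.0 + rng.gauss(0.0, sigma) for _ in range(num_configs)]
        best_delays.append(min(delays))  # keep the fastest configuration
    return statistics.mean(best_delays), statistics.stdev(best_delays)


for n in (1, 2, 4):
    mean, std = simulate(num_chips=2000, num_configs=n, sigma=0.05)
    print(f"{n} configuration(s): mean={mean:.4f} stdev={std:.4f}")
```

With more configurations to choose from, the minimum delay shifts lower and tightens, which is the qualitative effect described in the abstract.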