ISBN (print): 9781424410590
This paper presents a technique to fix timing violations caused by process variations in FPGAs by adjusting the clock skews of flip-flops. This involves making the clock distribution network tunable by adding programmable delay elements that compensate for variations. We propose generic as well as chip-specific skew assignment schemes that are robust to variations. With conservative timing constraints, the two proposed schemes recover about 80% and 82% of the failed chips, respectively. With more aggressive constraints, the corresponding numbers are 69% and 77%. When speed-binning is performed, our technique increases the number of chips in the fast bin by 39%. The area and power overheads of this technique are 3.5% and 5.6%, respectively.
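The setup and hold checks that skew tuning must satisfy can be sketched as follows. For a path from a launching flip-flop to a capturing one, delaying the capturing clock relaxes the setup check but tightens the hold check. This is a minimal illustration under stated assumptions: each flip-flop's programmable delay element offers a small set of discrete settings (the values are hypothetical), and a brute-force search stands in for the paper's generic and chip-specific assignment schemes, which it does not reproduce.

```python
from itertools import product

def meets_timing(paths, skew, period, t_setup, t_hold):
    """Check setup and hold on every path for a given clock-delay assignment.

    Each path is (launch_ff, capture_ff, d_min, d_max); times are in ns and
    skew[ff] is the extra clock delay programmed for that flip-flop.
    """
    for src, dst, d_min, d_max in paths:
        # Setup: the slowest data must arrive before the next capturing edge.
        if skew[src] + d_max + t_setup > skew[dst] + period:
            return False
        # Hold: the fastest data must not overwrite the value being captured.
        if skew[src] + d_min < skew[dst] + t_hold:
            return False
    return True

def assign_skews(paths, ffs, settings, period, t_setup, t_hold):
    """Try every combination of delay settings and return the first that passes.

    `settings` lists the (hypothetical) delays each programmable element can
    add. The search is exponential in len(ffs), so it only suits toy examples.
    Returns None if no combination recovers the chip.
    """
    for combo in product(settings, repeat=len(ffs)):
        skew = dict(zip(ffs, combo))
        if meets_timing(paths, skew, period, t_setup, t_hold):
            return skew
    return None

# Hypothetical chip: path A->B misses setup at a 5 ns period with zero skew.
paths = [("A", "B", 1.0, 5.3), ("B", "C", 0.8, 3.0)]
print(assign_skews(paths, ["A", "B", "C"], [0.0, 0.25, 0.5], 5.0, 0.1, 0.05))
# {'A': 0.0, 'B': 0.5, 'C': 0.0}
```

Delaying B's clock by 0.5 ns gives the slow A->B path enough time while leaving the hold margins on both paths intact. A practical scheme must scale far beyond a handful of flip-flops and account for how delays vary from chip to chip.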
Programmable logic cores (PLCs) offer a means of providing post-fabrication reconfigurability to a SoC design. Circuits implemented in a PLC will inevitably have lower timing performance and logic density than fixed-function circuits. This fundamental mismatch makes the design of the interface between the PLC and the rest of the SoC a challenging problem. In this paper we focus on interfaces between circuits implemented in PLCs and SoC system buses. We demonstrate problems with existing implementation options and then propose modifications to parts of the PLC architecture to enable more efficient system bus interfaces. Our results show that, on average, the modified architecture improves interface timing by 36.4%, reduces CLB usage by 7.9%, and improves routability by 28.8% for circuits that require system bus interfaces. We show that the area overhead is less than 0.5% for circuits that do not require bus interfaces.
We describe the power optimization achieved by a high-level environment developed for the automatic generation of application-specific circuits on FPGAs. The methodology transforms the whole algorithm into a graph of LUTs that implements all the required operations without using library components. The quality of the resulting circuitry is guaranteed by the use of "type inference". Our environment automatically optimizes the word length and size of operators and, at the same time, reduces the internal datapaths and the switching activity. Thus, in the extreme cases tested, the generated circuits achieve area improvements of up to 95% and power reductions of up to 98%.
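The abstract does not say how "type inference" sizes the operators. One common approach, assumed here purely for illustration, propagates value ranges through the dataflow graph and derives each signal's minimum unsigned bit width from its range. The `Range` type and the helper operators are hypothetical names, and only non-negative integers are handled:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """Inclusive range of non-negative integer values a signal can take."""
    lo: int
    hi: int

    @property
    def bits(self) -> int:
        # Minimum unsigned width that can hold `hi` (at least one bit).
        return max(1, self.hi.bit_length())

def const(v: int) -> Range:
    return Range(v, v)

def add(a: Range, b: Range) -> Range:
    return Range(a.lo + b.lo, a.hi + b.hi)

def mul(a: Range, b: Range) -> Range:
    # Valid for non-negative operands only, which is all this sketch handles.
    return Range(a.lo * b.lo, a.hi * b.hi)

def shr(a: Range, k: int) -> Range:
    # A constant right shift narrows the range, so the datapath can shrink too.
    return Range(a.lo >> k, a.hi >> k)

# Hypothetical example: average of four 8-bit pixels, (p0 + p1 + p2 + p3) >> 2
pixel = Range(0, 255)
total = add(add(pixel, pixel), add(pixel, pixel))
avg = shr(total, 2)
print(pixel.bits, total.bits, avg.bits)  # 8 10 8
```

A fixed-width library operator would spend the same width on every stage; sizing each node from its range is one way an environment like this could shrink the generated LUT graph and cut the bits that toggle.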
The partial reconfiguration capability of today's FPGAs allows dynamic system components to be exchanged at run time, which enables the realization of self-reconfigurable systems. To ease the design of partially reconfigurable systems, this paper presents an integrated design flow for reconfigurable architectures. The design flow includes tools for system partitioning, floorplanning, and automatic generation of configuration data for the static and dynamic system components. Furthermore, the design flow comprises the implementation of a homogeneous on-chip communication infrastructure, which interconnects the dynamic system components placed at run time. For the design of such an infrastructure, a layer model is introduced that divides communication into five layers of abstraction. As an example, a communication infrastructure based on the Wishbone protocol is realized on a Xilinx Virtex-2 FPGA. A tristate-based and a slice-based implementation are presented and analyzed with respect to efficiency.
A significant challenge in designing algorithms for FPGA-based reconfigurable computers is the exposed, non-cached memory subsystem. In the absence of dedicated hardware to manage a cached memory hierarchy, the algorithm designer must explicitly allocate data within a collection of memory banks and schedule accesses to the memories in the algorithm's datapaths. The physical location of data in memory affects the datapath schedule, yet data dependencies in the algorithm can suggest allocation strategies that increase instruction-level parallelism. In this work, we present three algorithms that automatically allocate arrays to memory banks and schedule the datapaths that use those memories. Our approach allows the user to trade off the optimality of the results against a longer iterative analysis.
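Why data dependencies drive bank allocation can be illustrated with a small example: if two arrays are read in the same datapath cycle, placing them in different single-port banks lets both accesses proceed in parallel. The greedy conflict-graph coloring below is a hypothetical illustration of that constraint, not one of the paper's three algorithms; the function name, array names, and bank counts are all made up.

```python
from collections import defaultdict

def allocate_banks(accesses, num_banks):
    """Greedily assign arrays to memory banks (hypothetical helper).

    `accesses` is a list of sets; each set holds the arrays read in the same
    cycle. Arrays sharing a cycle conflict and should live in different banks.
    Returns (bank_of, conflicts), where `conflicts` counts cycles that still
    hit one bank more than once and would therefore have to be serialized.
    """
    neighbours = defaultdict(set)
    for group in accesses:
        for a in group:
            neighbours[a] |= group - {a}

    bank_of = {}
    # Color the most-constrained arrays first (largest number of conflicts).
    for array in sorted(neighbours, key=lambda a: (-len(neighbours[a]), a)):
        used = {bank_of[n] for n in neighbours[array] if n in bank_of}
        free = [b for b in range(num_banks) if b not in used]
        # Fall back to bank 0 when every bank is taken; that cycle will stall.
        bank_of[array] = free[0] if free else 0

    conflicts = sum(
        1 for group in accesses
        if len({bank_of[a] for a in group}) < len(group)
    )
    return bank_of, conflicts

# Hypothetical kernel: the arrays read together in each cycle.
cycles = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d"}]
print(allocate_banks(cycles, num_banks=2))
```

With two banks, the three mutually conflicting arrays a, b, and c cannot all be separated, so one cycle must stall; with three banks the conflict count drops to zero. Allocation and scheduling interact in exactly this way: the bank layout decides which accesses can share a cycle.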
Encryption is the basic means of enforcing confidentiality in digital communications. This work explores a hardware design and a cost assessment of an FPGA-based brute-force attack against the RSA Secret-Key Challenge RC5-72. The aim is to develop an alternative to software-based solutions for distributed.net. Implementation results show that a US$80 FPGA can achieve a throughput of 145 Mkeys/s with a power consumption of 10 W. This is roughly an order of magnitude faster, cheaper, and lower in power than fully dedicated general-purpose computers.
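The reported figures put the size of the RC5-72 key space in perspective. This back-of-the-envelope calculation uses only the 145 Mkeys/s and 10 W numbers from the abstract and assumes a 365.25-day year; it estimates the time and energy a single such FPGA would need to try all 2^72 keys.

```python
KEY_BITS = 72                          # RC5-72 uses a 72-bit key
THROUGHPUT = 145e6                     # keys per second for one FPGA (from the abstract)
POWER_W = 10.0                         # watts (from the abstract)
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

def exhaustive_search(key_bits=KEY_BITS, keys_per_s=THROUGHPUT, power_w=POWER_W):
    """Return (years, energy in kWh) to try every key on one device."""
    keyspace = 2 ** key_bits
    seconds = keyspace / keys_per_s
    years = seconds / SECONDS_PER_YEAR
    kwh = power_w * seconds / 3.6e6    # 1 kWh = 3.6e6 J
    return years, kwh

years, kwh = exhaustive_search()
print(f"{years:.3g} years, {kwh:.3g} kWh")
```

The result is roughly a million device-years (about 9e7 kWh), which is why the challenge is pursued as a distributed effort and why cost and power per key, rather than raw speed alone, dominate the comparison with general-purpose computers.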
Current generations of FPGAs comprise many specialized hardware cores, such as embedded processors, multipliers, RAMs, and FIFOs, along with the regular arrays of reconfigurable logic. On any FPGA device, these embedded cores are placed only at fixed locations. This makes floorplanning very difficult for applications with heterogeneous components. Recently, some researchers have started looking into this problem of heterogeneous floorplanning on FPGAs. However, all of this work suffers from a fundamental flaw that degrades solution quality, leading to larger device areas or excessively long runtimes. In [1], we proposed HPlan, a heterogeneous floorplanner for FPGAs that is highly efficient at finding floorplans for a variety of resources. In this paper, we extend the floorplanner with an adaptive placer algorithm. We evaluate it on the MCNC benchmarks with random heterogeneous resource allocations. We observe that as the statistical variation in the heterogeneous resource allocations increases, the area produced by the traditional floorplanner grows for all benchmarks, whereas the area produced by HPlan does not. The proposed floorplanner thus provides an efficient way to handle floorplans with large variations in heterogeneous resources.
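Why fixed-location embedded cores complicate floorplanning can already be seen in one dimension: a module that needs, say, one RAM column and one multiplier column fits only in windows of the device that happen to contain those column types. The sketch below models the device as a hypothetical list of column types and slides a window across it to find every position that satisfies a module's resource demand. It illustrates the constraint only; it is not HPlan or its adaptive placer.

```python
from collections import Counter

def feasible_positions(columns, width, demand):
    """Return the left edges of all `width`-column windows that meet `demand`.

    `columns` lists the resource type of each device column (hypothetical
    labels such as "CLB", "BRAM", "MULT"). `demand` maps a resource type to
    the number of columns of that type the module needs.
    """
    need = Counter(demand)
    window = Counter(columns[:width])
    hits = []
    for left in range(len(columns) - width + 1):
        if left > 0:
            # Slide the window one column to the right.
            window[columns[left - 1]] -= 1
            window[columns[left + width - 1]] += 1
        if all(window[r] >= n for r, n in need.items()):
            hits.append(left)
    return hits

# Hypothetical device: BRAM and MULT columns sit at fixed positions.
device = ["CLB", "CLB", "BRAM", "CLB", "MULT", "CLB", "CLB", "BRAM", "CLB"]
print(feasible_positions(device, 4, {"BRAM": 1, "MULT": 1, "CLB": 2}))  # [1, 2, 4]
```

Only three of the six possible positions work, and the choice a floorplanner makes for one module removes options for the others; that coupling gets worse as the distribution of heterogeneous resources becomes more irregular.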
The watershed transformation is a popular image segmentation technique for greyscale images. This paper describes a pipelined implementation of a watershed algorithm designed for hardware. In the algorithm, the pixels of a given image are repeatedly scanned from top-left to bottom-right and then from bottom-right to top-left in order to propagate the value of each pixel to its neighbors. In the implementation, w sets of k lines are buffered on the FPGA, and the algorithm is repeatedly applied to these w sets, shifting in a new set from the external memory banks and shifting out the oldest set to other external memory banks. The values of w and k can be chosen according to the number of external memory banks and the size of the FPGA. Therefore, the best performance can be realized on a given hardware platform.
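The alternating raster scans described above can be sketched in software. This version is a simplified steepest-descent watershed, assumed for illustration rather than taken from the paper: each local minimum receives a new label, every other pixel points to its lowest 8-connected neighbor, and forward (top-left to bottom-right) and backward (bottom-right to top-left) passes copy labels along those pointers until nothing changes. It does not resolve plateaus (a pixel with no strictly lower neighbor becomes its own minimum) and omits the w-set/k-line buffering that the hardware design relies on.

```python
def watershed_scan(img):
    """Label catchment basins by alternating raster-scan label propagation."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    down = [[None] * w for _ in range(h)]   # steepest-descent neighbor
    next_label = 1
    for y in range(h):
        for x in range(w):
            best = img[y][x]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w and img[ny][nx] < best:
                        best = img[ny][nx]
                        down[y][x] = (ny, nx)
            if down[y][x] is None:          # no lower neighbor: a local minimum
                labels[y][x] = next_label
                next_label += 1

    forward = [(y, x) for y in range(h) for x in range(w)]
    changed = True
    while changed:
        changed = False
        # One forward pass, then one backward pass, as in the abstract.
        for order in (forward, reversed(forward)):
            for y, x in order:
                if labels[y][x] == 0:
                    ny, nx = down[y][x]
                    if labels[ny][nx]:
                        labels[y][x] = labels[ny][nx]
                        changed = True
    return labels

print(watershed_scan([[0, 1, 5, 2, 0]]))  # [[1, 1, 1, 2, 2]]
```

Each pass only looks at a pixel and its stored neighbor, which is the kind of local, streaming access pattern that lets the hardware keep just a few buffered lines on chip while the rest of the image lives in external memory.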
Process variations in deep sub-micron technologies have created significant timing uncertainty. This creates the need for a new variability-aware physical synthesis tool for Field-Programmable Gate Arrays (FPGAs). Ideally, variability-aware tools should be able to perform both timing variability estimation during synthesis and timing variability analysis after synthesis. Statistical static timing analysis (SSTA) methods have been developed to perform timing variability analysis, but they are computationally expensive and not fast enough. We propose a fast and accurate interval-based method for timing variability estimation. This method uses correlation-aware affine intervals instead of probability density distributions to model timing uncertainties. Compared with the Monte Carlo (MC) model, our model estimates the mean of the timing variation with an accuracy of 99.9% and an average range looseness of -7.5%. Speed-ups of about 80x and 4900x are achieved over the Correlation Aware Canonical Timing (CACT) model and the MC model, respectively.
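The key property of affine intervals is that shared noise symbols carry correlation. In this minimal sketch of affine arithmetic (an illustration, not the paper's timing model), a delay is x0 + Σ x_k ε_k with every ε_k in [-1, 1]. Two path delays that share a hypothetical global-variation symbol `g` keep that dependence, so their difference is much tighter than plain interval arithmetic would give. The nonlinear max operation needed for full timing analysis is omitted, and all delay values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Affine:
    """Affine form x0 + sum(coeffs[k] * eps_k), each noise symbol eps_k in [-1, 1]."""
    x0: float
    coeffs: dict = field(default_factory=dict)

    def __add__(self, other):
        c = dict(self.coeffs)
        for k, v in other.coeffs.items():
            c[k] = c.get(k, 0.0) + v    # a shared symbol combines exactly
        return Affine(self.x0 + other.x0, c)

    def __neg__(self):
        return Affine(-self.x0, {k: -v for k, v in self.coeffs.items()})

    def __sub__(self, other):
        return self + (-other)

    def bounds(self):
        # Worst case: every noise symbol at +1 or -1.
        r = sum(abs(v) for v in self.coeffs.values())
        return (self.x0 - r, self.x0 + r)

# Hypothetical path delays (ns) sharing the global-variation symbol "g".
path_a = Affine(10.0, {"g": 1.0, "a": 0.5})
path_b = Affine(8.0, {"g": 1.0, "b": 0.2})
print((path_a - path_b).bounds())
```

The difference is confined to about [1.3, 2.7] ns because the shared `g` term cancels. Treating the two delays as independent intervals, [8.5, 11.5] and [6.8, 9.2], would give [-0.7, 4.7] ns instead.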
WSAT and its variants are among the best-performing stochastic local search algorithms for the satisfiability (SAT) problem. In this paper, we propose an FPGA solver for very large SAT problems based on a WSAT algorithm. Our solver combines parallel and multi-threaded processing (1) to fully utilize parallel accesses to the external memory banks and (2) to improve the utilization of the internal memory banks by fully exploiting their dual-port access, so that very large problems can be solved on the pipelined circuit. Implemented on a Xilinx XC2V6000, our solver can handle problems with up to K variables and K clauses, which is more than ten times larger than previous solvers on an FPGA of the same size.
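For reference, the core loop of WSAT-style local search (here the WalkSAT variant) is short. This sequential sketch uses DIMACS-style integer literals; the noise probability, flip limit, and seed are hypothetical defaults, and none of the paper's pipelining, multi-threading, or memory-bank mapping is modeled. Break counts are recomputed by scanning every clause, which is simple but slow.

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000, seed=0):
    """WalkSAT-style local search; returns assign[1..n_vars] or None.

    Clauses are lists of DIMACS literals: v means x_v, -v means NOT x_v.
    """
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]

    def is_true(lit):
        return assign[abs(lit)] == (lit > 0)

    def break_count(var):
        # Clauses whose only true literal involves `var` would become false.
        count = 0
        for c in clauses:
            true_lits = [lit for lit in c if is_true(lit)]
            if len(true_lits) == 1 and abs(true_lits[0]) == var:
                count += 1
        return count

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(is_true(lit) for lit in c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        breaks = {abs(lit): break_count(abs(lit)) for lit in clause}
        best = min(breaks, key=breaks.get)
        if breaks[best] == 0 or rng.random() >= p:
            var = best                          # free or greedy move
        else:
            var = abs(rng.choice(clause))       # random-walk move
        assign[var] = not assign[var]
    # The last flip may have satisfied everything.
    return assign if all(any(is_true(lit) for lit in c) for c in clauses) else None

# Hypothetical 3-variable formula.
cnf = [[1, -2, 3], [-1, 2], [2, 3], [-3, -1]]
print(walksat(cnf, 3))
```

Each iteration mostly consists of independent clause evaluations, which is what makes the algorithm a candidate for a pipelined circuit; a hardware solver would keep clause state in on-chip or external memory banks instead of rescanning Python lists.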