ISBN: 9781450333153 (print)
This paper presents a reconfigurable computing environment that addresses the problem of porting High Performance Computing (HPC) applications directly to field-programmable gate array (FPGA)-based architectures. The objectives of this research are to develop a comprehensive floating-point library of essential functions for scientific applications, to demonstrate an order-of-magnitude speedup for reconfigurable computing applications, and to demonstrate the effectiveness of an automated design framework for both the development and testing of scientific algorithms. The developed framework can be reused in various scientific applications that share kernel functions. The study identified an exponential function as a kernel for cellular ophthalmoscopy camera processing, traffic monitoring, and light wave simulation. The paper demonstrates a 30x speedup of these kernels in three algorithms using its novel architecture and automated toolset. A case study of exponential kernel generation and its flexible hardware implementation has been validated on a Xilinx LX-100 device and the Nallatech H101-PCIXM FPGA board.
ISBN: 9781450333153 (print)
Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR), are computationally intensive and require considerable execution time on a general-purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine, reducing execution time by an order of magnitude compared with kernel execution on a general-purpose processor. Specifically, field-programmable gate arrays (FPGAs) can house hardware-based custom implementations of these kernels to speed up these applications. In this paper, we demonstrate a methodology for algorithm acceleration, using SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. We first profiled the SAR algorithm and then implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show an average speed-up of 188 times when using the FPGA-based hardware accelerator rather than a software implementation running on a typical general-purpose processor.
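To make the role of the natural logarithm concrete, the following is a minimal software sketch of a one-dimensional homomorphic filter: take the logarithm, filter linearly in the log domain, then exponentiate. It is not the authors' hardware design. The function name, the moving-average split, and the `window`, `low_gain`, and `high_gain` parameters are assumptions chosen for illustration.

```python
import math


def homomorphic_filter(samples, window=3, low_gain=0.5, high_gain=1.5):
    """Illustrative 1-D homomorphic filter (hypothetical parameters).

    samples must be strictly positive because the first stage is ln(x).
    """
    # Stage 1: natural logarithm turns multiplicative components into sums.
    logs = [math.log(s) for s in samples]
    half = window // 2
    out = []
    for i in range(len(logs)):
        lo = max(0, i - half)
        hi = min(len(logs), i + half + 1)
        # Stage 2: a moving average stands in for the low-frequency part.
        low = sum(logs[lo:hi]) / (hi - lo)
        high = logs[i] - low
        # Stage 3: reweight the two parts and return to the linear domain.
        out.append(math.exp(low_gain * low + high_gain * high))
    return out


if __name__ == "__main__":
    print(homomorphic_filter([1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0]))
```

The logarithm and exponential dominate the per-sample cost in this sketch, which is consistent with the abstract's choice of the natural logarithm as the kernel to move into hardware.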
ISBN: 9781450311557 (print)
The proceedings contain 36 papers. The topics discussed include: speedy FPGA-based packet classifiers with low on-chip memory requirements; a real-time stereo vision system using a tree-structured dynamic programming on FPGA; incremental clustering applied to radar deinterleaving: a parameterized FPGA implementation; communication visualization for bottleneck detection of high-level synthesis applications; a mixed precision Monte Carlo methodology for reconfigurable accelerator systems; saturating the transceiver bandwidth: switch fabric design on FPGAs; limit study of energy & delay benefits of component-specific routing; impact of FPGA architecture on resource sharing in high-level synthesis; and securing netlist-level FPGA design through exploiting process variation and degradation.
ISBN: 9781450326711 (print)
Latency-insensitive communication offers many potential benefits for FPGA designs, including easier timing closure by enabling automatic pipelining, and easier interfacing with embedded NoCs. However, it is important to understand the costs and trade-offs associated with any new design style. This paper presents optimized implementations of latency-insensitive communication building blocks, quantifies their overheads in terms of area and frequency, and provides guidance to designers on how to generate high-speed and area-efficient latency-insensitive systems.
ISBN: 9781450326711 (print)
A novel Digital-to-Analog Converter (DAC) modulates the overall power consumption of an FPGA by disabling and enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, and other areas. The short-circuit-based DAC consumes one third of the area of an alternative shift-register-based DAC that is presented for comparison.
ISBN: 9781450326711 (print)
Today's SRAM-based FPGAs provide a rich set of computing resources, which makes them attractive in demanding and critical application domains such as avionics and space. Unfortunately, their heavy reliance on SRAM configuration memory raises reliability issues due to single-event upsets (SEUs). Given the criticality of these applications, analysis of the vulnerability of FPGA designs to SEUs becomes an essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and to estimate the soft error susceptibility of designs at early stages of the implementation flow for the latest Xilinx architectures.
ISBN: 9781450326711 (print)
Frequent item counting is one of the most important operations in time series data mining algorithms, and the space saving algorithm is a widely used approach to solving this problem. With the rapid rise in data input speeds, the most challenging problem in frequent item counting is meeting the requirement of wire-speed processing. In this paper, we propose a streaming-oriented PE-ring framework on FPGA for counting frequent items. Compared with the best existing FPGA implementation, our basic PE-ring framework reduces lookup table resource cost by 50% and achieves the same throughput in a more scalable way. Furthermore, we adopt a SIMD-like cascaded filter for further performance improvements, which outperforms the previous work by up to 3.24 times for some data distributions.
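For readers unfamiliar with the space saving algorithm mentioned above, here is a minimal software sketch of its standard update rule with a fixed budget of `k` counters. It illustrates only the algorithm, not the PE-ring hardware or the cascaded filter; the function name and the dictionary-based layout are assumptions made for this example.

```python
def space_saving(stream, k):
    """Track at most k items; returns (counts, errors).

    counts[item] is an upper bound on the true frequency and
    counts[item] - errors[item] is a lower bound.
    """
    counts = {}  # monitored item -> estimated count
    errors = {}  # monitored item -> maximum possible overestimate
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count as its potential overestimate.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return counts, errors


if __name__ == "__main__":
    counts, errors = space_saving(["a"] * 5 + ["b"] * 3 + ["c"], k=2)
    print(counts, errors)
```

The linear scan for the minimum counter is the step that a hardware design has to restructure to sustain one item per cycle, which gives some intuition for why a streaming architecture is needed at wire speed.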
ISBN: 9781450333153 (print)
We propose a new kind of FPGA architecture with a routing network that not only provides interconnections between the functional blocks but also performs some logic operations. More specifically, we replace the routing multiplexer node of the conventional architecture with an element that can be used as both an AND gate and a multiplexer. A conventional routing multiplexer node consists of a multiplexer and a two-stage buffer. In our new architecture, a NAND gate replaces the first inverter stage of the buffer, and two multiplexers, each half the size of the original, replace the original multiplexer. The aim of this study is to determine whether this kind of architecture is feasible and whether it is worthwhile to implement packing, placement, and routing tools for it in the future. We developed a new technology-mapping algorithm and sized the transistors in this new architecture to evaluate area and delay. Preliminary results indicate that the gain in logic depth and area achieved by mapping to AND gates as well as LUTs outweighs the overhead of introducing AND gates in the routing network, with a net reduction in area-delay product of 5.6%. Designs implemented on the proposed architecture would require 11.2% more area, but they would have a 14% lower logic depth, and the architecture has a slightly faster representative critical path. These results are preliminary because the pack, place, and route routines are not yet implemented.
ISBN: 9781450326711 (print)
Timing margins in FPGAs are already significant, and as process scaling continues they will have to grow to guarantee operation under increased variation. Margins enforce worst-case operation even in typical conditions and result in devices operating more slowly and consuming more energy than necessary. This paper presents a method of dynamic voltage and frequency scaling that uses online slack measurement to determine timing headroom in a circuit while it is operating and scales the voltage and/or frequency in response. Doing so can significantly reduce power consumption or increase throughput with minimal overhead. The method is demonstrated on a number of benchmark circuits under a range of operating conditions, constraints and optimisation targets.
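The idea of scaling voltage in response to measured slack can be pictured as a simple feedback step. The sketch below is a hypothetical controller written for illustration only; it is not the method from the paper. The function name, the guard band, the step size, and the voltage limits are all assumed values.

```python
def next_voltage(voltage_mv, slack_ps, guard_ps=100, step_mv=10,
                 v_min_mv=700, v_max_mv=1000):
    """One step of an illustrative slack-driven voltage controller.

    All thresholds and limits here are made-up example values.
    """
    if slack_ps < guard_ps:
        # Too little timing headroom: raise the supply to stay safe.
        return min(v_max_mv, voltage_mv + step_mv)
    if slack_ps > 2 * guard_ps:
        # Plenty of headroom: lower the supply to save power.
        return max(v_min_mv, voltage_mv - step_mv)
    # Inside the hysteresis band: hold the current setting.
    return voltage_mv


if __name__ == "__main__":
    v = 900
    for slack in (500, 400, 250, 150, 60):  # hypothetical slack readings in ps
        v = next_voltage(v, slack)
        print(slack, v)
```

The band between `guard_ps` and `2 * guard_ps` is an assumed hysteresis region that keeps this toy controller from oscillating between two settings.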
ISBN: 9781450326711 (print)
Packing is a critical step in the CAD flow for cluster-based FPGA architectures and has a significant impact on the quality of the final placement and routing results. One basic quality metric is routability. Traditionally, minimizing cut (the number of external signals) has been used as the main criterion in packing for routability optimization. This paper shows that minimizing cut is a sub-optimal criterion and argues for using the Rent characteristic as the new criterion for FPGA packing. We further propose using a recursive bipartitioning-based k-way partitioner to optimize the Rent characteristic during packing. We developed a new packer, PPack2, based on this approach. Compared to T-VPack, PPack2 achieves 35.4%, 35.6%, and 11.2% reductions in wire length, minimal channel width, and critical path delay, respectively. These improvements show that PPack2 outperforms all previous leading packing tools (including iRAC, HDPack, and the original PPack) by a wide margin.
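As background on the Rent characteristic, Rent's rule relates the number of external terminals T of a partition to the number of blocks B it contains as T = t * B^p, where p is the Rent exponent. The sketch below estimates t and p from (blocks, terminals) samples with a least-squares fit in log-log space. It is an illustration of the rule only, not PPack2's procedure; the function name and the sample data are assumptions.

```python
import math


def fit_rent(points):
    """Fit T = t * B**p to (blocks, terminals) pairs; returns (t, p)."""
    xs = [math.log(b) for b, _ in points]
    ys = [math.log(term) for _, term in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope in log-log space
    t = math.exp(mean_y - p * mean_x)  # intercept mapped back to linear scale
    return t, p


if __name__ == "__main__":
    # Synthetic samples generated from T = 4 * B**0.6 (made-up values).
    samples = [(b, 4.0 * b ** 0.6) for b in (1, 2, 4, 8, 16)]
    print(fit_rent(samples))
```

A lower fitted exponent indicates that partitions expose fewer external signals as they grow, which is the general intuition behind treating the Rent characteristic as a routability criterion.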