ISBN: 9781450326711 (print)
In order to investigate new FPGA logic blocks, FPGA architects have traditionally needed to customize CAD tools to make use of the new features and characteristics of those blocks. The software development effort necessary to create such CAD tools can be time-consuming and can significantly limit the number and variety of architectures explored. Thus, architects want flexible CAD tools that can, with few or no software modifications, explore a diverse space. Existing flexible CAD tools suffer from impractically long runtimes and/or fail to efficiently make use of the important new features of the logic blocks being investigated. This work is a step towards addressing these concerns by enhancing the packing stage of the open-source VTR CAD flow [17] to efficiently deal with common interconnect structures that are used to create many kinds of useful novel blocks. These structures include crossbars, carry chains, dedicated signals, and others. To accomplish this, we employ three techniques in this work: speculative packing, pre-packing, and interconnect-aware pin counting. We show that these techniques, along with three minor modifications, result in improvements to runtime and quality of results across a spectrum of architectures, while simultaneously expanding the scope of architectures that can be explored. Compared with VTR 1.0 [17], we show an average 12-fold speedup in packing for fracturable LUT architectures with 20% lower minimum channel width and 6% lower critical path delay. We obtain a 6- to 7-fold speedup for architectures with non-fracturable LUTs and architectures with depopulated crossbars. In addition, we demonstrate packing support for logic blocks with carry chains.
ISBN: 9781450326711 (print)
OmpSs is an OpenMP-like directive-based programming model that includes heterogeneous execution (MIC, GPU, SMP, etc.) and runtime management of task dependencies. Indeed, OmpSs has largely influenced the recent OpenMP 4.0 specification. The Zynq All-Programmable SoC combines the features of an SMP and an FPGA and benefits from DLP, ILP, and TLP parallelism in order to efficiently exploit new technology improvements and chip resource capacities. In this paper, we focus on programmability and heterogeneous execution support, presenting a successful combination of the OmpSs programming model and the Zynq All-Programmable SoC platforms.
ISBN: 9781450326711 (print)
Systolic arrays (SAs) in an FPGA provide a significant speedup on many scientific calculations by exploiting massive parallelism. The low-level hardware design of such complex SAs is becoming more time-consuming and less scalable as more transistors become available on a single chip. In this paper we present a novel methodology to generate multi-dimensional SAs for FPGAs using a well-accepted high-level language, OpenCL. Kernels written in OpenCL can then be compiled directly into hardware using an OpenCL high-level synthesis tool. A complex case study using our methodology is presented. We were able to design, generate, verify, and optimize an entire FPGA-based hardware accelerator for the Smith-Waterman algorithm in only three man-weeks. The accelerator's top performance was 32.6 GCUPS (Giga-Cell-Updates-Per-Second) on a DNA similarity search, with an efficiency of 1.3 GCUPS/watt. The result is superior to most state-of-the-art CPU/GPU implementations and competitive against a hand-crafted hardware design which took many months to develop.
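As a rough illustration of why Smith-Waterman maps well onto a systolic array, the sketch below computes the local alignment score in anti-diagonal (wavefront) order. It is a plain software model, not the paper's OpenCL design: the function names, the linear gap penalty, and the default scoring values are assumptions chosen for the example. Cells on the same anti-diagonal depend only on the two previous anti-diagonals, so a hardware array could update them all in the same step.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Hypothetical software model with a linear gap penalty; the scoring
    parameters are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best = 0
    cell_updates = 0
    # Visit cells by anti-diagonal d = i + j. Every cell on one
    # anti-diagonal reads only from diagonals d-1 and d-2, so the inner
    # loop has no internal dependencies and could run in parallel.
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # local alignment may restart here
                h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,      # gap in b
                h[i][j - 1] + gap,      # gap in a
            )
            best = max(best, h[i][j])
            cell_updates += 1
    return best, cell_updates


def gcups(cell_updates, seconds):
    """Giga cell updates per second for a measured run."""
    return cell_updates / seconds / 1e9


if __name__ == "__main__":
    score, cells = smith_waterman_score("TGTTACGG", "GGTTGACTA",
                                        match=3, mismatch=-3, gap=-2)
    print(score, cells)
```

The number of cell updates is simply the product of the two sequence lengths, which is the quantity a GCUPS figure normalizes by run time.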
ISBN: 9781450305549 (print)
The proceedings contain 37 papers. The topics discussed include: comparing FPGA vs. custom CMOS and the impact on processor microarchitecture; VEGAS: soft vector processor with scratchpad memory; LEAP scratchpads: automatic memory and cache management for reconfigurable logic; NETTM: faster and easier synchronization for soft multicores via transactional memory; LegUp: high-level synthesis for FPGA-based processor/accelerator systems; automatic SoC design flow on many-core processors: a software hardware co-design approach for FPGAs; Torc: towards an open-source tool flow; FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting; a platform for high level synthesis of memory-intensive image processing algorithms; energy-efficient specialization of functional units in a coarse-grained reconfigurable array; and DEEP: an iterative FPGA-based many-core emulation system for chip verification and architecture research.
The size of configuration bitstreams of field-programmable gate arrays (FPGAs) is increasing rapidly. Compression techniques are used to decrease the size of bitstreams. In this paper, an appropriate bitstream format a...
Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic me...
This paper presents an FPGA-specific implementation of the floating-point tangent function. The implementation inputs values in the interval [-π/2,π/2], targets the IEEE-754 single-precision format and has an accura...
We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of recon...
The rising complexity of verification has led to an increase in the use of FPGA prototyping, which can run at significantly higher operating frequencies and achieve much higher coverage than logic simulations. However...
This paper describes architectural enhancements in the Altera Stratix-V FPGA architecture, built on a 28nm TSMC process, together with the data supporting those choices. Among the key features are time borrowing...