ISBN: 9781450326711 (print)
OmpSs is an OpenMP-like directive-based programming model that includes heterogeneous execution (MIC, GPU, SMP, etc.) and runtime management of task dependencies. Indeed, OmpSs has largely influenced the recent OpenMP 4.0 specification. The Zynq All-Programmable SoC combines the features of an SMP and an FPGA and benefits from data-, instruction-, and thread-level parallelism (DLP, ILP and TLP) in order to efficiently exploit new technology improvements and chip resource capacities. In this paper, we focus on programmability and heterogeneous execution support, presenting a successful combination of the OmpSs programming model and the Zynq All-Programmable SoC platforms.
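As a rough illustration of what runtime task dependency management means, the sketch below derives an execution order from per-task input and output annotations. It is not the OmpSs runtime or its API: the function names, the task tuple layout, and the example task names are all hypothetical, and only read-after-write dependencies are tracked.

```python
from collections import deque


def build_dependencies(tasks):
    """tasks: list of (name, inputs, outputs) in program order.

    Returns {name: set of earlier tasks it must wait for}.
    Simplifying assumption: only read-after-write dependencies are tracked.
    """
    last_writer = {}
    deps = {}
    for name, inputs, outputs in tasks:
        # A task waits for the most recent writer of each datum it reads.
        deps[name] = {last_writer[x] for x in inputs if x in last_writer}
        for x in outputs:
            last_writer[x] = name
    return deps


def schedule(tasks):
    """Return one valid execution order that respects the dependencies."""
    remaining = {n: set(d) for n, d in build_dependencies(tasks).items()}
    ready = deque(n for n, _, _ in tasks if not remaining[n])
    order = []
    while ready:
        done = ready.popleft()
        order.append(done)
        # Release every task that was waiting only on the finished one.
        for name, _, _ in tasks:
            if done in remaining[name]:
                remaining[name].discard(done)
                if not remaining[name]:
                    ready.append(name)
    return order


# Hypothetical task graph: two loads feed an accelerated multiply,
# whose result is then stored by a CPU task.
example_tasks = [
    ("load_a", [], ["a"]),
    ("load_b", [], ["b"]),
    ("fpga_mul", ["a", "b"], ["c"]),
    ("smp_store", ["c"], []),
]
example_order = schedule(example_tasks)
```

In this toy graph the two loads have no dependencies and could run concurrently, while the multiply must wait for both of them.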
ISBN: 9781450326711 (print)
Systolic arrays (SA) in an FPGA provide a significant speedup on many scientific calculations by exploiting massive parallelism. The low-level hardware design of such complex SAs is becoming more time-consuming and less scalable as more transistors become available on a single chip. In this paper we present a novel methodology to generate multi-dimensional SAs for FPGAs using a well-accepted high-level language, OpenCL. Kernels written in OpenCL can then be compiled directly into hardware using an OpenCL high-level synthesis tool. A complex case study using our methodology is presented. We were able to design, generate, verify and optimize the entire FPGA-based hardware accelerator for the Smith-Waterman algorithm in only three man-weeks. The accelerator's top performance was 32.6 GCUPS (giga cell updates per second) on a DNA similarity search, with an efficiency of 1.3 GCUPS/watt. The result is superior to most state-of-the-art CPU/GPU implementations and competitive with a hand-crafted hardware design that took many months to develop.
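To make the cell-update metric concrete, here is a minimal software sketch of Smith-Waterman scoring. It is not the paper's OpenCL kernel: the scoring parameters (match, mismatch and a linear gap penalty) and the function names are assumptions chosen for illustration. The cells are visited along anti-diagonals, since cells on the same anti-diagonal do not depend on each other, which is the property a systolic array can exploit.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Assumed scoring scheme: fixed match/mismatch scores and a linear gap penalty.
    """
    rows, cols = len(a), len(b)
    h = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    # Cells with the same i + j form one anti-diagonal; each depends only on
    # earlier anti-diagonals, so hardware could update them all at once.
    for d in range(2, rows + cols + 1):
        for i in range(max(1, d - cols), min(rows, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # local alignment may restart anywhere
                h[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            best = max(best, h[i][j])
    return best, rows * cols


def gcups(cell_updates, seconds):
    """Giga cell updates per second for a given run."""
    return cell_updates / seconds / 1e9


score, cells = smith_waterman_score("ACGT", "AGT")
```

Dividing the number of matrix cells by the wall-clock time of a search gives the throughput figure in the units the abstract reports.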
ISBN: 9781450305549 (print)
The proceedings contain 37 papers. The topics discussed include: comparing FPGA vs. custom CMOS and the impact on processor microarchitecture; VEGAS: soft vector processor with scratchpad memory; LEAP scratchpads: automatic memory and cache management for reconfigurable logic; NETTM: faster and easier synchronization for soft multicores via transactional memory; LegUp: high-level synthesis for FPGA-based processor/accelerator systems; automatic SoC design flow on many-core processors: a software hardware co-design approach for FPGAs; Torc: towards an open-source tool flow; FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting; a platform for high level synthesis of memory-intensive image processing algorithms; energy-efficient specialization of functional units in a coarse-grained reconfigurable array; and DEEP: an iterative FPGA-based many-core emulation system for chip verification and architecture research.
The size of configuration bitstreams of field-programmable gate arrays (FPGAs) is increasing rapidly. Compression techniques are used to decrease the size of bitstreams. In this paper, an appropriate bitstream format a...
Locality exploitation is essential to asymptotic energy minimization for gate array netlist evaluation. Naive implementations that ignore locality, including flat crossbars and simple processors based on monolithic me...
This paper presents an FPGA-specific implementation of the floating-point tangent function. The implementation inputs values in the interval [-π/2,π/2], targets the IEEE-754 single-precision format and has an accura...
We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of recon...
The rising complexity of verification has led to an increase in the use of FPGA prototyping, which can run at significantly higher operating frequencies and achieve much higher coverage than logic simulations. However...
This paper describes architectural enhancements in the Altera Stratix-V FPGA architecture, built on a 28 nm TSMC process, together with the data supporting those choices. Among the key features are time borrowing...
High-level synthesis (HLS) has been gaining traction recently as a design methodology for FPGAs, with the promise of raising the productivity of FPGA hardware designers, and ultimately, opening the door to the use of ...