ISBN: 9781479941322 (print)
This paper investigates VLSI architectures for digital signal processing (DSP) functions amenable to low-energy operation with scalable performance for H.265 High Efficiency Video Coding (HEVC) applications. First, we describe and experimentally evaluate a novel adaptive computing fabric. Second, we propose an energy-efficient method to scale the performance of the fabric for large images or for meeting stringent real-time computation requirements. We propose a series of tradeoffs for efficiently exploiting the application space of general-purpose DSP acceleration. We show experimentally that the proposed computing fabric is reusable for filter, FFT, and DCT acceleration with scalable throughput. We report on the design and implementation of the fabric on a Xilinx FPGA device and show how regulated parallelism augmented with in-memory processing techniques affects performance and power efficiency. The FPGA prototype demonstrates a sustained throughput exceeding 10 Gbps irrespective of the kernel and image size for H.265 HEVC applications.
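One general reason a single fabric can serve filters, the FFT, and the DCT is that each reduces to repeated multiply-accumulate (MAC) operations over different coefficient sets. The sketch below illustrates only that shared structure in plain Python. It is not the paper's architecture: the names `mac`, `fir`, `dct2`, and `dft` are hypothetical, and the transforms are computed directly rather than with a fast algorithm.

```python
import math

def mac(coeffs, samples):
    """Multiply-accumulate: the one primitive shared by all kernels below."""
    acc = 0.0
    for c, x in zip(coeffs, samples):
        acc += c * x
    return acc

def fir(taps, signal):
    """FIR filter over the fully overlapped region: y[k] = sum_i taps[i] * signal[k - i]."""
    n = len(taps)
    out = []
    for k in range(n - 1, len(signal)):
        window = signal[k - n + 1:k + 1][::-1]  # newest sample first
        out.append(mac(taps, window))
    return out

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        row = [math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts)]
        out.append(mac(row, x))
    return out

def dft(x):
    """Direct DFT of a real sequence, returned as (real, imag) pairs."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        cos_row = [math.cos(2 * math.pi * k * n / n_pts) for n in range(n_pts)]
        sin_row = [math.sin(2 * math.pi * k * n / n_pts) for n in range(n_pts)]
        out.append((mac(cos_row, x), -mac(sin_row, x)))
    return out
```

In this sketch only the coefficient rows differ between the three kernels, which is the property a reconfigurable MAC array can exploit.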
To make software applications simpler to write and easier to maintain, a software digital signal processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides a high-level interface and mechanisms, so developers only need to know how to use the algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Taking the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function with an emulation on a field-programmable gate array (FPGA) initially configured by software programming. The exploration of the 2-D convolver's design space will therefore provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up general-purpose DSP processors. Based on the specific convolver, and considering the operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and the accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated, Dallas, TX. However, the proposed concept is not limited to a particular processor.
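For reference, the software function that such a coprocessor would stand in for can be sketched as a direct 2-D convolution. This is a minimal illustration in Python rather than C, and it is not taken from the paper: the name `convolve2d`, the nested-list image format, and the choice to compute only the fully overlapped ("valid") output region are all assumptions.

```python
def convolve2d(image, kernel):
    """Direct 2-D convolution over the region where the kernel fully overlaps the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is indexed in reverse (flipped in both axes),
                    # which distinguishes convolution from correlation.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[r][c] = acc
    return out
```

Each output pixel costs `kh * kw` multiply-accumulates on `kh * kw` input pixels, so moving the inner two loops to hardware shifts the bottleneck to how quickly pixels can cross the link between the DSP and the accelerator, which is the bandwidth tradeoff the abstract refers to.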