ISBN (digital): 9781510643659
ISBN (print): 9781510643659
In this study, we propose a hardware-oriented Gaussian mixture model - multiresolution co-occurrence histograms of oriented gradients (GMM-MRCoHOG) algorithm for efficient human detection on a field-programmable gate array (FPGA). GMM-MRCoHOG is a HOG-based human detection method in which gradient angles are quantized to 36 directions. This angle computation and the 2D Gaussian distribution computation reduce processing speed and increase hardware resource usage. We propose a hardware-oriented algorithm to solve these problems. First, we propose a rough angle computation method based on comparison with a tangent table. Second, we propose a bit-shifting-based Gaussian distribution computation method. Experimental results show that the proposed hardware-oriented algorithm does not significantly reduce the detection accuracy of GMM-MRCoHOG. High-level synthesis results of the FPGA implementation show that fast, low-resource processing is possible.
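The two ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 8-bit fixed-point precision, the function names `quantize_angle` and `gaussian_weight_shift`, and the choice to replace exp(-x) with 2^(-x) are all assumptions made for illustration. The first function picks one of 36 direction bins (10 degrees each) by comparing the gradient against a small tangent table instead of evaluating an arctangent. The second approximates a Gaussian-style weight with right shifts only.

```python
import math

FRAC_BITS = 8  # assumed fixed-point precision of the table entries

# tan(10 deg) ... tan(80 deg), scaled by 2**FRAC_BITS and rounded to integers.
TAN_TABLE = [round(math.tan(math.radians(d)) * (1 << FRAC_BITS))
             for d in range(10, 90, 10)]


def quantize_angle(gx, gy):
    """Return a direction bin in 0..35 for the integer gradient (gx, gy).

    Only comparisons, shifts and small multiplications are used;
    no arctangent or division is evaluated.
    """
    if gx == 0 and gy == 0:
        return 0  # zero gradient: bin chosen arbitrarily (assumption)
    ax, ay = abs(gx), abs(gy)
    # Count the table boundaries whose tangent does not exceed ay / ax.
    # ay / ax >= tan(t) is tested as (ay << FRAC_BITS) >= ax * tan_fixed.
    q = sum(1 for t in TAN_TABLE if (ay << FRAC_BITS) >= ax * t)  # 0..8
    # Mirror the first-quadrant bin into the correct quadrant.
    if gx > 0 and gy >= 0:
        return q
    if gx <= 0 and gy > 0:
        return 17 - q
    if gx < 0 and gy <= 0:
        return 18 + q
    return 35 - q


def gaussian_weight_shift(value, dist_sq, sigma_sq_log2):
    """Approximate value * exp(-dist_sq / (2 * sigma^2)) with shifts only.

    Assumptions: sigma^2 = 2**sigma_sq_log2, and exp(-x) is replaced by
    2**(-x) with x truncated to an integer, so the weight is a right shift.
    """
    shift = dist_sq >> (sigma_sq_log2 + 1)
    return value >> shift
```

Angles that fall exactly on a bin boundary may land in a neighboring bin because the table entries are rounded, which is consistent with the "rough" computation the abstract describes.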
Authors: Ye, Xin; Ding, Dandan; Yu, Lu
Zhejiang Univ, Inst Informat & Commun Engn, Zhejiang Prov Key Lab Informat Network Technol, Hangzhou 310027, Zhejiang, Peoples R China
ISBN (print): 9781479961399
The flexible coding structure in High Efficiency Video Coding (HEVC) introduces many challenges to real-time implementation of integer-pel motion estimation (IME). In this paper, a hardware-oriented IME algorithm named parallel clustering tree search (PCTS) is proposed, in which various prediction units (PUs) are processed simultaneously with a parallel scheme. The PCTS consists of four hierarchical search steps. After each search step, PUs with the same MV candidate are clustered into one group, and the next search step is shared by the PUs in that group. Owing to the top-down tree-structured search strategy of the PCTS, search processes are highly shared among different PUs, and system throughput is thus significantly increased. As a result, the hardware implementation based on the proposed algorithm can support real-time video applications of QFHD (3840x2160) at 30fps.
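The clustering idea in this abstract can be sketched in Python as follows. This is only an illustration, not the PCTS algorithm itself: the step sizes (8, 4, 2, 1), the 3x3 candidate pattern, the `cost` callback, and the function names are all hypothetical. What the sketch does show is the mechanism the abstract describes: after each step, PUs whose best MV coincides are grouped, and the candidate set for the next step is generated once per group.

```python
from collections import defaultdict


def cluster_by_mv(pu_mvs):
    """Group PU identifiers by their current best MV candidate."""
    groups = defaultdict(list)
    for pu, mv in pu_mvs.items():
        groups[mv].append(pu)
    return dict(groups)


def search_step(pu_mvs, cost, step):
    """Run one search step that is shared within each MV group."""
    offsets = [(dx * step, dy * step)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    refined = {}
    for mv, pus in cluster_by_mv(pu_mvs).items():
        # The candidate set is built once per group, standing in for the
        # shared search process in hardware.
        candidates = [(mv[0] + dx, mv[1] + dy) for dx, dy in offsets]
        for pu in pus:
            # Each PU still selects its own best candidate.
            refined[pu] = min(candidates, key=lambda c: cost(pu, c))
    return refined


def tree_search(pus, cost, steps=(8, 4, 2, 1)):
    """Four hierarchical steps starting from the zero MV (assumed)."""
    pu_mvs = {pu: (0, 0) for pu in pus}
    for step in steps:
        pu_mvs = search_step(pu_mvs, cost, step)
    return pu_mvs


# Toy usage: each PU has a hypothetical "true" MV, and the cost is the
# L1 distance to it.
targets = {"A": (5, -3), "B": (5, -3), "C": (-7, 10)}


def toy_cost(pu, mv):
    tx, ty = targets[pu]
    return abs(mv[0] - tx) + abs(mv[1] - ty)


result = tree_search(targets.keys(), toy_cost)
```

In this toy run, PUs "A" and "B" stay in one group through every step, so each step builds their candidate set only once.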
The promotion of the HEVC standard has significantly alleviated the burden of network transmission and video storage. However, its inherent complexity and data dependencies make it difficult to build a hardware encoder with high compression efficiency. To tackle this challenge, we propose several hardware-oriented algorithms and realize a hardware encoder supporting both intra and inter coding. In terms of algorithms, our optimizations focus on intra mode decision, motion estimation (ME), rate estimation, and merge mode estimation. These optimizations reduce the computational complexity and address the data dependencies within and between encoder modules while maintaining an acceptable compression efficiency. As for hardware, we propose an encoder architecture that supports not only 35 intra prediction modes but also ME with an extensive search range of [+/- 64, +/- 64]. A uniform 4x4 engine, 2-D data reuse, and a timing schedule for intra and inter coding are presented in this architecture to optimize hardware resource consumption and throughput. Compared with HM 15.0, the proposed hardware-oriented algorithms lead to a 1.88% and 14.57% increase in BD-Rate under the all intra and low delay P configurations, respectively. Notably, this BD-Rate is better than that of all existing hardware encoders supporting 4K resolution. In a GF 28nm fabrication process, the hardware design achieves a clock frequency of 550MHz, supporting 4K@30fps throughput with a hardware gate count of 3154K and memory usage of 1.02MB. The proposed architecture demonstrates substantial advantages in terms of area, throughput, and power compared to other studies.
A hardware-oriented algorithm for generating permutations is presented. Its theoretical basis is an iterative decomposition of the symmetric group S(n) into cosets, and it generates permutations in a new order. Simple ranking and unranking algorithms are given. A permutation generator is proposed whose main component is a cellular permutation network. The application of the permutation generator to solving a class of combinatorial problems on parallel computers is suggested.
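To make "ranking" and "unranking" concrete, here is a minimal Python sketch that uses the standard factorial number system (lexicographic order). This is a swapped-in textbook technique, not the new order or the coset-based algorithms of the paper, which the abstract does not specify. The function names `rank` and `unrank` are chosen for illustration.

```python
import math


def unrank(r, n):
    """Return the permutation of 0..n-1 with lexicographic rank r."""
    elems = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        # The leading digit in the factorial number system selects which
        # remaining element comes next.
        idx, r = divmod(r, math.factorial(i - 1))
        perm.append(elems.pop(idx))
    return perm


def rank(perm):
    """Return the lexicographic rank of a permutation of 0..n-1."""
    n = len(perm)
    elems = list(range(n))
    r = 0
    for i, p in enumerate(perm):
        idx = elems.index(p)  # position among the unused elements
        r += idx * math.factorial(n - 1 - i)
        elems.pop(idx)
    return r
```

Because `rank` and `unrank` are inverses, a range of ranks can be split among parallel workers, each of which unranks its starting point independently. That is one plausible way such functions support the parallel use the abstract mentions.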