ISBN: 9781479964994 (print)
Obtaining a real-time implementation of a face detection system is the first step towards human-machine interaction. This paper presents an architecture, implementable on an FPGA, that accelerates the Haar-based face detection algorithm by exploiting the algorithm's inherent parallelism across multiple dedicated processing units. The architecture is designed to be scalable, and the face detection load is distributed among the processing units so as to reduce idle time. The design has been synthesized for the Xilinx Virtex-5 board. A single processing unit improves the face detection frame rate by 5.45 times over a 2.4 GHz Intel i5 processor. The frame rate is further doubled by scaling the architecture to four processing units running in parallel.
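The abstract does not say how the detection load is divided among the processing units. As a rough illustration of the general idea only, the sketch below enumerates sliding detection windows over several scales and deals them out round-robin, so that every unit receives nearly the same number of windows. The function names, the 24-pixel base window, the 1.25 scale factor and the 4-pixel step are all assumptions made for this example, not values taken from the paper.

```python
def detection_windows(frame_w, frame_h, base=24, scale=1.25, step=4):
    """Enumerate (x, y, size) square detection windows over several scales.

    base, scale and step are illustrative defaults, not the paper's values.
    """
    windows = []
    size = base
    while size <= min(frame_w, frame_h):
        for y in range(0, frame_h - size + 1, step):
            for x in range(0, frame_w - size + 1, step):
                windows.append((x, y, size))
        size = int(size * scale)  # grow the window for the next scale
    return windows


def assign_round_robin(windows, n_units):
    """Deal windows out to n_units so the counts differ by at most one."""
    return [windows[i::n_units] for i in range(n_units)]


if __name__ == "__main__":
    wins = detection_windows(64, 48)
    for unit, share in enumerate(assign_round_robin(wins, 4)):
        print(f"unit {unit}: {len(share)} windows")
```

Equal window counts do not guarantee equal work, because a cascade classifier rejects most windows after a few stages and spends far longer on the rest. A real design would likely need a dynamic scheme to keep idle time low.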
ISBN: 9781479940752 (print)
Integral image generation is an important step in the Haar-based face detection algorithm and plays a vital role in optimizing the algorithm for real-time implementation, because it allows each of the thousands of Haar features to be evaluated in constant time. The integral image of a frame needs to be generated in a pre-processing stage before the Haar classifier is run on the detection windows within the frame. The delay of this pre-processing stage is directly proportional to the resolution of the frame and becomes substantial as the resolution increases, limiting the real-time operation of the entire algorithm. This paper presents a novel architecture that generates the integral image of the current detection window dynamically, so that the pre-processing delay is reduced substantially. It is shown that this approach yields an improvement of 13.71 times for VGA frames. The usage of hardware resources is also reduced by around 50% compared to previous implementations.
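To make the constant-time claim concrete, here is a minimal software sketch of the conventional whole-frame integral image (summed-area table), which is the pre-processing step the abstract describes, not the paper's dynamic per-window architecture. The function names and the simple two-rectangle feature are chosen for illustration. Once the table is built, the sum over any rectangle takes four lookups regardless of the rectangle's size.

```python
def integral_image(img):
    """Build an (h+1) x (w+1) summed-area table from a 2-D list of pixels.

    The extra zero row and column remove the need for boundary checks.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative Haar-like edge feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    frame = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    ii = integral_image(frame)
    print(rect_sum(ii, 1, 1, 2, 2))          # 5 + 6 + 8 + 9 = 28
    print(two_rect_feature(ii, 0, 0, 2, 3))  # (1+4+7) - (2+5+8) = -3
```

Building the table touches every pixel once, which is why the pre-processing delay grows in proportion to the frame resolution.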