Convolutional neural network (CNN)-based super-resolution (SR) has shown outstanding performance in the field of computer vision. Implementing inference hardware for CNN-based SR suffers from intensive computation and a severely unbalanced computation load among layers. Various lightweight SR networks have been studied that incur little performance degradation; however, a hardware-efficient dataflow is also required to accelerate inference within limited resources. In this article, we propose a hardware-efficient dataflow for CNN-based SR that reduces the computation load by increasing data reuse and raises processing-element (PE) utilization by balancing the computation load among layers for high throughput. In the proposed dataflow, row-wise pixels in the receptive field are computed by circularly shifting memory addresses to maximize data reuse. Partial convolution is exploited in a layer-based pipeline architecture to relieve intensive computation in any single pipeline stage. Delay balancing, which adjusts per-layer parallelism, is employed to balance computation precisely across all layers. Furthermore, the inference hardware for CNN-based SR is implemented on a field-programmable gate array (FPGA) for 4K ultrahigh-definition output at 60 fps. For hardware-friendly computation, quantization of activations and weights is adopted. The proposed hardware achieves an average peak signal-to-noise ratio of 36.42 dB on the Set-5 dataset with a memory usage of 53 KB and an average PE utilization of 76.7% across all layers. It thus achieves the lowest memory usage and the highest PE utilization among reported inference hardware for CNN-based SR.
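The "circularly shifting memory addresses" idea behind the row-wise data reuse can be illustrated with a minimal software sketch (this is an illustration of the general line-buffer technique, not the authors' RTL; all names are hypothetical): a buffer holds the K rows currently covered by the convolution window, and when the window slides down, a circular write pointer advances so that only the oldest row is overwritten, instead of physically shifting all rows.

```python
import numpy as np

def conv_rows_circular(image, kernel):
    """Sketch of row-wise reuse with a circular line buffer.

    When the K x K convolution window slides down by one row, the
    buffer is not shifted; a circular write pointer advances so the
    incoming row overwrites only the oldest buffered row. Stride 1,
    no padding. Illustration only.
    """
    H, W = image.shape
    K = kernel.shape[0]             # square K x K kernel assumed
    line_buf = np.zeros((K, W))     # K rows of "on-chip" line memory
    out = np.zeros((H - K + 1, W - K + 1))
    wr = 0                          # circular write pointer

    for r in range(H):
        line_buf[wr] = image[r]     # overwrite the oldest row only
        wr = (wr + 1) % K
        if r >= K - 1:
            # Recover window order from the pointer instead of moving data
            order = [(wr + i) % K for i in range(K)]
            window_rows = line_buf[order]
            for c in range(W - K + 1):
                out[r - K + 1, c] = np.sum(window_rows[:, c:c + K] * kernel)
    return out
```

Each input row is read exactly once, which is the data-reuse benefit: the K-1 rows shared between consecutive window positions stay in place in the buffer.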
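The delay-balancing principle, namely giving each pipeline stage a degree of parallelism roughly proportional to its MAC load so that per-stage delays match and no stage idles, can be sketched as follows (the function and the numbers in the usage note are illustrative assumptions, not the paper's network or allocator):

```python
def balance_parallelism(macs_per_pixel, pe_budget):
    """Sketch of delay balancing across a layer-based pipeline.

    The slowest stage sets the throughput, so each layer l is given
    parallelism P_l proportional to its MAC count; the per-stage
    delay d_l = MACs_l / P_l then comes out (nearly) equal, which is
    what keeps average PE utilization high. Hypothetical helper.
    """
    total = sum(macs_per_pixel)
    parallelism = [max(1, round(pe_budget * m / total))
                   for m in macs_per_pixel]
    delays = [m / p for m, p in zip(macs_per_pixel, parallelism)]
    return parallelism, delays
```

For example, with per-layer loads of [1000, 4000, 2000] MACs per output pixel and a 70-PE budget, this allocates parallelism [10, 40, 20], giving an identical delay of 100.0 per stage; in a real design, integer rounding of P_l leaves residual imbalance, which is why utilization stays below 100%.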
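The abstract does not specify the quantization scheme or bit widths; a common hardware-friendly choice, shown here purely as an assumed example, is symmetric uniform quantization of activations and weights to signed integers with a shared per-tensor scale:

```python
import numpy as np

def quantize_uniform(x, n_bits=8):
    """Symmetric uniform quantization sketch (assumed scheme; the
    paper's exact bit widths and scheme are not given in the
    abstract).  Maps float values to n_bits signed integers sharing
    one scale, so MACs can run on integer PEs.
    """
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale
```

Dequantizing with `q * scale` recovers the original values to within one quantization step, which is what bounds the PSNR loss of the quantized network.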