Precision-scalable convolutional neural networks (CNNs) offer a promising way to balance network accuracy against hardware efficiency, enabling high-performance execution on embedded devices. However, precision-scalable (PS) networks require many small, fine-grained multiplications, and as a result they have seen limited exploration on FPGA platforms. Deploying PS accelerators on FPGAs faces two challenges: LUT-based multiply-accumulate (MAC) units fail to make full use of the DSP resources, while DSP-based MACs support only a limited set of precision combinations and still cannot utilize the DSPs efficiently. This brief therefore proposes a DSP-based precision-scalable MAC with a hybrid dataflow that supports most precision combinations and ensures highly efficient utilization of both DSP and LUT resources. Evaluated on a mixed 4b/8b VGG16, the proposed accelerator achieves a 3.97x performance improvement over the 8b baseline with only a 0.37% accuracy degradation. In addition, compared with state-of-the-art accelerators, it achieves a 1.20x to 2.69x improvement in DSP efficiency and a 1.63x to 6.34x improvement in LUT efficiency.
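To make the DSP-utilization problem concrete, the sketch below illustrates generic operand packing, a common way to obtain two low-precision products from one wide multiplier. It is not taken from the brief and should not be read as the proposed MAC design. The function name `packed_dual_multiply`, the signed 8b activation and 4b weight ranges, and the 12-bit field spacing are all assumptions chosen for illustration.

```python
def packed_dual_multiply(a, w0, w1, shift=12):
    """Return (a*w0, a*w1) using a single wide multiplication.

    Assumed ranges (illustrative, not from the brief):
      a  : signed 8-bit activation, -128..127
      w0 : signed 4-bit weight, -8..7
      w1 : signed 4-bit weight, -8..7
    shift=12 leaves enough room for a*w0, which lies in -1016..1024.
    """
    # Pack both weights into one operand: w0 in the low field, w1 above it.
    packed = w0 + (w1 << shift)

    # One multiplication yields a*w0 + (a*w1 << shift).
    product = a * packed

    # Recover the low product by sign-extending the low `shift` bits.
    half = 1 << (shift - 1)
    low = ((product + half) & ((1 << shift) - 1)) - half

    # Removing the low product leaves an exact multiple of 2**shift.
    high = (product - low) >> shift
    return low, high


if __name__ == "__main__":
    print(packed_dual_multiply(100, -3, 7))
```

The signed correction step (subtracting the sign-extended low field before shifting) is what keeps the upper product exact when the lower product is negative. Supporting more precision combinations than this single 8b-by-4b case is where the field layout becomes the hard part.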