Convolutional neural networks (CNNs) are computationally demanding due to expensive Multiply-Accumulate (MAC) operations. Emerging neural network models, such as AdderNet, exploit efficient arithmetic alternatives like sum-of-absolute-difference (SAD) operations to replace the costly MAC operations, while still achieving competitive model accuracy compared with their CNN counterparts. Nevertheless, existing AdderNet accelerators still face critical implementation challenges: maximal hardware and energy efficiency is typically achieved only at the cost of model inference accuracy. This paper presents AdderNet 2.0, an algorithm-hardware co-design framework featuring a novel activation-oriented quantization (AOQ) strategy, a Fused Bias Removal (FBR) scheme for on-chip feature map memory bitwidth reduction, and optimized PE designs that improve overall resource utilization towards optimal AdderNet accelerator designs. Multiple AdderNet 2.0 accelerator design variants were implemented on a Xilinx KV260 FPGA. Experimental results show that the INT6 AdderNet 2.0 accelerators achieve significant hardware resource and energy savings compared to prior CNN and AdderNet designs.
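To make the MAC-versus-SAD distinction concrete, the following is a minimal sketch (function names are illustrative, not from the paper) contrasting a standard convolution filter response with the AdderNet-style response, which negates the L1 distance between inputs and weights so that a closer match yields a larger output:

```python
import numpy as np

def mac_response(x, w):
    # Standard CNN filter response: multiply-accumulate (dot product)
    return float(np.sum(x * w))

def sad_response(x, w):
    # AdderNet-style filter response: negated sum of absolute differences,
    # computed with additions/subtractions only (no multiplications)
    return float(-np.sum(np.abs(x - w)))

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0, 3.0])
print(mac_response(x, w))  # 14.0
print(sad_response(x, w))  # 0.0 (a perfect match gives the maximal response)
```

Because SAD replaces each multiplication with a subtraction and an absolute value, the arithmetic maps to much cheaper hardware than MAC units, which is the efficiency opportunity the accelerator designs above exploit.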