Convolutional neural networks (CNNs) have been widely adopted in many tasks. Their inference is usually performed on edge devices, where the computing resources and power consumption are *** present, the performance of general processors cannot meet the requirements of CNN models with high computational complexity and a large number of parameters. Field-programmable gate array (FPGA)-based custom computing architecture is a promising solution to further enhance the CNN inference *** software/hardware co-design can effectively reduce the computing overhead and improve inference performance while preserving accuracy. In this paper, the mainstream methods of CNN structure design, hardware-oriented model compression, and FPGA-based custom architecture design are summarized, and the improvement in CNN inference performance is demonstrated through an example. Challenges and possible future research directions are outlined to foster research efforts in this domain.
Low-precision techniques can effectively reduce the computational complexity and bandwidth requirements of convolutional neural network (CNN) inference, but may lead to significant accuracy degradation. Mixed-low-precision techniques provide a superior approach for CNN inference, since they retain the advantages of low precision while maintaining accuracy. In this article, we propose a high-performance, highly flexible W(8)A(8) (INT8 weight and INT8 activation) and W(T)A(2) (ternary weight and INT2 activation) mixed-precision CNN inference hardware architecture, DPUmxp, designed and implemented on a Xilinx Virtex UltraScale+ 13P FPGA with peak performance of up to 58.9 TOPS.
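To make the W(8)A(8) and W(T)A(2) notation concrete, the following is a minimal software sketch of the two number formats, not the DPUmxp design itself. The function names, the symmetric per-tensor INT8 scheme, the 0.7 x mean(|w|) ternary threshold, and the unsigned 0..3 activation range are all illustrative assumptions; the abstract does not state how the paper quantizes its models.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (assumed scheme).

    Returns integer codes in [-127, 127] and the scale that maps them back.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def quantize_ternary(values, threshold_ratio=0.7):
    """Map weights to {-1, 0, +1}.

    The threshold (ratio * mean absolute weight) is a common heuristic,
    assumed here for illustration only.
    """
    mean_abs = sum(abs(v) for v in values) / len(values)
    threshold = threshold_ratio * mean_abs
    return [0 if abs(v) <= threshold else (1 if v > 0 else -1) for v in values]


def quantize_uint2(values, scale):
    """Map activations to 2-bit codes, assumed unsigned (0..3) as after a ReLU."""
    return [max(0, min(3, round(v / scale))) for v in values]


def int_dot(weights, activations):
    """Integer dot product, the core operation of a quantized convolution."""
    return sum(w * a for w, a in zip(weights, activations))


# W(8) example: weights become 8-bit codes plus one floating-point scale.
w8_codes, w8_scale = quantize_int8([1.27, -0.64, 0.0])

# W(T)A(2) example: ternary weights and 2-bit activations.
wt_codes = quantize_ternary([0.9, -0.05, -0.8, 0.1])
a2_codes = quantize_uint2([0.0, 0.4, 1.1, 5.0], scale=0.5)
accumulator = int_dot(wt_codes, a2_codes)
```

With ternary weights, each multiply in the dot product reduces to an add, a subtract, or a skip, which is one reason such formats are attractive for FPGA logic.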
In this study, a very-large-scale integration implementation of convolutional neural network (CNN) inference for abnormal heartbeat detection was proposed. Four-lead electrocardiogram signals were used to detect abnormal heartbeat conditions, such as premature ventricular complex. 1D CNNs and fully connected layers were utilised in the proposed chip to achieve high-speed, small-area, and high-accuracy arrhythmia detection. The proposed chip was implemented using a 90-nm complementary metal-oxide-semiconductor process and operated at 125 MHz with a 0.67 mm² core area. The power consumption was 4.18 mW at the high-speed operating frequency (125 MHz) and 3.79 µW at 10 kHz for low-power applications. The detection accuracy was 95.14% on the MIT-BIH arrhythmia database. Consequently, the proposed accelerator chip was shown to combine high speed, low power, small area, and high accuracy.