ISBN (print): 9781450394178
Multi-bit-width neural networks offer a promising approach to high-performance yet energy-efficient edge computing, owing to their balance between algorithmic accuracy and hardware efficiency. To date, the FPGA has been one of the core hardware platforms for deploying neural networks. However, it remains difficult to fully exploit the dedicated digital signal processing (DSP) blocks in FPGAs for accelerating multi-bit-width networks. In this work, we develop a state-of-the-art multi-bit-width convolutional neural network (CNN) accelerator with a novel systolic-in-systolic dataflow and a single-DSP multiple-multiplication (SDMM) INT2/4/8 execution scheme. Multi-level optimizations are adopted to further improve performance, including a group-vector systolic array that maximizes circuit efficiency while minimizing systolic delay, and a differential neural architecture search (NAS) method for generating high-accuracy multi-bit-width networks. The proposed accelerator has been deployed on a Xilinx ZCU102, accelerating NAS-optimized VGG16 and ResNet18 networks as case studies. Average performance on the convolutional layers of VGG16 and ResNet18 is 1289 GOPS and 1155 GOPS, respectively. Throughput for running the full multi-bit-width VGG16 network is 870.73 GOPS at 250 MHz, exceeding all previous CNN accelerators on the same platform.
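The abstract does not detail the SDMM circuit, but the general idea behind packing several low-precision multiplications into one wide DSP multiply can be sketched as follows. This is a minimal illustration assuming unsigned INT4 operands and a hypothetical guard-bit width; the paper's actual INT2/4/8 scheme is not reproduced here.

```python
# Sketch of the idea behind single-DSP multiple multiplication (SDMM):
# two low-bit-width products come from one wide multiply by packing two
# activations into one operand, separated by guard bits. Generic DSP-packing
# illustration only, not the paper's exact circuit.

GUARD = 12  # guard bits: a0 * w fits in 8 bits for unsigned INT4, so 12 is safe

def sdmm_pair(a1, a0, w):
    """Compute (a1*w, a0*w) for unsigned 4-bit a1, a0, w with one multiply."""
    packed = (a1 << GUARD) | a0        # one wide operand holds both activations
    product = packed * w               # single wide multiplication (one DSP)
    p0 = product & ((1 << GUARD) - 1)  # low field: a0 * w
    p1 = product >> GUARD              # high field: a1 * w (no carry reaches it)
    return p1, p0

print(sdmm_pair(9, 13, 7))  # (63, 91)
```

Because the low product never exceeds the guard field, the two results can be sliced out of the single wide product without interference.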
Multi-bit-width convolutional neural networks (CNNs) maintain a balance between network accuracy and hardware efficiency, offering a promising approach to accurate yet energy-efficient edge computing. In this work, we develop a state-of-the-art multi-bit-width accelerator for NAS-optimized deep neural networks. To process multi-bit-width network inference efficiently, multi-level optimizations are proposed. First, a differential neural architecture search (NAS) method is adopted for generating high-accuracy multi-bit-width networks. Second, a hybrid Booth-based multi-bit-width multiply-add-accumulate (MAC) unit is developed for data processing. Third, a vector systolic array is proposed to accelerate matrix multiplications effectively; with vector-style systolic dataflow, both processing time and logic resource consumption are reduced compared with the classical systolic array. Finally, the proposed multi-bit-width CNN acceleration scheme has been deployed on a Xilinx ZCU102 FPGA platform. Average performance on the full NAS-optimized VGG16 network is 784.2 GOPS, and peak performance on the convolutional layers reaches 871.26 GOPS for INT8, 1676.96 GOPS for INT4, and 2863.29 GOPS for INT2, which is among the best results in previous CNN accelerator benchmarks.
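The hybrid Booth-based MAC unit is not specified in detail here, but the radix-4 (modified) Booth recoding commonly underlying such units can be sketched as follows. This is a generic software model of the recoding, assuming two's-complement operands, not the paper's hardware design.

```python
# Minimal sketch of radix-4 (modified) Booth multiplication: the multiplier y
# is recoded into digits in {-2, -1, 0, +1, +2}, so the product is formed from
# shifts and adds of x alone. Generic recoding model, not the paper's hybrid
# INT2/4/8 MAC circuit.

def booth_mul(x, y, bits=8):
    """Signed multiply via radix-4 Booth recoding of y (two's complement)."""
    yb = y & ((1 << bits) - 1)             # two's-complement bit pattern of y
    acc = 0
    for i in range(0, bits, 2):            # one Booth digit per 2 bits of y
        b_lo = (yb >> (i - 1)) & 1 if i > 0 else 0
        b0 = (yb >> i) & 1
        b1 = (yb >> (i + 1)) & 1
        digit = -2 * b1 + b0 + b_lo        # Booth digit in {-2,-1,0,1,2}
        acc += (digit * x) << i            # shift-and-add partial product
    return acc

print(booth_mul(-93, 57))  # -5301
```

Radix-4 recoding halves the number of partial products versus a plain shift-and-add multiplier, which is why Booth-based units map well to compact MAC hardware.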
ISBN (print): 9781450391498
Neural architecture search (NAS) optimized multi-bit-width convolutional neural networks (CNNs) maintain a balance between network performance and efficiency, offering a promising approach to accurate yet energy-efficient edge computing. In this work, we propose a high-throughput three-dimensional (3D) systolic accelerator for NAS-optimized CNNs, in which the input feature matrix, weight matrix, and output feature matrix travel vertically, horizontally, and perpendicularly through the systolic array, respectively. With 3D systolic dataflow, both processing time and logic resource consumption are reduced compared to the classical non-stationary systolic array. In addition, a Booth-based multi-bit-width (INT2/4/8) multiply-add-accumulate (MAC) unit is developed within the 3D systolic accelerator. Deployed on a Xilinx ZCU102 FPGA platform, peak performance on the convolutional layers reaches 2775 GOPS for INT2, 1650 GOPS for INT4, and 816 GOPS for INT8. Average performance on the full NAS VGG16 network is 647 GOPS.
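The timing that makes a systolic array work can be sketched with a small cycle-by-cycle model: skewing row i of A by i cycles and column j of B by j cycles makes the matching operand pair A[i][k], B[k][j] arrive at PE(i,j) at cycle t = i + j + k. This is a generic output-stationary 2D schedule for illustration, not the paper's 3D dataflow.

```python
# Cycle-by-cycle sketch of an output-stationary systolic matrix multiply.
# Each PE(i,j) holds C[i][j]; the input skew guarantees that at cycle t the
# operands A[i][k] and B[k][j] with k = t - i - j meet at that PE.

def systolic_matmul(A, B):
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]        # accumulators, one per PE
    for t in range(M + N + K - 2):         # cycles until the array drains
        for i in range(M):
            for j in range(N):
                k = t - i - j              # operand pair arriving this cycle
                if 0 <= k < K:
                    C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

Every PE performs at most one MAC per cycle, and the whole product finishes in M + N + K - 2 cycles, which is the latency advantage systolic schedules trade against the skewing logic.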