The costly multiplications challenge the deployment of modern deep neural networks (DNNs) on resource-constrained devices. To promote hardware efficiency, prior works have built multiplication-free models. However, they are generally inferior to their multiplication-based counterparts in accuracy, calling for multiplication-reduced hybrid models to marry the benefits of both approaches. To achieve this goal, recent works, i.e., NASA and NASA+, have developed Neural Architecture Search (NAS) and Acceleration frameworks to search for and accelerate such hybrid models via a tailored differentiable NAS (DNAS) engine and dedicated ASIC-based accelerators. In this paper, we delve deeper into the inherent advantages of FPGAs and present an enhanced approach called NASA-F, which focuses on FPGA-oriented search and acceleration for hybrid models. Specifically, on the algorithm level, we develop a tailored one-shot supernet-based NAS engine to streamline the search for hybrid models, eliminating the need to execute NAS for each deployment as well as additional training/finetuning steps. On the hardware level, we develop a chunk-based accelerator to fully leverage the diverse hardware resources available on FPGAs for the acceleration of heterogeneous layers in hybrid models, aiming to enhance both hardware utilization and throughput. Extensive experimental results consistently validate the superiority of our NASA-F framework, e.g., we gain ↑0.67% top-1 accuracy over the prior work NASA on CIFAR100 even without additional training steps for searched models. Additionally, we achieve up to ↑1.86× throughput and ↑2.16× FPS with ↑0.39% top-1 accuracy over the state-of-the-art multiplication-based system on Tiny-ImageNet. Codes are available at https://***/shihuihong214/NASA-F.
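To make the one-shot supernet idea concrete, below is a minimal PyTorch-style sketch of a hybrid searchable layer. All names (MixedBlock, ShiftConv2d) and the power-of-two weight rounding are illustrative assumptions, not the paper's actual implementation: each layer holds a multiplication-based convolution alongside a multiplication-free stand-in, and uniformly samples one candidate per forward pass in single-path one-shot fashion, so searched sub-networks inherit trained weights without extra finetuning.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftConv2d(nn.Conv2d):
    # Rounds weight magnitudes to powers of two at forward time, so each
    # multiply reduces to a bit-shift in hardware; a straight-through
    # estimator keeps the layer trainable. An illustrative stand-in for a
    # multiplication-free operator, not the paper's exact formulation.
    def forward(self, x):
        w = self.weight
        pot = torch.sign(w) * 2.0 ** torch.round(torch.log2(w.abs().clamp(min=1e-8)))
        w_q = w + (pot - w).detach()  # straight-through gradient
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

class MixedBlock(nn.Module):
    # One searchable layer of the hybrid supernet: each forward pass
    # uniformly samples a single candidate (single-path one-shot style),
    # so all candidates are trained jointly within one supernet.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, padding=1),    # multiplication-based
            ShiftConv2d(c_in, c_out, 3, padding=1),  # multiplication-free
        ])

    def forward(self, x, choice=None):
        if choice is None:
            choice = random.randrange(len(self.candidates))
        return self.candidates[choice](x)

x = torch.randn(2, 16, 32, 32)
print(MixedBlock(16, 32)(x).shape)  # torch.Size([2, 32, 32, 32])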
Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their deployment on resource-constrained devices. Consequently, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for such hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Codes are available at https://***/GATECH-EIC/NASA.
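As a rough illustration of what such an auto-mapper does, the sketch below exhaustively scores loop-tiling candidates for one layer under a toy utilization-based cost model. The cost function, tile dimensions, and default sizes are all invented for illustration; the actual NASA+ mapper and its cost model are more elaborate.

import itertools

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def cost(tile_k, tile_c, K, C, pe_rows, pe_cols):
    # Toy latency proxy: temporal passes over the PE array, penalized by
    # spatial under-utilization of the array (illustrative only).
    util = (min(tile_k, pe_rows) / pe_rows) * (min(tile_c, pe_cols) / pe_cols)
    passes = (-(-K // tile_k)) * (-(-C // tile_c)) \
           * (-(-tile_k // pe_rows)) * (-(-tile_c // pe_cols))  # ceil divisions
    return passes / util

def best_mapping(K=64, C=32, pe_rows=16, pe_cols=16):
    # Exhaustively search output-/input-channel tile sizes for one layer.
    cands = itertools.product(divisors(K), divisors(C))
    return min(cands, key=lambda tc: cost(*tc, K, C, pe_rows, pe_cols))

print(best_mapping())  # the (tile_k, tile_c) pair minimizing the toy cost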