The costly multiplications challenge the deployment of modern deep neural networks (DNNs) on resource-constrained devices. To promote hardware efficiency, prior works have built multiplication-free models. However, they are generally inferior to their multiplication-based counterparts in accuracy, calling for multiplication-reduced hybrid models to marry the benefits of both approaches. To achieve this goal, recent works, i.e., NASA and NASA+, have developed Neural Architecture Search (NAS) and Acceleration frameworks to search for and accelerate such hybrid models via a tailored differentiable NAS (DNAS) engine and dedicated ASIC-based accelerators. In this paper, we delve deeper into the inherent advantages of FPGAs and present an enhanced approach called NASA-F, which focuses on FPGA-oriented search and acceleration for hybrid models. Specifically, on the algorithm level, we develop a tailored one-shot supernet-based NAS engine to streamline the search for hybrid models, eliminating the need to execute NAS for each deployment as well as additional training/finetuning steps. On the hardware level, we develop a chunk-based accelerator to fully leverage the diverse hardware resources available on FPGAs for the acceleration of heterogeneous layers in hybrid models, aiming to enhance both hardware utilization and throughput. Extensive experimental results consistently validate the superiority of our NASA-F framework, e.g., we gain ↑0.67% top-1 accuracy over the prior work NASA on CIFAR100 even without additional training steps for searched models. Additionally, we achieve up to ↑1.86× throughput and ↑2.16× FPS with ↑0.39% top-1 accuracy over the state-of-the-art multiplication-based system on Tiny-ImageNet. Codes are available at https://***/shihuihong214/NASA-F.
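To make the one-shot supernet idea concrete, below is a minimal PyTorch-style sketch of a hybrid searchable layer. All names (MixedBlock, ShiftConv2d) and the power-of-two weight rounding are illustrative assumptions, not the paper's actual implementation: each layer holds a multiplication-based convolution alongside a multiplication-free stand-in, and uniformly samples one candidate per forward pass in single-path one-shot fashion, so searched sub-networks inherit trained weights without extra finetuning.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftConv2d(nn.Conv2d):
    # Rounds weight magnitudes to powers of two at forward time, so each
    # multiply reduces to a bit-shift in hardware; a straight-through
    # estimator keeps the layer trainable. An illustrative stand-in for a
    # multiplication-free operator, not the paper's exact formulation.
    def forward(self, x):
        w = self.weight
        pot = torch.sign(w) * 2.0 ** torch.round(torch.log2(w.abs().clamp(min=1e-8)))
        w_q = w + (pot - w).detach()  # straight-through gradient
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

class MixedBlock(nn.Module):
    # One searchable layer of the hybrid supernet: each forward pass
    # uniformly samples a single candidate (single-path one-shot style),
    # so all candidates are trained jointly within one supernet.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, padding=1),    # multiplication-based
            ShiftConv2d(c_in, c_out, 3, padding=1),  # multiplication-free
        ])

    def forward(self, x, choice=None):
        if choice is None:
            choice = random.randrange(len(self.candidates))
        return self.candidates[choice](x)

x = torch.randn(2, 16, 32, 32)
print(MixedBlock(16, 32)(x).shape)  # torch.Size([2, 32, 32, 32])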
Multiplication is arguably the most computation-intensive operation in modern deep neural networks (DNNs), limiting their deployment on resource-constrained devices. Consequently, pioneering works have handcrafted multiplication-free DNNs, which are hardware-efficient but generally inferior to their multiplication-based counterparts in task accuracy, calling for multiplication-reduced hybrid DNNs to marry the best of both worlds. To this end, we propose a Neural Architecture Search and Acceleration (NASA) framework for such hybrid models, dubbed NASA+, to boost both task accuracy and hardware efficiency. Specifically, NASA+ augments the state-of-the-art (SOTA) search space with multiplication-free operators to construct hybrid ones, and then adopts a novel progressive pretraining strategy to enable effective search. Furthermore, NASA+ develops a chunk-based accelerator with novel reconfigurable processing elements to better support searched hybrid models, and integrates an auto-mapper to search for optimal dataflows. Experimental results and ablation studies consistently validate the effectiveness of our NASA+ algorithm-hardware co-design framework, e.g., we can achieve up to 65.1% lower energy-delay-product with comparable accuracy over the SOTA multiplication-based system on CIFAR100. Codes are available at https://***/GATECH-EIC/NASA.
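As a rough illustration of what such an auto-mapper does, the sketch below exhaustively scores loop-tiling candidates for one layer under a toy utilization-based cost model. The cost function, tile dimensions, and default sizes are all invented for illustration; the actual NASA+ mapper and its cost model are more elaborate.

import itertools

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def cost(tile_k, tile_c, K, C, pe_rows, pe_cols):
    # Toy latency proxy: temporal passes over the PE array, penalized by
    # spatial under-utilization of the array (illustrative only).
    util = (min(tile_k, pe_rows) / pe_rows) * (min(tile_c, pe_cols) / pe_cols)
    passes = (-(-K // tile_k)) * (-(-C // tile_c)) \
           * (-(-tile_k // pe_rows)) * (-(-tile_c // pe_cols))  # ceil divisions
    return passes / util

def best_mapping(K=64, C=32, pe_rows=16, pe_cols=16):
    # Exhaustively search output-/input-channel tile sizes for one layer.
    cands = itertools.product(divisors(K), divisors(C))
    return min(cands, key=lambda tc: cost(*tc, K, C, pe_rows, pe_cols))

print(best_mapping())  # the (tile_k, tile_c) pair minimizing the toy cost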