Author affiliations: Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China; Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen 518107, Peoples R China
Publication: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (IEEE Trans. Circuits Syst. I Regul. Pap.)
Year/Volume/Issue: 2025
Core indexing:
Funding: National Key Research and Development Program of China [2022YFB4400600]
Keywords: algorithm-hardware co-design; vision transformer; pruning; field-programmable gate array (FPGA)
Abstract: Vision Transformers (ViTs) have achieved excellent performance on various computer vision tasks, but their high computation and memory costs pose challenges for practical deployment. To address this issue, token-level pruning has emerged as an effective method to compress ViTs, discarding unimportant image tokens that contribute little to predictions. However, directly applying unstructured token pruning to window-based ViTs damages their regular feature map structure, resulting in load imbalance when deployed on mobile devices. In this work, we propose an efficient algorithm-hardware co-optimized framework to accelerate window-based ViTs via adaptive Mixed-Granularity Sparsity (MGS). At the algorithm level, a hardware-friendly MGS algorithm is developed by integrating the inherent sparsity, global window pruning, and local N:M token pruning to balance model accuracy and computational complexity. At the hardware level, we present a dedicated accelerator equipped with a sparse computing core and two lightweight auxiliary processing units to execute window-based calculations efficiently using MGS. Additionally, we devise a dynamic pipeline interleaving dataflow to achieve on-chip layer fusion, which reduces processing latency and maximizes data reuse. Experimental results demonstrate that, at similar computational complexity, our highly structured MGS algorithm achieves comparable or even better accuracy than previous compression methods. Moreover, compared to existing FPGA-based accelerators for Transformers, our design achieves 1.80x~6.52x and 1.16x~12.05x improvements in throughput and energy efficiency, respectively.
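To make the mixed-granularity idea concrete, the sketch below illustrates (in Python/NumPy) how global window pruning can be combined with local N:M token pruning inside each retained window, so that the surviving token pattern stays regular. This is a hypothetical illustration, not the authors' implementation: the function mgs_prune, the importance scores, the keep_window_ratio, and the 2:4 ratio are all assumptions made for the example.

    # Hypothetical sketch of mixed-granularity sparsity (MGS) for window-based
    # ViT tokens: global window pruning followed by local N:M token pruning.
    # Not the paper's code; names and parameters are illustrative.
    import numpy as np

    def mgs_prune(token_scores, keep_window_ratio=0.75, n=2, m=4):
        """token_scores: (num_windows, tokens_per_window) importance scores.
        Returns a boolean keep-mask of the same shape."""
        num_windows, tokens_per_window = token_scores.shape
        assert tokens_per_window % m == 0, "window must split into N:M groups"
        keep = np.zeros_like(token_scores, dtype=bool)

        # Global window pruning: rank windows by total importance and drop
        # the least informative windows entirely.
        window_scores = token_scores.sum(axis=1)
        num_keep_windows = max(1, int(round(keep_window_ratio * num_windows)))
        kept_windows = np.argsort(window_scores)[::-1][:num_keep_windows]

        # Local N:M token pruning: within every kept window, keep the top-N
        # tokens out of each consecutive group of M tokens, so the sparsity
        # pattern stays structured and hardware-friendly.
        for w in kept_windows:
            groups = token_scores[w].reshape(-1, m)        # (groups, M)
            top_n = np.argsort(groups, axis=1)[:, -n:]     # indices of N best per group
            group_mask = np.zeros_like(groups, dtype=bool)
            np.put_along_axis(group_mask, top_n, True, axis=1)
            keep[w] = group_mask.reshape(-1)
        return keep

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        scores = rng.random((8, 16))   # 8 windows x 16 tokens each (illustrative)
        mask = mgs_prune(scores, keep_window_ratio=0.75, n=2, m=4)
        print(mask.sum(), "of", mask.size, "tokens kept")  # 6 windows x 8 tokens = 48

With these example settings, 2 of 8 windows are discarded outright and the remaining windows keep exactly 2 tokens per group of 4, which is what makes the resulting workload regular enough to map onto a sparse computing core without load imbalance.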