An Efficient Window-Based Vision Transformer Accelerator via Mixed-Granularity Sparsity

Authors: Dong, Qiwei; Zhang, Siyu; Wang, Zhongfeng

Affiliations: School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, People's Republic of China; School of Integrated Circuits, Sun Yat-sen University, Shenzhen 518107, People's Republic of China

Published in: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (IEEE Trans. Circuits Syst. I Regul. Pap.)

Year: 2025

Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]

Funding: National Key Research and Development Program of China [2022YFB4400600]

Keywords: algorithm-hardware co-design; vision transformer; pruning; field-programmable gate array (FPGA)

Abstract: Vision Transformers (ViTs) have achieved excellent performance on various computer vision tasks, but their high computation and memory costs pose challenges for practical deployment. To address this issue, token-level pruning is used as an effective method to compress ViTs, discarding unimportant image tokens that contribute little to predictions. However, directly applying unstructured token pruning to window-based ViTs damages their regular feature map structure, resulting in load imbalance when deployed on mobile devices. In this work, we propose an efficient algorithm-hardware co-optimized framework to accelerate window-based ViTs via adaptive Mixed-Granularity Sparsity (MGS). At the algorithm level, a hardware-friendly MGS algorithm is developed by integrating the inherent sparsity, global window pruning, and local N:M token pruning to balance model accuracy and computational complexity. At the hardware level, we present a dedicated accelerator equipped with a sparse computing core and two lightweight auxiliary processing units to execute window-based calculations efficiently using MGS. Additionally, we devise a dynamic pipeline interleaving dataflow to achieve on-chip layer fusion, which reduces processing latency and maximizes data reuse. Experimental results demonstrate that, with similar computational complexity, our highly structured MGS algorithm achieves comparable or even better accuracy than previous compression methods. Moreover, compared to existing FPGA-based accelerators for Transformers, our design achieves 1.80~6.52x and 1.16~12.05x improvements in throughput and energy efficiency, respectively.
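To make the local N:M token pruning mentioned in the abstract concrete, the following minimal PyTorch sketch keeps the N highest-scoring tokens out of every M consecutive tokens within a window. It is an illustration of the general N:M structured-pruning idea only, assuming token importance scores are already available (e.g., derived from attention); the function name and score source are hypothetical, not the paper's actual implementation.

```python
import torch

def nm_token_prune(tokens: torch.Tensor, scores: torch.Tensor, n: int, m: int):
    """Keep the n highest-scoring tokens out of every m consecutive tokens.

    tokens: (num_tokens, dim) tokens of one window.
    scores: (num_tokens,) importance scores (assumed given).
    Assumes num_tokens is divisible by m. Exactly n of every m tokens
    survive, so per-group workloads stay balanced for the hardware.
    """
    num_tokens, _ = tokens.shape
    groups = scores.view(-1, m)                       # (num_tokens/m, m)
    keep = groups.topk(n, dim=1).indices              # n survivors per group
    offsets = torch.arange(0, num_tokens, m).unsqueeze(1)
    flat_idx = (keep + offsets).flatten().sort().values  # restore token order
    return tokens[flat_idx], flat_idx

# Example: 2:4 token pruning on an 8-token window
tokens = torch.randn(8, 16)
scores = torch.rand(8)
kept, idx = nm_token_prune(tokens, scores, n=2, m=4)
print(kept.shape)  # torch.Size([4, 16])
```

Unlike unstructured token pruning, this fixed n-of-m pattern preserves a regular structure across window groups, which is what allows balanced loads on the accelerator's sparse computing core.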
