Author affiliations: Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China; Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen 518107, Peoples R China
Publication: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (IEEE Trans. Circuits Syst. I Regul. Pap.)
Year/Volume/Issue: 2025
Core indexing:
Funding: National Key Research and Development Program of China [2022YFB4400600]
Keywords: algorithm-hardware co-design; vision transformer; pruning; field-programmable gate array (FPGA)
Abstract: Vision Transformers (ViTs) have achieved excellent performance on various computer vision tasks, but their high computation and memory costs pose challenges for practical deployment. To address this issue, token-level pruning has emerged as an effective method to compress ViTs, discarding unimportant image tokens that contribute little to predictions. However, directly applying unstructured token pruning to window-based ViTs damages their regular feature map structure, resulting in load imbalance when deployed on mobile devices. In this work, we propose an efficient algorithm-hardware co-optimized framework to accelerate window-based ViTs via adaptive Mixed-Granularity Sparsity (MGS). At the algorithm level, a hardware-friendly MGS algorithm is developed by integrating the inherent sparsity, global window pruning, and local N:M token pruning to balance model accuracy and computational complexity. At the hardware level, we present a dedicated accelerator equipped with a sparse computing core and two lightweight auxiliary processing units to execute window-based calculations efficiently using MGS. Additionally, we devise a dynamic pipeline interleaving dataflow to achieve on-chip layer fusion, which reduces processing latency and maximizes data reuse. Experimental results demonstrate that, at similar computational complexity, our highly structured MGS algorithm achieves comparable or even better accuracy than previous compression methods. Moreover, compared to existing FPGA-based accelerators for Transformers, our design achieves 1.80x~6.52x and 1.16x~12.05x improvements in throughput and energy efficiency, respectively.
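To make the mixed-granularity idea concrete, the sketch below illustrates (in Python/NumPy) how global window pruning can be combined with local N:M token pruning inside each retained window, so that the surviving token pattern stays regular. This is a hypothetical illustration, not the authors' implementation: the function mgs_prune, the importance scores, the keep_window_ratio, and the 2:4 ratio are all assumptions made for the example.

    # Hypothetical sketch of mixed-granularity sparsity (MGS) for window-based
    # ViT tokens: global window pruning followed by local N:M token pruning.
    # Not the paper's code; names and parameters are illustrative.
    import numpy as np

    def mgs_prune(token_scores, keep_window_ratio=0.75, n=2, m=4):
        """token_scores: (num_windows, tokens_per_window) importance scores.
        Returns a boolean keep-mask of the same shape."""
        num_windows, tokens_per_window = token_scores.shape
        assert tokens_per_window % m == 0, "window must split into N:M groups"
        keep = np.zeros_like(token_scores, dtype=bool)

        # Global window pruning: rank windows by total importance and drop
        # the least informative windows entirely.
        window_scores = token_scores.sum(axis=1)
        num_keep_windows = max(1, int(round(keep_window_ratio * num_windows)))
        kept_windows = np.argsort(window_scores)[::-1][:num_keep_windows]

        # Local N:M token pruning: within every kept window, keep the top-N
        # tokens out of each consecutive group of M tokens, so the sparsity
        # pattern stays structured and hardware-friendly.
        for w in kept_windows:
            groups = token_scores[w].reshape(-1, m)        # (groups, M)
            top_n = np.argsort(groups, axis=1)[:, -n:]     # indices of N best per group
            group_mask = np.zeros_like(groups, dtype=bool)
            np.put_along_axis(group_mask, top_n, True, axis=1)
            keep[w] = group_mask.reshape(-1)
        return keep

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        scores = rng.random((8, 16))   # 8 windows x 16 tokens each (illustrative)
        mask = mgs_prune(scores, keep_window_ratio=0.75, n=2, m=4)
        print(mask.sum(), "of", mask.size, "tokens kept")  # 6 windows x 8 tokens = 48

With these example settings, 2 of 8 windows are discarded outright and the remaining windows keep exactly 2 tokens per group of 4, which is what makes the resulting workload regular enough to map onto a sparse computing core without load imbalance.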