Augmented shortcuts for vision transformers

Authors: Tang, Yehui; Han, Kai; Xu, Chang; Xiao, An; Deng, Yiping; Xu, Chao; Wang, Yunhe

Affiliations: Dept. of Machine Intelligence, Peking University; Noah's Ark Lab, Huawei Technologies; Central Software Institution, Huawei Technologies; School of Computer Science, Faculty of Engineering, University of Sydney

Published in: arXiv

Year: 2021

Subject: Machine learning

Abstract: Transformer models have recently achieved great progress on computer vision tasks. The rapid development of vision transformers mainly stems from their high representation ability for extracting informative features from input images. However, mainstream transformer models are designed with deep architectures, and feature diversity is continuously reduced as depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature-collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. We then present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts. To save computational cost, we further explore an efficient approach that uses block-circulant projection to implement the augmented shortcuts. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method, which brings an accuracy increase of about 1% to state-of-the-art vision transformers without noticeably increasing their parameters or FLOPs. Copyright © 2021, The Authors. All rights reserved.
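The two ingredients named in the abstract can be sketched compactly: an augmented shortcut replaces the plain identity connection y = x with y = x plus a sum of learnable parallel paths, and each path's linear projection can be made cheap by restricting it to a circulant matrix, which is applied in O(d log d) via the FFT. The following NumPy sketch is not the authors' implementation; the single-circulant form (rather than block-circulant), the `tanh` activation, and the function names are assumptions for illustration.

```python
import numpy as np

def circulant_project(x, c):
    """Multiply vector x by the circulant matrix whose first column is c.

    Via the convolution theorem, the full d x d matrix is never formed:
    the cost is O(d log d) time and only d parameters instead of d^2.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def augmented_shortcut(x, cols):
    """Identity shortcut plus parallel learnable circulant paths.

    Computes y = x + sum_i tanh(C_i x), where each C_i is parameterized
    by its first column cols[i] (the activation choice is an assumption).
    """
    y = x.copy()
    for c in cols:
        y = y + np.tanh(circulant_project(x, c))
    return y
```

For a feature of dimension d, each augmented path thus adds only d parameters and an FFT-sized amount of compute, which is consistent with the abstract's claim of improving accuracy "without noticeably increasing parameters or FLOPs".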
