
What Dense Graph Do You Need for Self-Attention?

Authors: Wang, Yuxin; Lee, Chu-Tak; Guo, Qipeng; Yin, Zhangyue; Zhou, Yunhua; Huang, Xuanjing; Qiu, Xipeng

Affiliations: School of Computer Science, Fudan University, China; Institute of Modern Languages and Linguistics, Fudan University, China; Peng Cheng Laboratory, China

Publication: arXiv

Year: 2022

Subject: Economic and social effects

Abstract: Transformers have made progress in miscellaneous tasks but suffer from quadratic computational and memory complexity. Recent works propose sparse Transformers that restrict attention to sparse graphs to reduce complexity while retaining strong performance. Although effective, the crucial question of how dense a graph needs to be to perform well has not been fully explored. In this paper, we propose Normalized Information Payload (NIP), a graph scoring function that measures information transfer on a graph and provides an analysis tool for trade-offs between performance and complexity. Guided by this theoretical analysis, we present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and achieves comparable or even better results than the vanilla Transformer while yielding O(N log N) complexity with sequence length N. Experiments on tasks requiring various sequence lengths validate our graph scoring function. Copyright © 2022, The Authors. All rights reserved.
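
The hypercube sparsity pattern mentioned in the abstract can be illustrated with a short sketch. The Python snippet below is an illustrative assumption rather than the paper's released code: the function name hypercube_mask and the connectivity rule of linking token indices that differ in exactly one bit (Hamming distance 1) are our own reading of "token interactions in a hypercube". Under that assumption, each of the N tokens attends to roughly log2(N) neighbors, which is consistent with the O(N log N) complexity stated above.

```python
import numpy as np

def hypercube_mask(seq_len: int) -> np.ndarray:
    """Boolean attention mask for a hypothetical hypercube sparsity pattern.

    Tokens i and j may attend to each other when their binary indices
    differ in exactly one bit (Hamming distance 1); self-connections are
    also allowed. For N = 2^d tokens this gives O(N log N) allowed pairs.
    """
    idx = np.arange(seq_len)
    # Pairwise XOR of token indices; a positive power of two means the
    # two indices differ in exactly one bit.
    xor = idx[:, None] ^ idx[None, :]
    one_bit_apart = (xor > 0) & ((xor & (xor - 1)) == 0)
    return one_bit_apart | np.eye(seq_len, dtype=bool)

if __name__ == "__main__":
    mask = hypercube_mask(8)
    print(mask.sum(axis=1))  # each token attends to log2(N) + 1 positions
```

In practice such a mask would be applied to the attention logits (setting disallowed positions to -inf before the softmax); how the paper handles sequence lengths that are not powers of two is not described in this record.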
