Self-attention-based transformers have outperformed recurrent and convolutional neural networks (RNNs/CNNs) in many applications. Despite their effectiveness, computing self-attention is prohibitively costly due to its quadratic computation and memory requirements. To address this challenge, this article proposes a hybrid analog-ReRAM and digital-SRAM in-memory computing accelerator (HARDSEA), a computing-in-memory (CIM) accelerator supporting self-attention in transformer applications. To trade off between energy efficiency and algorithm accuracy, HARDSEA features an algorithm-architecture-circuit codesign. A product-quantization-based scheme dynamically exploits self-attention sparsity through lightweight prediction of token relevance. A hybrid in-memory computing architecture employs both high-efficiency analog ReRAM-CIM and high-precision digital SRAM-CIM to implement the proposed scheme. The ReRAM-CIM, whose precision is sensitive to circuit nonidealities, takes charge of token relevance prediction, where only computing monotonicity is demanded. The SRAM-CIM, used for exact sparse attention computation, is reorganized into an on-memory-boundary computing scheme, thus adapting to irregular sparsity patterns. In addition, we propose a time-domain winner-take-all (WTA) circuit to replace the expensive ADCs in ReRAM-CIM macros. Experimental results show that HARDSEA prunes BERT and GPT-2 models to 12%-33% sparsity without accuracy loss, achieving 13.5x-28.5x speedup and 291.6x-1894.3x energy efficiency over a GPU. Compared with state-of-the-art transformer accelerators, HARDSEA achieves 1.2x-14.9x better energy efficiency at the same level of throughput.
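To make the pruning scheme concrete, the following is a minimal NumPy sketch of product-quantization-based token relevance prediction followed by exact attention restricted to the predicted sparse pattern. It illustrates the algorithmic idea only: the k-means codebook construction, the parameter names (n_subspaces, n_centroids, keep_ratio), and the software top-k selection are illustrative assumptions, not HARDSEA's hardware implementation, which performs the prediction in ReRAM-CIM and selects tokens with the time-domain WTA circuit rather than ADCs.

```python
# Sketch of PQ-based sparse attention: approximate query-key scores via
# per-subspace lookup tables, keep the top keys per query, then compute
# exact softmax attention only over the kept keys.
import numpy as np

def pq_codebooks(K, n_subspaces=4, n_centroids=16, iters=10, seed=0):
    """Learn per-subspace centroids for the key matrix K with plain k-means."""
    rng = np.random.default_rng(seed)
    d_sub = K.shape[1] // n_subspaces
    books, codes = [], []
    for s in range(n_subspaces):
        X = K[:, s * d_sub:(s + 1) * d_sub]
        C = X[rng.choice(len(X), n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    C[c] = X[assign == c].mean(0)
        books.append(C)
        codes.append(assign)
    return books, np.stack(codes, axis=1)  # codes: (n_keys, n_subspaces)

def sparse_attention(Q, K, V, keep_ratio=0.25):
    """Predict token relevance with PQ lookup tables, then run exact
    attention only over the predicted top keys for each query."""
    n_q, d = Q.shape
    books, codes = pq_codebooks(K)
    d_sub = d // len(books)
    # Approximate scores: sum of per-subspace query-centroid dot products.
    approx = np.zeros((n_q, len(K)))
    for s, C in enumerate(books):
        lut = Q[:, s * d_sub:(s + 1) * d_sub] @ C.T   # (n_q, n_centroids)
        approx += lut[:, codes[:, s]]                  # gather per key code
    k = max(1, int(keep_ratio * len(K)))
    topk = np.argpartition(-approx, k - 1, axis=1)[:, :k]
    # Exact softmax attention restricted to the predicted sparse pattern.
    out = np.zeros_like(Q)
    for i in range(n_q):
        s_exact = Q[i] @ K[topk[i]].T / np.sqrt(d)
        w = np.exp(s_exact - s_exact.max())
        out[i] = (w / w.sum()) @ V[topk[i]]
    return out
```

The key design point, reflected in the abstract, is that the prediction stage only needs monotonicity, not numerical accuracy: as long as the approximate scores preserve the ranking of the true scores, the top-k/WTA selection is unaffected, which is why analog ReRAM-CIM with its circuit nonidealities can safely compute it.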
With the growing interest in edge computing for the Internet of Things (IoT), Deep Neural Network (DNN) hardware processors/accelerators face the challenges of low energy consumption, low latency, and data privacy. This paper proposes an energy-efficient processor design based on the Deep Belief Network (DBN), one of the DNN models best suited for on-chip learning. In this study, a thorough algorithm-architecture-circuit design optimization method is used for efficient design. The data reuse and data sparsity characteristics of the DBN learning algorithm inspire this study to propose a heterogeneous multi-core architecture with local learning. In addition, novel circuits, a transposable weight memory and a sparse address generator, are proposed to reduce weight memory access and exploit neuron state sparsity, respectively, maximizing energy efficiency. The DBN processor is implemented and thoroughly evaluated on a Xilinx Zynq FPGA. Implementation results confirm that the proposed DBN processor achieves an excellent energy efficiency of 45.0 pJ per neuron-weight update, a 74% improvement over the conventional design.
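For reference, below is a minimal NumPy sketch of the local learning rule a DBN layer runs, one contrastive-divergence (CD-1) step for a restricted Boltzmann machine, annotated to show where the two proposed circuits help: the down pass reads the weight matrix transposed (the access pattern a transposable weight memory serves without storing a second copy), and weight updates are issued only for visible units whose binary state is active in the batch (the addresses a sparse address generator would emit). Function and parameter names are illustrative assumptions, not the paper's design.

```python
# CD-1 update for one RBM layer of a DBN, with a software analogue of the
# sparse-address skip: columns of the weight update for inactive binary
# neurons are exactly zero, so they are never computed or written back.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.01, rng=None):
    """One CD-1 step. W: (n_vis, n_hid); v0: binary visible batch (B, n_vis)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h0_p = sigmoid(v0 @ W + b_h)                     # up pass
    h0 = (rng.random(h0_p.shape) < h0_p).astype(W.dtype)
    v1_p = sigmoid(h0 @ W.T + b_v)                   # down pass: W transposed,
    v1 = (rng.random(v1_p.shape) < v1_p).astype(W.dtype)  # hence transposable memory
    h1_p = sigmoid(v1 @ W + b_h)                     # second up pass
    # Sparsity: visible units inactive in both v0 and v1 contribute all-zero
    # rows to the outer-product update; skip their weight-memory accesses.
    active = np.flatnonzero((v0 != 0).any(axis=0) | (v1 != 0).any(axis=0))
    dW = v0[:, active].T @ h0_p - v1[:, active].T @ h1_p
    W[active] += lr * dW / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0_p - h1_p).mean(axis=0)
    return W, b_v, b_h
```

Because the rule is local, each positive/negative term depends only on the layer's own visible and hidden states, so layers can be trained greedily on independent cores, which is the property the heterogeneous multi-core architecture with local learning exploits.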