Self-attention-based transformers have outperformed recurrent and convolutional neural networks (RNNs/CNNs) in many applications. Despite their effectiveness, computing self-attention is prohibitively costly due to its quadratic computation and memory requirements. To address this challenge, this article proposes a hybrid analog-ReRAM and digital-SRAM in-memory computing accelerator (HARDSEA), a computing-in-memory (CIM) accelerator supporting self-attention in transformer applications. To trade off between energy efficiency and algorithm accuracy, HARDSEA features an algorithm-architecture-circuit codesign. A product-quantization-based scheme dynamically exploits self-attention sparsity through lightweight prediction of token relevance. A hybrid in-memory computing architecture employs both high-efficiency analog ReRAM-CIM and high-precision digital SRAM-CIM to implement the proposed scheme. The ReRAM-CIM, whose precision is sensitive to circuit nonidealities, takes charge of token relevance prediction, where only computing monotonicity is demanded. The SRAM-CIM, used for exact sparse attention computation, is reorganized into an on-memory-boundary computing scheme, thus adapting to irregular sparsity patterns. In addition, we propose a time-domain winner-take-all (WTA) circuit to replace the expensive ADCs in ReRAM-CIM macros. Experimental results show that HARDSEA prunes BERT and GPT-2 models to 12%-33% sparsity without accuracy loss, achieving 13.5x-28.5x speedup and 291.6x-1894.3x energy efficiency over a GPU. Compared with state-of-the-art transformer accelerators, HARDSEA achieves 1.2x-14.9x better energy efficiency at the same level of throughput.
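To make the pruning scheme concrete, the following is a minimal NumPy sketch of product-quantization-based token relevance prediction followed by exact attention restricted to the predicted sparse pattern. It illustrates the algorithmic idea only: the k-means codebook construction, the parameter names (n_subspaces, n_centroids, keep_ratio), and the software top-k selection are illustrative assumptions, not HARDSEA's hardware implementation, which performs the prediction in ReRAM-CIM and selects tokens with the time-domain WTA circuit rather than ADCs.

```python
# Sketch of PQ-based sparse attention: approximate query-key scores via
# per-subspace lookup tables, keep the top keys per query, then compute
# exact softmax attention only over the kept keys.
import numpy as np

def pq_codebooks(K, n_subspaces=4, n_centroids=16, iters=10, seed=0):
    """Learn per-subspace centroids for the key matrix K with plain k-means."""
    rng = np.random.default_rng(seed)
    d_sub = K.shape[1] // n_subspaces
    books, codes = [], []
    for s in range(n_subspaces):
        X = K[:, s * d_sub:(s + 1) * d_sub]
        C = X[rng.choice(len(X), n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    C[c] = X[assign == c].mean(0)
        books.append(C)
        codes.append(assign)
    return books, np.stack(codes, axis=1)  # codes: (n_keys, n_subspaces)

def sparse_attention(Q, K, V, keep_ratio=0.25):
    """Predict token relevance with PQ lookup tables, then run exact
    attention only over the predicted top keys for each query."""
    n_q, d = Q.shape
    books, codes = pq_codebooks(K)
    d_sub = d // len(books)
    # Approximate scores: sum of per-subspace query-centroid dot products.
    approx = np.zeros((n_q, len(K)))
    for s, C in enumerate(books):
        lut = Q[:, s * d_sub:(s + 1) * d_sub] @ C.T   # (n_q, n_centroids)
        approx += lut[:, codes[:, s]]                  # gather per key code
    k = max(1, int(keep_ratio * len(K)))
    topk = np.argpartition(-approx, k - 1, axis=1)[:, :k]
    # Exact softmax attention restricted to the predicted sparse pattern.
    out = np.zeros_like(Q)
    for i in range(n_q):
        s_exact = Q[i] @ K[topk[i]].T / np.sqrt(d)
        w = np.exp(s_exact - s_exact.max())
        out[i] = (w / w.sum()) @ V[topk[i]]
    return out
```

The key design point, reflected in the abstract, is that the prediction stage only needs monotonicity, not numerical accuracy: as long as the approximate scores preserve the ranking of the true scores, the top-k/WTA selection is unaffected, which is why analog ReRAM-CIM with its circuit nonidealities can safely compute it.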
With the growing interest in edge computing for the Internet of Things (IoT), Deep Neural Network (DNN) hardware processors/accelerators face the challenges of low energy consumption, low latency, and data privacy. This paper proposes an energy-efficient processor design based on the Deep Belief Network (DBN), one of the DNN models best suited for on-chip learning. In this study, a thorough algorithm-architecture-circuit design optimization method is used for efficient design. The data reuse and data sparsity characteristics of the DBN learning algorithm inspire this study to propose a heterogeneous multi-core architecture with local learning. In addition, novel circuits, a transposable weight memory and a sparse address generator, are proposed to reduce weight memory access and exploit neuron state sparsity, respectively, maximizing energy efficiency. The DBN processor is implemented and thoroughly evaluated on a Xilinx Zynq FPGA. Implementation results confirm that the proposed DBN processor achieves an excellent energy efficiency of 45.0 pJ per neuron-weight update, a 74% improvement over the conventional design.
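For reference, below is a minimal NumPy sketch of the local learning rule a DBN layer runs, one contrastive-divergence (CD-1) step for a restricted Boltzmann machine, annotated to show where the two proposed circuits help: the down pass reads the weight matrix transposed (the access pattern a transposable weight memory serves without storing a second copy), and weight updates are issued only for visible units whose binary state is active in the batch (the addresses a sparse address generator would emit). Function and parameter names are illustrative assumptions, not the paper's design.

```python
# CD-1 update for one RBM layer of a DBN, with a software analogue of the
# sparse-address skip: columns of the weight update for inactive binary
# neurons are exactly zero, so they are never computed or written back.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.01, rng=None):
    """One CD-1 step. W: (n_vis, n_hid); v0: binary visible batch (B, n_vis)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h0_p = sigmoid(v0 @ W + b_h)                     # up pass
    h0 = (rng.random(h0_p.shape) < h0_p).astype(W.dtype)
    v1_p = sigmoid(h0 @ W.T + b_v)                   # down pass: W transposed,
    v1 = (rng.random(v1_p.shape) < v1_p).astype(W.dtype)  # hence transposable memory
    h1_p = sigmoid(v1 @ W + b_h)                     # second up pass
    # Sparsity: visible units inactive in both v0 and v1 contribute all-zero
    # rows to the outer-product update; skip their weight-memory accesses.
    active = np.flatnonzero((v0 != 0).any(axis=0) | (v1 != 0).any(axis=0))
    dW = v0[:, active].T @ h0_p - v1[:, active].T @ h1_p
    W[active] += lr * dW / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0_p - h1_p).mean(axis=0)
    return W, b_v, b_h
```

Because the rule is local, each positive/negative term depends only on the layer's own visible and hidden states, so layers can be trained greedily on independent cores, which is the property the heterogeneous multi-core architecture with local learning exploits.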