MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks

Authors: Sie, Syuan-Hao; Lee, Jye-Luen; Chen, Yi-Ren; Yeh, Zuo-Wei; Li, Zhaofang; Lu, Chih-Cheng; Hsieh, Chih-Cheng; Chang, Meng-Fan; Tang, Kea-Tiong

Affiliations: Natl Tsing Hua Univ, Hsinchu 30013, Taiwan; Ind Technol Res Inst, Informat & Commun Labs, Chutung 31030, Taiwan

Publication: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (IEEE Trans Comput Aided Des Integr Circuits Syst)

Year, volume, issue: 2022, Vol. 41, No. 5

Pages: 1550-1562

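Subject classification: 0808 [Engineering: Electrical Engineering]; 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be awarded in Engineering or Science)]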

Funding: Ministry of Science and Technology, Taiwan [MOST 109-2218-E-007-019, MOST 109-2262-8-007-022]

Subject terms: Random access memory; Hardware; Computer architecture; Quantization (signal); Training; Common Information Model (computing); Software; Compression algorithm; computing-in-memory (CIM); deep learning; quantization

Abstract: Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computational cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply and accumulation (MAC) operations executed on CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computational costs, model compression is a widely studied method to shrink the model size. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must consider the hardware limitations of CIM macros. In this study, a software and hardware co-design approach is proposed to design MARS, a SRAM-based CIM (SRAM CIM)-based CNN accelerator that can utilize multiple SRAM CIM macros as processing units and support a sparse CNN, and an SRAM CIM-aware model compression algorithm that considers a CIM architecture to reduce the number of network parameters. With the proposed hardware software co-designed method, MARS can reach over 700 and 400 FPS for CIFAR-10 and CIFAR-100, respectively. In addition, MARS achieves 52.3 and 88.2 TOPs/W in VGG16 and ResNet18, respectively.
