TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Architecture and Hardware Implementation

Authors: Sestito, Cristian; Agwa, Shady; Prodromakis, Themis

Affiliation: Centre for Electronics Frontiers, Institute for Integrated Micro and Nano Systems, School of Engineering, University of Edinburgh, Edinburgh EH9 3BF, Scotland

Publication: IEEE Transactions on Circuits and Systems I: Regular Papers (IEEE Trans. Circuits Syst. Regul. Pap.)

Year, volume, issue: 2025, Vol. 72, No. 5

Pages: 2263-2273

Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]

Funding: Engineering and Physical Sciences Research Council (EPSRC) Programme Grant "Functional Oxide Reconfigurable Technologies" (FORTE) [EP/R024642/2]; RAEng Chair in Emerging Technologies [CiET1819/2/93]

Keywords: artificial intelligence; convolutional neural networks; systolic arrays; field programmable gate arrays; memory accesses; energy efficiency

Abstract: Modern hardware architectures for Convolutional Neural Networks (CNNs) aim not only at high performance but also at low energy dissipation. Reducing the cost of data movement between the computing cores and the memory is one way to lower energy consumption. Systolic arrays are well suited to this objective: they use multiple processing elements that communicate with each other to maximize data utilization, following dataflows such as weight stationary and row stationary. Motivated by this, we have proposed TrIM, an innovative dataflow based on a triangular movement of inputs, capable of reducing the number of memory accesses by one order of magnitude compared to state-of-the-art systolic arrays. In this paper, we present a TrIM-based hardware architecture for CNNs. As a showcase, the accelerator is implemented on a Field Programmable Gate Array (FPGA) to execute the VGG-16 and AlexNet CNNs. The architecture achieves a peak throughput of 453.6 Giga Operations per Second, outperforming a state-of-the-art row stationary systolic array by up to ~3x in terms of memory accesses, and being up to ~11.9x more energy-efficient than other FPGA accelerators.