Author affiliations: Imperial Coll London Exhibit Rd London SW7 2BX England; Tokyo Inst Technol Ohokayama 1-21-2 Tokyo 1528550 Japan; Univ Cambridge Cambridge CB2 1TN England; Corerain Technol Ltd 14F Changfu Jinmao Bldg CFC Shenzhou Peoples R China; Intel Corp Jones Farm Campus Hillsboro OR 97124 USA
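Publication: ACM Transactions on Reconfigurable Technology and Systems
Year/Volume/Issue: 2023, Vol. 16, No. 1
Pages: 1-26
Core indexing:
Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]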
Funding: United Kingdom EPSRC [EP/V028251/1, EP/L016796/1, EP/N031768/1, EP/P010040/1, EP/S030069/1]; Intel; Corerain
Keywords: Accelerator architecture; recurrent neural networks; multi-tenant execution
Abstract: This work introduces Remarn, a reconfigurable multi-threaded multi-core accelerator supporting both spatial and temporal co-execution of Recurrent Neural Network (RNN) inferences. It increases the processing capabilities and quality of service of cloud-based neural processing units (NPUs) by improving their hardware utilization and by reducing design latency, with two innovations. The first is a custom coarse-grained multi-threaded RNN/Long Short-Term Memory (LSTM) hardware architecture that switches tasks among threads when RNN computational engines meet data hazards. The second is the partitioning of this hardware architecture into multiple full-fledged sub-accelerator cores, enabling spatial co-execution of multiple RNN/LSTM inferences. These innovations improve the exploitation of the available parallelism to increase runtime hardware utilization and boost design throughput. Evaluation results show that a dual-threaded quad-core Remarn NPU achieves 2.91 times higher performance while occupying only 5.0% more area than a single-threaded one on a Stratix 10 FPGA. Compared with a Tesla V100 GPU implementation, the design achieves 6.5 times better performance and 15.6 times higher power efficiency, showing that the approach contributes to high-performance and energy-efficient FPGA-based multi-RNN inference designs for datacenters.
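Illustrative note: the abstract's first idea, switching threads when an engine meets a data hazard, can be sketched with a toy cycle-count model. This is only a sketch under stated assumptions and not the Remarn design: the function `simulate`, its parameters, and the cycle counts are all hypothetical. It assumes one compute engine, a fixed compute time per RNN timestep, and a fixed wait before the same thread's next timestep can start (because each hidden state depends on the previous one).

```python
def simulate(num_threads, timesteps, compute_cycles, hazard_cycles):
    """Toy model of coarse-grained multi-threading on one compute engine.

    Hypothetical assumptions (not from the paper): every thread runs
    `timesteps` RNN steps, each step keeps the engine busy for
    `compute_cycles`, and a thread must then wait `hazard_cycles` before
    its next step, since that step depends on the previous result.
    Returns (total_cycles, engine_utilization).
    """
    remaining = [timesteps] * num_threads  # steps left per thread
    ready_at = [0] * num_threads           # cycle at which each thread's hazard clears
    clock = 0
    busy = 0
    while any(r > 0 for r in remaining):
        pending = [t for t in range(num_threads) if remaining[t] > 0]
        ready = [t for t in pending if ready_at[t] <= clock]
        if not ready:
            # Every unfinished thread is stalled: the engine idles
            # until the earliest hazard clears.
            clock = min(ready_at[t] for t in pending)
            continue
        t = ready[0]                       # switch to a thread that can run
        clock += compute_cycles
        busy += compute_cycles
        remaining[t] -= 1
        ready_at[t] = clock + hazard_cycles
    return clock, busy / clock


if __name__ == "__main__":
    for n in (1, 2):
        cycles, util = simulate(n, timesteps=4, compute_cycles=10, hazard_cycles=10)
        print(f"{n} thread(s): {cycles} cycles, utilization {util:.2f}")
```

In this toy setting a second thread fills the cycles that a single thread would spend stalled, so the engine stays busy. The figures it prints describe only the model and say nothing about the measured results reported in the abstract.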