Remarn: A Reconfigurable Multi-threaded Multi-core Accelerator for Recurrent Neural Networks

Authors: Que, Zhiqiang; Nakahara, Hiroki; Fan, Hongxiang; Li, He; Meng, Jiuxi; Tsoi, Kuen Hung; Niu, Xinyu; Nurvitadhi, Eriko; Luk, Wayne

Affiliations: Imperial Coll London, Exhibit Rd, London SW7 2BX, England; Tokyo Inst Technol, Ohokayama 1-21-2, Tokyo 1528550, Japan; Univ Cambridge, Cambridge CB2 1TN, England; Corerain Technol Ltd, 14F Changfu Jinmao Bldg (CFC), Shenzhen, Peoples R China; Intel Corp, Jones Farm Campus, Hillsboro, OR 97124, USA

Published in: ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS

Year/Volume/Issue: 2023, Vol. 16, No. 1

Pages: 1-26

Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering and Science)]

Funding: United Kingdom EPSRC [EP/V028251/1, EP/L016796/1, EP/N031768/1, EP/P010040/1, EP/S030069/1]; Intel; Corerain

Keywords: Accelerator architecture; recurrent neural networks; multi-tenant execution

Abstract: This work introduces Remarn, a reconfigurable multi-threaded multi-core accelerator supporting both spatial and temporal co-execution of Recurrent Neural Network (RNN) inferences. It increases the processing capability and quality of service of cloud-based neural processing units (NPUs) by improving their hardware utilization and reducing design latency, with two innovations. First, a custom coarse-grained multi-threaded RNN/Long Short-Term Memory (LSTM) hardware architecture that switches tasks among threads when RNN computational engines encounter data hazards. Second, the partitioning of this hardware architecture into multiple full-fledged sub-accelerator cores, enabling spatial co-execution of multiple RNN/LSTM inferences. These innovations improve the exploitation of the available parallelism, increasing runtime hardware utilization and boosting design throughput. Evaluation results show that a dual-threaded quad-core Remarn NPU achieves 2.91 times higher performance while occupying only 5.0% more area than a single-threaded one on a Stratix 10 FPGA. Compared with a Tesla V100 GPU implementation, our design achieves 6.5 times better performance and 15.6 times higher power efficiency, showing that our approach contributes to high-performance and energy-efficient FPGA-based multi-RNN inference designs for datacenters.
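The abstract's first innovation, coarse-grained multithreading that switches threads on data hazards, can be illustrated with a minimal sketch. The Python below is a simplified cycle model, not the paper's actual design: Thread, run_engine, and PIPELINE_DEPTH are hypothetical names, and the hazard here stands in for the recurrent dependency on the previous time step's hidden state.

```python
from collections import deque
from dataclasses import dataclass

PIPELINE_DEPTH = 4  # hypothetical: cycles before h_{t-1} leaves the pipeline

@dataclass
class Thread:
    tid: int
    steps_left: int    # remaining RNN time steps in this inference
    ready_at: int = 0  # cycle at which the recurrent dependency resolves

def run_engine(threads):
    """One engine with coarse-grained multithreading: on a data hazard,
    switch to another ready thread instead of stalling."""
    queue = deque(threads)
    cycle = 0
    while queue:
        # If every thread is stalled on its hazard, the engine must idle
        # until the earliest dependency resolves.
        if all(cycle < t.ready_at for t in queue):
            cycle = min(t.ready_at for t in queue)
        thread = queue.popleft()
        if cycle < thread.ready_at:
            queue.append(thread)  # data hazard: switch tasks among threads
            continue
        # Issue one RNN time step; its hidden state becomes usable after
        # PIPELINE_DEPTH cycles, creating the next hazard for this thread.
        thread.steps_left -= 1
        thread.ready_at = cycle + PIPELINE_DEPTH
        cycle += 1
        if thread.steps_left > 0:
            queue.append(thread)
    return cycle

single = run_engine([Thread(0, 8)])
dual = run_engine([Thread(0, 8), Thread(1, 8)])
print(f"single-threaded: 8 steps in {single} cycles")
print(f"dual-threaded:  16 steps in {dual} cycles")
```

With one thread the engine idles for most of each hazard window; with two threads the stalls largely overlap with useful work from the other thread, which is the runtime-utilization gain the abstract describes. The second innovation, spatial co-execution, would correspond to running several such engines side by side as independent sub-accelerator cores.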
