Convolutional neural networks (CNNs) have been widely utilized in modern artificial intelligence (AI) systems. In particular, GoogLeNet, one of the most popular CNNs, consisting of a number of inception layers and max-pooling layers, has been intensively studied for mobile and embedded scenarios. However, the energy efficiency of GoogLeNet in hardware is still limited by the huge data movement between the processor and the memory. Therefore, designing a dataflow and the corresponding hardware architecture that achieve parallel processing with minimal data movement is critical for high energy efficiency and throughput. In this paper, we propose a novel column stationary (CS) dataflow that maximally exploits the local reuse of both the filter weights and the feature maps. Moreover, we propose a reconfigurable spatial architecture that maps multiple convolution kernels (of different types and dimensions) in parallel onto the processing engine (PE) array, so that multiple convolution kernels can share the same input feature maps (activations) during computation. In our hardware design, we use the three typical convolution kernels of the GoogLeNet inception layers (i.e., 5 x 5, 3 x 3, and 1 x 1) as an example to evaluate the efficiency of the proposed dataflow and hardware architecture. The accelerator was implemented for one inception layer of GoogLeNet in a 55-nm foundry CMOS process. The test results show that our CS dataflow reduces the energy consumption of memory access by ~85% and saves 13% of area and 12% of power for computing. In summary, our CS dataflow is 1.2x to 2.5x more energy-efficient than state-of-the-art dataflows.
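The central idea of sharing one set of input activations across several kernel types can be illustrated with a minimal sketch. The code below is not the CS dataflow itself, only a functional analogue: a naive single-channel convolution applied with 1 x 1, 3 x 3, and 5 x 5 kernels over the same input feature map, the reuse pattern the hardware exploits. All names (`conv2d`, the sizes and seed) are illustrative choices, not from the paper.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D convolution (cross-correlation) of a
    single-channel input x with a square kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output element reads a kh x kw window of the input;
            # neighboring windows overlap, which is where data reuse lives.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))  # one shared input feature map

# The same activations x feed all three inception kernel types in
# parallel, mirroring how the CS dataflow shares inputs across kernels.
kernels = {k: rng.standard_normal((k, k)) for k in (1, 3, 5)}
outputs = {k: conv2d(x, w) for k, w in kernels.items()}
for k, y in outputs.items():
    print(k, y.shape)  # output shrinks as kernel size grows ('valid' mode)
```

In hardware, the benefit is that `x` is fetched from memory once and consumed by all three kernel computations, instead of being re-read per kernel.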
ISBN: (Print) 9781450386104
Today, reconfigurable spatial architectures (RSAs) have emerged as accelerators for compute- and data-intensive domains because they deliver energy and area efficiency close to ASICs while retaining sufficient programmability to keep development costs low. The mapper, which is responsible for mapping algorithms onto RSAs, favors a systematic backtracking methodology because of its high portability across evolving RSA designs. However, exponentially scaling compilation time has become the major obstacle. The key observation of this paper is that the main limiting factor for systematic backtracking mappers is the waterfall mapping model, which resolves all mapping variables and constraints at once using single-level intermediate representations (IRs). This work proposes CaSMap, an agile mapper framework independent of the software and hardware of RSAs. By clustering the lowest-level software and hardware IRs into multi-level IRs, the original mapping process can be decomposed into multiple scattered stages, mitigating the exponential complexity of the mapping problem. This paper introduces (a) strategies for clustering low-level hardware and software IRs using static connectivity and critical path analysis, and (b) a multi-level scattered mapping model in which the higher-level model applies the heuristics from IR clustering, improves the mapping success rate, and reduces the scale of the lower-level model. Our evaluation shows that CaSMap reduces the problem scale (nonzeros) by 80.5% (23.1%-94.9%) and achieves a mapping-time speedup of 83x over the state-of-the-art waterfall mapper across four different RSA topologies: MorphoSys, HReA, HyCUBE, and REVEL.
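Why decomposing the mapping into stages tames the exponential search space can be seen with a back-of-envelope sketch. This is not CaSMap's actual formulation; it is a hypothetical counting argument, assuming a toy placement problem where n operations are assigned to n processing elements, solved either in one shot ("waterfall") or in two committed stages (clusters first, then operations within each cluster).

```python
from math import factorial

def flat_candidates(n):
    """Size of a one-shot ('waterfall') search space: every way to
    assign n operations to n processing elements."""
    return factorial(n)

def scattered_candidates(cluster_sizes):
    """Size of a two-stage search: first place the clusters onto
    regions, commit, then place each cluster's operations within its
    region. Stages are solved sequentially, so their sizes add
    rather than multiply."""
    stage1 = factorial(len(cluster_sizes))   # orderings of clusters
    stage2 = 1
    for s in cluster_sizes:
        stage2 *= factorial(s)               # within-cluster orderings
    return stage1 + stage2

# Hypothetical example: 12 operations, flat vs. four clusters of 3.
print(flat_candidates(12))                 # 479001600
print(scattered_candidates([3, 3, 3, 3]))  # 24 + 1296 = 1320
```

The numbers are only meant to show the shape of the effect: clustering shrinks the candidate space by orders of magnitude, at the cost that a bad higher-level commitment can exclude valid lower-level placements, which is why the higher-level model carries heuristics to keep the success rate up.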