In this study, a very-large-scale integration implementation of convolutional neural network (CNN) inference for abnormal heartbeat detection was proposed. Four-lead electrocardiogram signals were used to detect abnormal heartbeat conditions, such as premature ventricular complexes. 1D CNNs and fully connected layers were utilised in the proposed chip to achieve high-speed, small-area, and high-accuracy arrhythmia detection. The proposed chip was implemented using a 90-nm complementary metal-oxide-semiconductor process and operated at 125 MHz with a 0.67 mm² core area. The power consumption was 4.18 mW at the high-speed operating frequency (125 MHz) and 3.79 µW at 10 kHz for low-power applications. The detection accuracy was 95.14% on the MIT-BIH arrhythmia database. Consequently, the properties of high speed, low power, small area, and high accuracy were established in the proposed accelerator chip.
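The layer types named in this abstract (1D convolutions over each ECG lead followed by a fully connected classifier) can be illustrated with a minimal NumPy sketch. This is not the paper's architecture; the kernel sizes, lead count, and two-class head are illustrative assumptions, and the data is random stand-in for real ECG windows.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution of a single-channel signal with one kernel."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

def relu(x):
    return np.maximum(x, 0)

# Toy 4-lead ECG window: 4 leads x 16 samples (random stand-in data).
rng = np.random.default_rng(0)
ecg = rng.standard_normal((4, 16))

# One length-3 kernel per lead, then a fully connected layer over all features.
kernels = rng.standard_normal((4, 3))
features = np.concatenate([relu(conv1d(ecg[l], kernels[l], 0.0)) for l in range(4)])

fc_w = rng.standard_normal((2, features.size))  # 2 classes: normal / abnormal
logits = fc_w @ features
prediction = int(np.argmax(logits))             # 0 or 1
```

The 1D structure keeps the multiply-accumulate pattern simple and regular, which is what makes this workload amenable to a small fixed-function accelerator.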
In the realm of convolutional neural networks (CNNs), convolution operations exhibit a high degree of data reuse. However, the current Weight Stationary (WS) dataflow cannot adequately exploit the potential local data reuse. In addition, the local data reuse of a Systolic Array (SA) implementing WS is limited by the array size. To address this issue, we propose a novel dataflow called Enhanced Weight Stationary (EWS) for SA-based CNN accelerators with a customized architecture to enhance data reuse. Our approach focuses on expanding the flexibility of weight mapping on Processing Elements (PEs) through the use of Weight Register Files (WRF) in the PE array. Additionally, by incorporating Activation Register Files (ARF) and Partial Sum Register Files (PRF), our accelerator enables convolutional reuse of input feature maps (ifmaps) and facilitates reuse of partial sums (psums) during channel-wise accumulation, which effectively reduces accesses to on-chip SRAM. Experimental results demonstrate the effectiveness of our CNN accelerator employing EWS, achieving 1.22–1.72× throughput and 1.35–2.48× energy efficiency over the WS dataflow as the array size varies from 16 to 64.
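The "convolutional reuse" this abstract exploits comes from overlapping sliding windows: a stride-1 K×K convolution touches each interior input pixel K×K times, so keeping activations in local register files (rather than refetching from SRAM) pays off. A small sketch, independent of the paper's architecture, counts that reuse directly:

```python
import numpy as np

def reuse_counts(H, W, K):
    """How many KxK stride-1 sliding windows cover each pixel of an HxW input."""
    counts = np.zeros((H, W), dtype=int)
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            counts[i:i + K, j:j + K] += 1
    return counts

c = reuse_counts(8, 8, 3)
print(c.max())   # interior pixels are reused K*K = 9 times
print(c[0, 0])   # corner pixels are used only once
```

Every one of those reuses served from a register file instead of SRAM is an access saved, which is the mechanism behind the reported energy-efficiency gain.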
ISBN (print): 9798350330991; 9798350331004
Convolutional neural networks require a huge amount of computation in video applications. For some specific tasks, such as surveillance, differential frame convolution reuses inter-frame data and significantly reduces multiply-accumulate operations. However, there are still challenges in improving the energy efficiency of differential frame convolution on chips. First, differential frame convolution requires additional on-chip storage for reusing inter-frame data. Second, the post-processing of differential frame convolution involves more memory accesses and arithmetic logic operations, so a sparse working mode is of vital importance for the post-processing. In response to these challenges, this work proposes an on-chip fusion storage architecture for energy-efficient differential frame convolution and a pixel-level pipeline dataflow that supports feature sparsity. Simulation of our accelerator implemented in a 28 nm CMOS process shows a 3.09× improvement in energy efficiency over other state-of-the-art works.
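The mathematical basis of differential frame convolution is the linearity of convolution: conv(f_t) = conv(f_{t−1}) + conv(f_t − f_{t−1}), and in surveillance footage the difference frame is mostly zeros, so most of its MACs can be skipped. A minimal NumPy sketch (illustrative, not the paper's pipeline) verifies the identity and the sparsity:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D correlation of x with kernel k (stride 1)."""
    H, W = x.shape
    K = k.shape[0]
    return np.array([[np.sum(x[i:i + K, j:j + K] * k)
                      for j in range(W - K + 1)] for i in range(H - K + 1)])

rng = np.random.default_rng(1)
kernel = rng.standard_normal((3, 3))
prev = rng.standard_normal((8, 8))

# A "surveillance" frame: identical to the previous one except a small region.
curr = prev.copy()
curr[2:4, 2:4] += 1.0

diff = curr - prev                                  # mostly zeros -> sparse
out = conv2d(prev, kernel) + conv2d(diff, kernel)   # reuse previous result

assert np.allclose(out, conv2d(curr, kernel))
print(np.count_nonzero(diff), diff.size)            # 4 of 64 pixels changed
```

The two challenges the abstract names fall out of this identity: the previous result must be kept on chip (extra storage), and the final addition plus sparse bookkeeping is exactly the post-processing whose memory accesses must be kept cheap.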
ISBN (print): 9781450393225
State-of-the-art convolutional neural network (CNN) accelerators are typically communication-dominated architectures. To reduce the energy consumption of data accesses while maintaining high performance, researchers have adopted large amounts of on-chip register resources and proposed various methods to concentrate communication on on-chip register accesses. As a result, the on-chip register accesses become the energy bottleneck. To further reduce energy consumption, in this work we propose an in-SRAM accumulation architecture to replace the conventional register files and digital accumulators in the processing elements of CNN accelerators. Compared with existing in-SRAM computing approaches (which may not be targeted at CNN accelerators), the presented architecture not only realizes in-memory accumulation but also solves the structural contention problem that occurs frequently when embedding in-memory architectures into CNN accelerators. HSPICE simulation results based on a 45 nm technology demonstrate that with the proposed in-SRAM accumulator, the overall energy efficiency of a state-of-the-art communication-optimal CNN accelerator is increased by 29% on average.
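The operation being moved into memory here is the partial-sum update, which conventionally costs a register read, a digital add, and a register write per MAC. A behavioral model (a sketch of the access pattern only, not the paper's circuit) shows what an accumulating memory word collapses that sequence into:

```python
class AccumSRAM:
    """Behavioral model of an accumulating memory word: a write adds to the
    stored value instead of replacing it, merging the read-add-write of a
    conventional register-file accumulator into a single access."""

    def __init__(self, size):
        self.words = [0] * size
        self.accesses = 0

    def accumulate(self, addr, value):
        self.words[addr] += value   # one in-memory operation
        self.accesses += 1

# Channel-wise psum accumulation for one output pixel:
# conventional PE: read + write back = 2 register accesses per MAC;
# in-SRAM accumulation: 1 access per MAC.
mem = AccumSRAM(1)
for psum in [3, 5, -2]:
    mem.accumulate(0, psum)
print(mem.words[0], mem.accesses)  # 6 3
```

The structural-contention problem the abstract mentions arises because, in a real PE array, many such updates target the shared memory concurrently; this toy model deliberately ignores that timing aspect.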
ISBN (print): 9781665401746
Recently, various deep learning accelerators have been studied through dataflow structure improvement and memory access optimization. Among them, the encoder-decoder model is widely used in object detection and semantic segmentation, showing good performance. However, because of the deconvolution operation that produces a high-resolution feature map in the decoder, the memory access and computational complexity are higher than in the existing encoder-only structure, which is a major obstacle to the implementation of encoder-decoder accelerators. Most previous studies have focused only on the encoder part. This paper applies the fusion approach, which was effective for the convolution layers of the encoder, to the deconvolution of the decoder and shows the possibility of reducing processing time and hardware complexity.
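Why deconvolution inflates cost can be seen from its standard reduction to ordinary convolution: insert zeros between input pixels, pad, and convolve with the flipped kernel. The sketch below (a generic stride-2 transposed convolution, not the paper's accelerator) shows the output resolution growing while most of the upsampled operand is zeros, i.e. many MACs multiply by zero unless the hardware skips them:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D correlation of x with kernel k (stride 1)."""
    K = k.shape[0]
    return np.array([[np.sum(x[i:i + K, j:j + K] * k)
                      for j in range(x.shape[1] - K + 1)]
                     for i in range(x.shape[0] - K + 1)])

def deconv2d_stride2(x, k):
    """Transposed convolution (stride 2, no output padding) via
    zero insertion + full padding + convolution with the flipped kernel."""
    H, W = x.shape
    up = np.zeros((2 * H - 1, 2 * W - 1))
    up[::2, ::2] = x                      # insert zeros between pixels
    K = k.shape[0]
    up = np.pad(up, K - 1)                # full padding
    return conv2d(up, k[::-1, ::-1])      # flip kernel

x = np.ones((4, 4))
k = np.ones((3, 3))
y = deconv2d_stride2(x, k)
print(y.shape)  # (9, 9): output resolution roughly doubles
```

The larger feature map explains the higher memory traffic, and the zero-padded operand explains why a naive mapping wastes computation; fusing the deconvolution with adjacent layers avoids materializing the intermediate high-resolution map.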
Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant-digit-first (MSDF) manner, enabling digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN accelerator is implemented in Verilog, synthesized with the Synopsys Design Compiler using GSCL 45 nm technology, and evaluated on the AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance (GOPS) of the proposed design is 4.37× to 569.11× higher than contemporary designs, and it achieves 3.58× to 44.75× higher peak energy efficiency (TOPS/W), outperforming conventional bit-serial designs.
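The key property of MSDF operation is that output digits begin to appear before all input digits have arrived, so successive arithmetic stages can overlap at digit granularity. The toy sketch below only illustrates that idea: it feeds the digits of a fractional operand most-significant first and refines a product estimate per digit. Real LR/online arithmetic units use redundant signed-digit representations with a small fixed online delay, which this simplification omits.

```python
from fractions import Fraction

def msdf_scale(digits, y, radix=10):
    """Feed the fractional digits of x most-significant-digit first and emit
    successively refined estimates of x * y; later digits only tighten the
    result, so a downstream stage can start before all digits arrive."""
    est = Fraction(0)
    for i, d in enumerate(digits, start=1):
        est += Fraction(d, radix ** i) * y   # contribution of digit i
        yield est

# x = 0.625 fed as digits 6, 2, 5; scale by y = 4.
estimates = list(msdf_scale([6, 2, 5], 4))
print([float(e) for e in estimates])  # [2.4, 2.48, 2.5]
```

Because each estimate is usable as soon as it is emitted, a chain of such units forms a digit-level pipeline, which is the latency advantage the abstract claims over conventional LSB-first bit-serial designs.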