Target detection is widely applied in fields such as face recognition, autonomous driving, and industrial automation. However, when deploying target detection models based on convolutional neural networks on resource-...
To solve the hardware deployment problem caused by the vast computational complexity of convolutional layers and the limited hardware resources available for network inference, a look-up table (LUT)-based convolution architecture is proposed, built on a field-programmable gate array (FPGA) using integer multipliers and addition trees. With the help of the Winograd algorithm, the convolution multiplications are optimized to reduce computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). Simultaneously, optimized storage streams improve memory-access efficiency and resolve bandwidth limitations, and the data toggle rate is reduced to lower power consumption. Experimental results show that using the Winograd algorithm to build the basic processing units significantly reduces the number of multipliers and accelerates hardware deployment, while time-division multiplexing of the processing units improves resource utilization. Under these experimental conditions, compared with the traditional convolution method, the architecture reduces computing-resource usage by 2.25 times and improves peak throughput by 19.3 times. The LUT-based Winograd accelerator can effectively solve the deployment problem caused by limited hardware resources.
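The 2.25x saving in computing resources is consistent with the arithmetic of Winograd's minimal filtering: F(2x2, 3x3) produces a 2x2 output tile with 16 elementwise multiplications instead of the 36 required by direct 3x3 convolution (36/16 = 2.25). Below is a minimal NumPy sketch of that tile computation, in floating point for clarity; the paper's hardware uses integer multipliers, which would require a fixed-point scaling of the G matrix.

```python
import numpy as np

# Winograd F(2x2, 3x3) transform matrices (Lavin-Gray formulation).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)

def winograd_f2x2_3x3(d, g):
    """One 2x2 output tile from a 4x4 input tile d and a 3x3 filter g.
    Uses 16 elementwise multiplications instead of the 36 needed by
    direct 3x3 convolution of a 2x2 output tile (36/16 = 2.25x)."""
    U = G @ g @ G.T          # 4x4 transformed filter
    V = B_T @ d @ B_T.T      # 4x4 transformed input tile
    M = U * V                # 16 multiplications (the only ones)
    return A_T @ M @ A_T.T   # 2x2 output tile

# Check against direct correlation on one tile.
d = np.random.randn(4, 4)
g = np.random.randn(3, 3)
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```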
Convolutional neural networks (CNNs) have proven to be promising in various applications such as audio recognition, image classification, and video understanding. The Winograd algorithm helps to reduce the computational complexity of a convolution but suffers from poor compatibility with different convolution shapes. This work introduces a dynamic dimension-level fusion architecture based on Winograd for accelerating CNNs of different dimensions. We explore this Winograd architecture by designing Dimension Fusion, a dimension-level processing engine that dynamically fuses to match the convolution shape of individual CNN layers. The proposed architecture is the first Winograd-based design compatible with all convolution shapes (dimension, stride, and filter size), and it achieves up to 1.55x higher PE efficiency and up to 3.3x higher energy efficiency compared with state-of-the-art accelerators.
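The abstract does not detail the fusion mechanism, but a standard prerequisite for shape-agnostic Winograd engines is reducing strided convolutions to stride-1 sub-convolutions, to which Winograd tiles apply directly. The following 1-D sketch illustrates that decomposition; the function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def conv1d_stride2_via_stride1(x, w):
    """Decompose a 1-D stride-2 convolution (correlation form) into two
    stride-1 sub-convolutions on the even/odd phases of the input and
    kernel. Each sub-convolution is stride-1 and Winograd-friendly."""
    x_e, x_o = x[0::2], x[1::2]   # even/odd input phases
    w_e, w_o = w[0::2], w[1::2]   # even/odd kernel taps
    n_out = (len(x) - len(w)) // 2 + 1
    y = np.zeros(n_out)
    for n in range(n_out):
        y[n] = (x_e[n:n+len(w_e)] @ w_e) + (x_o[n:n+len(w_o)] @ w_o)
    return y

x, w = np.random.randn(16), np.random.randn(3)
direct = np.array([x[2*n:2*n+3] @ w for n in range((len(x) - 3)//2 + 1)])
assert np.allclose(conv1d_stride2_via_stride1(x, w), direct)
```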
In this paper, a fast Fourier-like transform over GF(2^8) is developed to compute the syndromes of the transmitted codewords and the roots of the error-location polynomial. The new algorithm is based on the conjugates of GF(2^8) together with Winograd's algorithm and the Goertzel-Blahut (GB) algorithm. The simplified transform decoder over GF(2^8) is implemented in a program on a digital computer. It is expected that this new type of transform can be employed for syndrome evaluation in decoding the Reed-Solomon (RS) code of block length 255.
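For context, a syndrome S_i is simply the received polynomial evaluated at alpha^i over GF(2^8); the transform in the paper accelerates exactly this evaluation. Below is a minimal direct (Horner-rule) sketch for comparison, assuming the common primitive polynomial 0x11D; the paper's field polynomial is not given here, and the names are illustrative.

```python
def gf_mul(a, b, poly=0x11D):
    """Carry-less multiplication in GF(2^8), reduced by `poly`."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def syndromes(received, num_syndromes, alpha=0x02):
    """S_i = r(alpha^i) for i = 1..2t; all zero iff no detectable error.
    received[0] is taken as the highest-degree coefficient."""
    out = []
    for i in range(1, num_syndromes + 1):
        x = gf_pow(alpha, i)
        s = 0
        for coeff in received:      # Horner's rule: s = s*x + coeff
            s = gf_mul(s, x) ^ coeff
        out.append(s)
    return out

# A block-length-255 RS(255, 239) codeword has 16 syndromes (t = 8).
rx = [0] * 255
rx[10] = 0x37                       # inject a single byte error
print(syndromes(rx, 16)[:4])        # nonzero values -> error detected
```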
ISBN (Print): 9781479953424
Abstract: This paper presents an efficient memory-based fast Fourier transform (FFT) processor supporting 35 different transform sizes for LTE systems. A factorization method named high-radix-small-butterfly, combined with a conflict-free addressing scheme for 2^p·3^q·5^r-point memory-based FFT processing, is proposed. The processor provides both conflict-free concurrent data access to different memory banks and a continuous-flow working mode. Moreover, the prime-factor algorithm is exploited to reduce the number of multiplications and the twiddle-factor storage. In addition, a unified Winograd Fourier transform algorithm (WFTA) butterfly core was designed for the small 2-, 3-, 4-, and 5-point DFTs. The FFT processor was implemented in a SMIC 55 nm CMOS process with a core area of 1.063 mm^2. The chip consumes 40.8 mW at a 122.88 MHz operating frequency with a 1.08 V supply voltage.
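As an illustration of why WFTA cores suit the small DFTs, the Winograd 3-point DFT needs only two coefficient multiplications, versus the four complex multiplications of the direct form. A short sketch, checked against numpy.fft:

```python
import math
import numpy as np

def winograd_dft3(x0, x1, x2):
    """Winograd 3-point DFT: two coefficient multiplications total."""
    s1 = x1 + x2
    s2 = x1 - x2
    X0 = x0 + s1
    m1 = (math.cos(2 * math.pi / 3) - 1) * s1    # multiplication 1
    m2 = (-1j * math.sin(2 * math.pi / 3)) * s2  # multiplication 2
    u = X0 + m1                                  # = x0 + cos(2*pi/3)*s1
    return np.array([X0, u + m2, u - m2])

x = np.random.randn(3) + 1j * np.random.randn(3)
assert np.allclose(winograd_dft3(*x), np.fft.fft(x))
```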
ISBN (Print): 9781450361378
Transposed convolution, which is often used to scale up feature maps in various computer vision tasks, is a structural inverse of convolution. Where present, convolution and transposed convolution together account for the majority of the computation in deep neural network inference. While convolution has been studied extensively, there are few investigations into accelerating transposed convolution. In this paper, we propose a fast algorithm, FTConv, to reduce the computation of transposed convolution using the Winograd algorithm, which has also been used for convolution with small kernels. Specifically, a transposed convolution can be converted into multiple convolutions by dividing the kernel into several congruence classes. These convolutions can then be accelerated with a modified Winograd algorithm, and the transposed-convolution result is obtained by interleaving the output feature elements of each congruence class. We also design a Winograd ALU with four pipeline stages to further accelerate the computation on FPGA. By carefully designing a sliding window for on-chip buffer reuse according to the memory-access pattern of transposed convolution, we save 88.2% of memory bandwidth compared with a straightforward method. We evaluate FTConv on FSRCNN-s, a neural network for super-resolution; the number of multiplications in its transposed-convolution layer is reduced by 69% compared with direct computation.
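A 1-D sketch of the congruence-class idea described above: kernel taps with index k ≡ c (mod stride) form class c, each class reduces to an ordinary stride-1 convolution (where a Winograd kernel could then be applied), and the per-class outputs are interleaved. Names and shapes are illustrative, not from the FTConv implementation.

```python
import numpy as np

def transposed_conv1d_direct(x, w, stride=2):
    """Reference: scatter each input sample through the shifted kernel."""
    y = np.zeros(stride * (len(x) - 1) + len(w))
    for n, xn in enumerate(x):
        y[stride * n: stride * n + len(w)] += xn * w
    return y

def transposed_conv1d_congruence(x, w, stride=2):
    """Split kernel taps into congruence classes k = c (mod stride);
    each class is an ordinary stride-1 convolution, and the per-class
    outputs are interleaved back into the transposed-conv result."""
    y = np.zeros(stride * (len(x) - 1) + len(w))
    for c in range(stride):
        w_c = w[c::stride]                  # taps of congruence class c
        if len(w_c) == 0:
            continue
        y[c::stride] = np.convolve(x, w_c)  # stride-1: Winograd-friendly
    return y

x, w = np.random.randn(8), np.random.randn(3)
assert np.allclose(transposed_conv1d_direct(x, w),
                   transposed_conv1d_congruence(x, w))
```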