This paper presents a multiply-accumulate (MAC) unit that enables a dual-mode truncation error compensation (TEC) scheme based on a fixed-width Booth multiplier (FWBM) for convolutional neural network (CNN) inference operations. The proposed tailored TEC schemes of Modes 1 and 2 can achieve high MAC accuracy for a general or rectified linear unit-based CNN model with general (Mode 1) or positive/zero (Mode 2) input patterns. By pre-calculating the pre-known CNN model coefficients, the proposed dual-mode TEC scheme can be realized using minimal partial product operations with high hardware efficiency using a software/hardware codesign approach. Further, a reconfigurable architecture of the resultant MAC unit is presented to realize the proposed dual-mode TEC scheme. By evaluating the accuracy for 9-N and 25-N MAC operations (N denotes the number of times MAC is performed), a MAC operation using the proposed TEC scheme can achieve the highest accuracy for Modes 1 and 2, relative to contrast samples that directly employ the FWBM with a conventional TEC function. The hardware performances of 9-N and 25-N MAC units are also evaluated using the TSMC 40-nm standard cell library. Compared with the contrast TEC-enabled designs, the proposed MAC unit exhibits higher hardware efficiency in terms of area, delay, and power consumption and achieves a minimum reduction of more than 40% in both area-delay-error and power-delay-error products. Moreover, the resultant 9-N and 25-N MAC units are verified using a system-on-chip field-programmable gate array platform to test a CNN model for handwritten digit classification.
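To illustrate why a fixed-width multiplier needs error compensation at all, the following minimal sketch compares an exact MAC with one that discards the low-order bits of every product. It does not reproduce the paper's dual-mode TEC scheme or Booth encoding: the function names, the `drop_bits` parameter, and the constant `compensation` offset (a plain half-LSB rounding term) are assumptions made purely for illustration.

```python
def full_mac(pairs):
    """Exact multiply-accumulate over (input, weight) integer pairs."""
    return sum(a * b for a, b in pairs)


def truncated_mac(pairs, drop_bits, compensation=0):
    """MAC in which each product loses its `drop_bits` least significant bits.

    `compensation` is a hypothetical constant added before truncation.
    It is a stand-in for a compensation term, not the paper's TEC function.
    """
    acc = 0
    for a, b in pairs:
        # Python's >> floors toward negative infinity, like an arithmetic shift.
        acc += (a * b + compensation) >> drop_bits
    # Rescale so the result is comparable with the exact sum.
    return acc << drop_bits


pairs = [(100, 37), (-45, 90), (12, -7)]
exact = full_mac(pairs)
plain = truncated_mac(pairs, drop_bits=8)
rounded = truncated_mac(pairs, drop_bits=8, compensation=128)
print(exact, plain, rounded)
print(abs(plain - exact), abs(rounded - exact))
```

In this small example the uncompensated error is larger than the compensated one, and because the truncation error is added once per product it can grow with the number of accumulated terms.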
The objective of this paper is to design, model, simulate, and synthesize spatial filtering techniques, in which the operation is performed within the neighborhood of a pixel. Recent increases in field-programmable gate array (FPGA) performance and size offer a new hardware acceleration opportunity. The convolution filtering operations are implemented using Xilinx System Generator (XSG), which is the industry's leading high-level tool for designing high-performance DSP systems using FPGAs. The designs are modeled using the XSG blockset and synthesized onto a Virtex 6 xc6vs315t-3ff156 FPGA device. The algorithms are validated using the hardware co-simulation method.
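As a software reference for what a neighborhood operation computes, the sketch below applies a 3x3 kernel to the interior pixels of a small image in plain Python. It is not the XSG design described in the abstract: the function name `filter3x3`, the `divisor` parameter, and the choice to copy border pixels unchanged are assumptions for illustration. The kernel is applied without flipping, which matches convolution only for symmetric kernels such as the box filter used here.

```python
def filter3x3(image, kernel, divisor=1):
    """Apply a 3x3 kernel to each interior pixel of a 2-D list of integers.

    Border pixels are copied unchanged (an assumption; real designs may pad).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            # Weighted sum over the 3x3 neighborhood centered on (y, x).
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = acc // divisor
    return out


box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
noisy = [[9, 9, 9], [9, 0, 9], [9, 9, 9]]
smoothed = filter3x3(noisy, box, divisor=9)
print(smoothed)
```

The nine multiply-accumulate steps per output pixel are the part that a hardware implementation would typically compute in parallel.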
We introduce a new method to qualify the goodness of fit of parameter estimation for compound Wishart models. Our method is based on the free deterministic equivalent Z-score, which we introduce in this paper. Furthermore, an application to a two-dimensional autoregressive moving-average model is provided. Our proposed method is a generalization of the statistical hypothesis testing for a one-dimensional moving-average model based on fluctuations of real compound Wishart matrices by Hasegawa et al. (A. Hasegawa, N. Sakuma and H. Yoshida, Fluctuations of Marchenko-Pastur limit of random matrices with dependent entries, Statist. Probab. Lett. 127 (2017) 85-96).
ISBN: 9781728103396 (print)
Modern mobile neural networks with a reduced number of weights and parameters do a good job with image classification tasks, but even they may be too complex to be implemented in an FPGA for video processing tasks. The article proposes a neural network architecture for the practical task of recognizing images from a camera, which has several advantages in terms of speed. This is achieved by reducing the number of weights, moving from floating-point to fixed-point arithmetic, and applying a number of hardware-level optimizations: storing weights in blocks, using a shift register, and using an adjustable number of convolutional blocks that work in parallel. The article also proposes methods for adapting an existing data set to a different task. As the experiments showed, the proposed neural network copes well with real-time video processing even on low-cost FPGAs.
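The move from floating-point to fixed-point arithmetic mentioned above can be sketched as follows. This is a generic illustration, not the article's implementation: the helper names and the format (16-bit signed values with 8 fractional bits) are assumptions chosen for the example.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values and rescale back to the same format."""
    return (a * b) >> frac_bits


def to_float(q, frac_bits=8):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


weight = to_fixed(0.5)
pixel = to_fixed(-1.25)
product = fixed_mul(weight, pixel)
print(weight, pixel, product, to_float(product))
```

Integer multiplies and shifts of this kind map directly onto FPGA multiplier blocks, which is one reason fixed-point formats are a common choice for such designs.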