Asynchronous federated learning (AsynFL) can effectively mitigate the impact of edge-node heterogeneity on joint training while satisfying participants' privacy protection and data security. However, the frequent exchange of massive data leads to excessive communication overhead between edge and central nodes, regardless of whether the federated learning (FL) algorithm uses synchronous or asynchronous aggregation. Therefore, there is an urgent need for a method that can simultaneously account for device heterogeneity and the energy-consumption limitations of edge nodes. This paper proposes a novel fixed-point Asynchronous Federated Learning (fixedAsynFL) algorithm that mitigates the resource consumption caused by frequent data communication while alleviating the effect of device heterogeneity. fixedAsynFL uses fixed-point quantization to compress the local and global models in FL. To balance energy consumption and learning accuracy, the paper proposes a quantization-scale selection mechanism: it examines the mathematical relationship between the quantization scale and the energy consumption of the computation/communication process and, based on an upper bound on the quantization noise, optimizes the quantization scale by minimizing communication and computation energy. Experiments are performed on the MNIST dataset with several edge nodes of different computing power. The results show that fixedAsynFL with 8-bit quantization significantly reduces the communication data size, by 81.3%, and saves 74.9% of the computation energy in the training phase without significant loss of accuracy. These results indicate that the proposed fixedAsynFL algorithm can effectively address device heterogeneity and the energy-consumption limitations of edge nodes.
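The abstract does not spell out the quantizer itself; below is a minimal sketch of the kind of symmetric fixed-point quantization it describes, with the bit width standing in for the paper's quantization scale (the function names and the per-tensor scaling rule are illustrative assumptions, not the authors' code):

    import numpy as np

    def fixed_point_quantize(w, bits=8):
        # Map floats onto a signed (bits-1)-bit integer grid with one
        # scale per tensor; the integers are what gets transmitted.
        qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
        scale = np.max(np.abs(w)) / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)
        return q.astype(np.int32), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(1000).astype(np.float32)   # stand-in for a model update
    q, s = fixed_point_quantize(w, bits=8)
    w_hat = dequantize(q, s)
    print("max quantization error:", np.max(np.abs(w - w_hat)))

At 8 bits, each transmitted value needs a quarter of the 32 bits of an FP32 weight, which is consistent with the reported 81.3% reduction in communication data size.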
In the field of object detection, deep learning has greatly improved accuracy compared to previous algorithms and has been widely used in recent years. However, object detection using deep learning requires substantial hardware (HW) resources because of the huge computation needed for high performance, making it very difficult to run in real time on embedded platforms. Therefore, various compression methods have been studied to solve this problem. In particular, quantization methods greatly reduce the computational burden of deep learning by reducing the number of bits used for weights and activations. However, most existing studies targeted only object classification and cannot be applied to object detection. Furthermore, most existing quantization studies are based on floating-point operations, which require additional effort when implementing HW accelerators. This paper proposes an HW-friendly fixed-point-based quantization method that can also be applied to object detection. In the proposed method, the center of the weight distribution is adjusted to zero by subtracting the mean of the weight parameters before quantization, and the retraining process is applied iteratively to minimize the accuracy drop caused by quantization. Furthermore, when applying the proposed method to object detection, performance degradation is minimized by considering the minimum and maximum values of the weight parameters of the networks. When the proposed quantization method is applied to representative one-stage object detectors, You Only Look Once v3 and v4 (YOLOv3 and YOLOv4), detection accuracy on the COCO dataset remains similar to that of the original single-precision floating-point (32-bit) networks, even though the weights are expressed with only about 20% of the bits of the single-precision format.
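A hedged sketch of the mean-centering step the abstract describes: subtract the mean of the weights so the distribution is centered at zero, then quantize over the observed min/max range (the helper names and rounding rule are illustrative assumptions):

    import numpy as np

    def mean_centered_quantize(w, bits=8):
        mu = w.mean()
        w0 = w - mu                            # center the distribution at zero
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(w0.min()), abs(w0.max())) / qmax   # use min/max of weights
        q = np.clip(np.round(w0 / scale), -qmax, qmax)
        return q.astype(np.int32), scale, mu   # mu is added back at inference

    w = 0.05 + 0.1 * np.random.randn(4096)     # weights with a nonzero mean
    q, scale, mu = mean_centered_quantize(w)
    w_hat = q * scale + mu

In the paper's pipeline this step alternates with retraining, so each round of quantization error can be compensated by further fine-tuning.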
Deep neural networks (DNNs) have been proven to outperform classical methods on several machine learning benchmarks. However, they have high computational complexity and require powerful processing units. Especially when deployed on embedded systems, model size and inference time must be significantly reduced. We propose SYMOG (symmetric mixture of Gaussian modes), which significantly decreases the complexity of DNNs through low-bit fixed-point quantization. SYMOG is a novel soft quantization method in which the learning task and the quantization are solved simultaneously. During training, the weight distribution changes from a unimodal Gaussian distribution to a symmetric mixture of Gaussians, where each mean value belongs to a particular fixed-point mode. We evaluate our approach with different architectures (LeNet5, VGG7, VGG11, DenseNet) on common benchmark data sets (MNIST, CIFAR-10, CIFAR-100) and compare with state-of-the-art quantization approaches. We achieve excellent results and outperform the 2-bit state of the art with an error rate of only 5.71% on CIFAR-10 and 27.65% on CIFAR-100.
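One way to read the method is as a mixture-of-Gaussians penalty whose modes sit on the fixed-point grid, trained jointly with the task loss; the sketch below is only that reading (the mode placement, shared variance, and function name are assumptions, not SYMOG's exact parameterization):

    import numpy as np

    def mog_grid_penalty(w, bits=2, sigma=0.1):
        # Negative log-likelihood of the weights under a symmetric mixture
        # of Gaussians centered on fixed-point values; minimizing it pulls
        # the weight distribution toward the low-bit modes.
        qmax = 2 ** (bits - 1) - 1
        modes = np.arange(-qmax, qmax + 1) / max(qmax, 1)   # e.g. [-1, 0, 1]
        d = w[:, None] - modes[None, :]
        likelihood = np.exp(-d**2 / (2 * sigma**2)).sum(axis=1)
        return -np.log(likelihood + 1e-12).mean()

    w = 0.5 * np.random.randn(1000)
    print(mog_grid_penalty(w, bits=2))   # added to the task loss during training

Annealing sigma downward over training would sharpen the modes until each weight effectively snaps to one fixed-point value.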
ISBN (print): 9781728119687
State-of-the-art hardware accelerators for large-scale CNNs face two challenges: the high computational complexity of convolution, and the high on-chip memory consumption of weight kernels. Two techniques have been proposed in the literature to address these challenges: frequency-domain convolution and space-domain fixed-point quantization. In this paper, we propose frequency-domain quantization schemes to achieve high-throughput CNN inference on FPGAs. We first analyze the impact of quantization bit width on the accuracy of a frequency-domain CNN via the metric of signal-to-quantization-noise ratio (SQNR). Taking advantage of the reconfigurability of FPGAs, we design a statically-reconfigurable and a dynamically-reconfigurable architecture for the quantized convolutional layers. Then, based on the SQNR analysis, we propose quantization schemes for both types of architectures, achieving an optimal tradeoff between throughput and accuracy. The proposed quantizer allocates the number of bits for each convolutional layer under various design constraints, including overall SQNR, available DSP resources, on-chip memory, and off-chip bandwidth. Experiments on AlexNet show that our designs improve CNN inference throughput by 1.45x to 8.44x, with negligible (< 0.5%) loss in accuracy.
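The SQNR metric the analysis is built on is easy to state concretely; a small sketch (generic, not the paper's code) that also shows the roughly 6 dB-per-bit behavior that makes per-layer bit allocation a meaningful tradeoff:

    import numpy as np

    def quantize(x, bits):
        qmax = 2 ** (bits - 1) - 1
        s = np.max(np.abs(x)) / qmax
        return np.clip(np.round(x / s), -qmax, qmax) * s

    def sqnr_db(x, x_q):
        # Ratio of signal power to quantization-noise power, in dB.
        return 10 * np.log10(np.sum(x**2) / np.sum((x - x_q)**2))

    x = np.random.randn(10000)
    for b in (4, 6, 8, 10):
        print(b, "bits:", round(sqnr_db(x, quantize(x, b)), 1), "dB")

A common model (assumed here; the abstract does not state it) is that per-layer SQNRs combine in linear scale as 1/SQNR_total = sum_i 1/SQNR_i, which is what lets an allocator trade bits between layers against an overall SQNR constraint.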
The deep neural network (DNN) has achieved remarkable performance in a wide range of applications, at the cost of huge memory and computational complexity. Fixed-point network quantization has emerged as a popular acceleration and compression method but still suffers from severe performance degradation when extremely low-bit quantization is used. Moreover, current fixed-point quantization methods rely heavily on supervised retraining with large amounts of labeled training data, which are hard to obtain in real-world applications. In this article, we propose an efficient framework, namely, the fixed-point factorized network (FFN), to turn all weights into ternary values, i.e., {-1, 0, 1}. We highlight that the proposed FFN framework can achieve negligible degradation even without any supervised retraining on labeled data. Note that the activations can easily be quantized into an 8-bit format; thus, the resulting networks involve only low-bit fixed-point additions, which are significantly more efficient than 32-bit floating-point multiply-accumulate operations (MACs). Extensive experiments on large-scale ImageNet classification and object detection on MS COCO show that the proposed FFN can achieve more than 20x compression and remove most of the multiply operations with comparable accuracy. Code is available on GitHub at https://github.com/wps712/FFN.
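The abstract does not give the factorization itself; as a generic illustration of ternarization (not FFN's fixed-point factorization), a common threshold scheme maps small weights to 0 and the rest to +/-1 with a shared scale (the 0.7 threshold factor is a heuristic assumption):

    import numpy as np

    def ternarize(w, t=0.7):
        delta = t * np.mean(np.abs(w))            # threshold near zero
        tern = np.sign(w) * (np.abs(w) > delta)   # values in {-1, 0, 1}
        mask = np.abs(w) > delta
        alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
        return tern.astype(np.int8), alpha        # w is approximated by alpha * tern

    w = np.random.randn(256, 256).astype(np.float32)
    t_w, alpha = ternarize(w)

With ternary weights, multiplications collapse into sign flips and additions, which is why only low-bit fixed-point additions remain once activations are quantized to 8 bits.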
This paper presents a supervised contrastive learning (SCL) framework for respiratory sound classification and a hardware implementation of the learned ResNet on a field-programmable gate array (FPGA) for real-time monitoring. At the algorithmic level, multiple techniques such as feature augmentation and MixUp are combined holistically to mitigate the impact of data scarcity and imbalanced classes in the training dataset. Bayesian optimization further enhances the classification accuracy through parameter tuning in pre-processing and SCL. The proposed framework achieves a 0.8725 total score (including runtime score) with a ResNet-18 model on both event and record multi-class classification tasks using the SJTU Paediatric Respiratory Sound Database (SPRSound). In addition, algorithm-hardware co-optimizations, including Quantization-Aware Training (QAT), merging of network layers, and optimization of memory size and number of parallel threads, are performed for the hardware implementation on FPGA. This approach reduces model size by 40% and computation latency by 70%. The learned ResNet is implemented on a Xilinx Zynq ZCU102 FPGA with 16 ms latency and less than 2% inference-score degradation compared to the software model.
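A minimal sketch of the "fake quantization" used in QAT (generic, not the paper's implementation): the forward pass rounds values to a low-bit grid, while training frameworks let gradients pass straight through the rounding (the straight-through estimator):

    import numpy as np

    def fake_quantize(x, bits=8):
        # Simulate low-bit hardware arithmetic inside a float forward pass.
        qmax = 2 ** (bits - 1) - 1
        s = np.max(np.abs(x)) / qmax
        return np.clip(np.round(x / s), -qmax, qmax) * s

    x = np.random.randn(16)
    print(fake_quantize(x, bits=8))

Because the network trains against the quantized forward pass, the weights it learns already tolerate the FPGA's integer arithmetic, which helps keep the inference-score degradation under 2%.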
Convolutional neural networks (CNNs) are widely utilized in intelligent edge computing applications such as computer vision and image processing. However, as the number of layers in a CNN model increases, the number of parameters and computations grows, making acceleration increasingly challenging in edge computing applications. To effectively balance the tradeoff between speed and accuracy of CNN inference for smart applications, this paper proposes an FPGA-based adaptive CNN inference accelerator, called APPQ-CNN, that synergistically utilizes filter pruning, fixed-point parameter quantization, and multi-computing-unit parallelism. First, the article devises a hybrid pruning algorithm based on the L1-norm and APoZ to measure each filter's impact, together with a configurable fixed-point parameter-quantization computing architecture in place of a floating-point one. Then, it designs a pipelined CNN kernel architecture cascaded with configurable multiple computation units. Finally, extensive performance exploration and comparison experiments are conducted on various real and synthetic datasets. With negligible accuracy loss, our APPQ-CNN accelerator outperforms the current state-of-the-art FPGA-based accelerators PipeCNN and OctCNN in speed by 2.15x and 1.91x, respectively. Furthermore, APPQ-CNN provides settable fixed-point quantization bit-width parameters, filter pruning rate, and number of computation units to cope with practical performance requirements in edge computing.
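A hedged sketch of the L1-norm half of the hybrid pruning criterion (the APoZ half needs activation statistics from a calibration set, which is omitted; the names and ranking rule are assumptions):

    import numpy as np

    def prune_filters_l1(conv_w, rate=0.5):
        # conv_w shape: (out_channels, in_channels, kH, kW).
        # Rank filters by L1 norm and drop the weakest `rate` fraction.
        scores = np.abs(conv_w).sum(axis=(1, 2, 3))
        keep = np.sort(np.argsort(scores)[int(rate * len(scores)):])
        return conv_w[keep], keep                 # pruned weights + kept indices

    w = np.random.randn(64, 32, 3, 3)
    w_pruned, kept = prune_filters_l1(w, rate=0.5)   # 32 filters remain

Pruning whole filters shrinks both the parameter count and the layer's output channels, which is what makes it attractive for an FPGA's fixed compute budget.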
ISBN (print): 9798350388350; 9798350388343
In today's rapidly advancing technological landscape, the applications of deep learning permeate various facets of our lives. However, traditional implementations of convolutional neural networks (CNNs) on platforms such as CPUs and GPUs often require substantial network bandwidth and incur high power consumption. Deploying CNNs on Field-Programmable Gate Arrays (FPGAs) with efficient logic control from CPUs offers a promising solution for low-power and compact hardware designs. This paper proposes a novel approach to optimize YOLOv3-tiny on FPGA, aiming to reduce hardware resource consumption and power usage while enhancing the computational efficiency of the convolutional neural network. Through hardware optimization strategies, our solution demonstrates improved performance, making it well-suited for real-time deep learning inference tasks in resource-constrained environments.
ISBN (print): 9781665473903
Graph Convolutional Networks (GCNs) have shown great results but come with large computation costs and memory overhead. Recently, sampling-based approaches have been proposed to alter input sizes, which allows large GCN workloads to align with hardware constraints. Motivated by this flexibility, we propose an FPGA-based GCN accelerator, named SkeletonGCN, along with multiple software-hardware co-optimizations to improve training efficiency. We first quantize all feature and adjacency matrices of the GCN from FP32 to SINT16. We then simplify the non-linear operations to better fit the FPGA computation and identify reusable intermediate results to eliminate redundant computation. Moreover, we employ a linear-time sparse-matrix compression algorithm to further reduce memory bandwidth while allowing efficient decompression on hardware. Finally, we propose a unified hardware architecture to process sparse-dense matrix multiplication (SpMM) and dense matrix multiplication (MM) on the same group of PEs to increase DSP utilization on the FPGA. Evaluation is performed on a Xilinx Alveo U200 board. Compared with an existing FPGA-based accelerator on the same network architecture, SkeletonGCN achieves up to 11.3x speedup while maintaining the same training accuracy. In addition, SkeletonGCN achieves up to 178x and 13.1x speedup over state-of-the-art CPU and GPU implementations on popular datasets, respectively.
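A sketch of the FP32-to-SINT16 conversion (the Q7.8 fractional split below is an assumption; the abstract does not publish the format):

    import numpy as np

    def to_sint16(m, frac_bits=8):
        # Fixed-point encode: scale by 2^frac_bits, round, saturate to int16.
        scaled = np.round(m * (1 << frac_bits))
        return np.clip(scaled, -32768, 32767).astype(np.int16)

    def from_sint16(q, frac_bits=8):
        return q.astype(np.float32) / (1 << frac_bits)

    features = np.random.randn(128, 64).astype(np.float32)
    q = to_sint16(features)        # half the memory traffic of FP32

Halving the word size doubles the effective off-chip bandwidth and lets twice as many operands share each on-chip buffer, which is one reason the quantization step matters for training throughput.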
Quantization, which involves bit-width reduction, is considered one of the most effective approaches for rapidly and energy-efficiently deploying deep convolutional neural networks (DCNNs) on resource-constrained embedded hardware. However, bit-width reduction of the weights and activations of DCNNs seriously degrades accuracy. To solve this problem, in this paper we propose a mixed hardware-friendly quantization (MXQN) method that applies fixed-point quantization and logarithmic quantization to DCNNs without the need to retrain or fine-tune them. Our MXQN algorithm is a multi-stage process: first, we employ the signal-to-quantization-noise ratio (SQNR) as the metric to estimate the interplay between the parameter quantization errors of each layer and the overall model prediction accuracy. Then, we use fixed-point quantization for the weights and, depending on the SQNR metric, empirically select either logarithmic or fixed-point quantization for the activations. For improved accuracy, we propose an optimized logarithmic quantization scheme that affords a fine-grained step size. We evaluate the performance of MXQN using the VGG16 network on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets, as well as VGG19 and ResNet (ResNet18, ResNet34, ResNet50) networks on ImageNet, and demonstrate that, despite not being retrained or fine-tuned, the MXQN-quantized DCNN still achieves accuracy close to the original DCNN.
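A sketch of plain power-of-two (logarithmic) quantization for contrast with the fixed-point scheme (generic; MXQN's optimized variant adds a finer-grained step size, and the exponent window below is an assumption):

    import numpy as np

    def log_quantize(x, bits=4, e_max=0):
        # Encode each value as sign * 2^e with a clipped integer exponent,
        # so multiplications by quantized values become bit shifts.
        sign = np.sign(x)
        e = np.round(np.log2(np.abs(x) + 1e-12))
        e = np.clip(e, e_max - 2 ** bits + 1, e_max)
        return sign * 2.0 ** e

    x = 0.5 * np.random.rand(8)
    print(log_quantize(x))

Logarithmic quantization spends its levels where small activations cluster, which is why an SQNR test can prefer it over fixed-point quantization for some layers.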