Condition monitoring and fault diagnosis are critical for the optimal scheduling of machines, improving system reliability and reducing maintenance cost. In recent years, various methods based on deep learning have made great progress in the field of mechanical fault diagnosis. However, there is a conflict between the massive parameters of fault diagnosis networks and the limited computing resources of embedded platforms. It is difficult to deploy a trained network on small-scale embedded platforms, such as field-programmable gate arrays (FPGAs), in actual industrial settings, which seriously hinders the practical adoption of intelligent fault diagnosis methods. To address this problem, a new neural network compression method based on knowledge distillation (KD) and parameter quantization is proposed in this paper. In the proposed method, a large-scale deep neural network with multiple convolutional layers and fully-connected layers is designed and trained as the teacher network. A small-scale network with just one convolutional layer and one fully-connected layer is then designed as the student network. When training the student network, knowledge distillation is applied to improve its accuracy. After training, parameter quantization is applied to further compress the student network. Experimental results on an FPGA demonstrate the effectiveness of the proposed method: it compresses the fault diagnosis networks by more than 10 times at the cost of a minimal loss of accuracy.
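To make the distillation step concrete, below is a minimal PyTorch sketch. The layer sizes, temperature T, and blending weight alpha are illustrative assumptions, not the paper's settings, and `teacher_logits` is assumed to come from the already-trained teacher network.

```python
# Minimal knowledge-distillation sketch (PyTorch); hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Student(nn.Module):
    """Small student: one convolutional layer and one fully-connected layer."""
    def __init__(self, in_ch=1, n_classes=10, sig_len=1024):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, 8, kernel_size=16, stride=4)
        self.fc = nn.Linear(8 * ((sig_len - 16) // 4 + 1), n_classes)

    def forward(self, x):
        h = F.relu(self.conv(x))
        return self.fc(h.flatten(1))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL loss (scaled by T^2) with hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```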
ISBN (Print): 9781479981311
Most deep neural networks (DNNs) require complex models to achieve high performance. Parameter quantization is widely used to reduce implementation complexity. Previous studies on quantization were mostly based on extensive simulation using training data on a specific model. We choose a different approach and attempt to measure the per-parameter capacity of DNN models and interpret the results to obtain insights into the optimum quantization of parameters. This research uses artificially generated data and generic forms of fully-connected DNNs, convolutional neural networks, and recurrent neural networks. We conduct memorization and classification tests to study the effects of the number and precision of the parameters on performance. The model and per-parameter capacities are assessed by measuring the mutual information between the input and the classified output. To gain insight into parameter quantization when performing real tasks, the training and test performances are compared.
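The capacity measurement hinges on estimating the mutual information between the input's class and the classified output. A minimal sketch, assuming the joint distribution is estimated empirically from label/prediction counts:

```python
# Estimate I(input label; classified output) from a confusion-count matrix.
import numpy as np

def mutual_information(true_labels, pred_labels, n_classes):
    """I(X;Y) in bits, estimated from empirical joint counts."""
    joint = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        joint[t, p] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return np.nansum(terms)  # zero-probability cells contribute 0
```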
The most recent studies on deep learning based speech enhancement (SE) are focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, parameter quantization (PQ) and feature-map quantization (FQ) techniques were integrated to generate even more compact SE models. The experimental results show that the integration of PP, PQ, and FQ can produce a compact SE model whose size is only 9.76% of the original model, with minor performance losses of 0.01 (from 0.85 to 0.84) in STOI and 0.03 (from 2.55 to 2.52) in PESQ. These promising results confirm that the PP, PQ, and FQ techniques can effectively reduce the storage footprint of an SE system on edge devices.
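As a rough illustration of how PP and PQ could operate, the following NumPy sketch ranks channels by the L1 norm of their filters and uniformly quantizes the surviving weights; the keep ratio and bit-width are assumptions, not the paper's values.

```python
# Channel pruning by filter L1 norm plus uniform parameter quantization.
import numpy as np

def prune_channels(conv_weight, keep_ratio=0.5):
    """conv_weight: (out_ch, in_ch, k, k). Returns indices of kept channels."""
    scores = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(conv_weight.shape[0] * keep_ratio))
    return np.sort(np.argsort(scores)[-n_keep:])

def quantize_uniform(w, n_bits=8):
    """Symmetric uniform parameter quantization to n_bits."""
    scale = max(np.abs(w).max(), 1e-12) / (2 ** (n_bits - 1) - 1)
    q = np.round(w / scale).astype(np.int32)
    return q, scale  # dequantize with q * scale
```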
Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader adoption. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. The investigation uncovers several critical findings: (1) larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) the effectiveness of balancing factors depends more on their specific values than on layer type or depth; (3) in quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized models. Moreover, for the same reduction in trainable parameters, shrinking the trainable parameters of a larger layer preserves fine-tuning accuracy better than doing so in a smaller layer. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM ...
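The rank-sensitivity findings can be probed with a plain LoRA layer such as the sketch below; in QLoRA the frozen base layer would additionally be 4-bit quantized, which this sketch omits. Rank r and scaling alpha are illustrative.

```python
# Minimal LoRA sketch (PyTorch): frozen base linear plus a trainable
# low-rank update B @ A, scaled by alpha / r.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Because B starts at zero, the wrapped layer initially behaves exactly like the frozen base, and only the 2 * r * d adapter parameters are trained.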
The precise identification of drone modulation schemes serves as a fundamental basis for intelligent drone recognition. To address the limitations of existing algorithms that rely too heavily on single features, resulting in low recognition rates and excessive model complexity, this paper proposes a drone signal modulation recognition algorithm based on joint features. The algorithm begins by employing Maximum Likelihood Estimation (MLE) to compensate for phase noise, mitigating its adverse effects on subsequent modulation recognition. Next, it combines the signal's time-frequency representation with IQ data derived from higher-order cumulants as joint features, which are then input into a recognition network composed of 2D convolutional layers (Conv2D) and Long Short-Term Memory (LSTM) networks. Additionally, dynamic fixed-point parameter quantization is applied to the model's weights and biases, reducing resource consumption during practical deployment. Experimental results demonstrate that when the Signal-to-Noise Ratio (SNR) exceeds 2 dB, the proposed algorithm achieves a recognition accuracy of up to 90% for nine common drone modulation schemes, substantially outperforming comparable models. After quantization, recognition performance remains nearly unaffected while computational resource requirements are greatly reduced, making the algorithm well suited to deployment in resource-constrained environments.
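A minimal sketch of per-tensor dynamic fixed-point quantization, assuming a 16-bit word and a fractional length chosen from each tensor's dynamic range; the paper's exact scheme may differ.

```python
# Per-tensor dynamic fixed-point quantization: the fractional bit-length
# adapts to each tensor's range, sharing one exponent per tensor.
import numpy as np

def dynamic_fixed_point(w, word_bits=16):
    """Quantize w to signed fixed-point with a per-tensor fractional length."""
    max_abs = np.abs(w).max()
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = word_bits - 1 - int_bits   # 1 bit reserved for the sign
    step = 2.0 ** (-frac_bits)
    q = np.clip(np.round(w / step),
                -(2 ** (word_bits - 1)), 2 ** (word_bits - 1) - 1)
    return q * step, frac_bits             # dequantized tensor + format
```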
By truncating the weights and activations of a deep neural network, conventional binary quantization limits the representation capability of the network parameters, which deteriorates the detection performance of the network. In this paper, a joint-guided distillation binary neural network via dynamic channel-wise diversity enhancement for object detection (JDBNet) is proposed to narrow the gap caused by quantization errors. JDBNet includes a dynamic channel-wise diversity scheme and real-valued joint-guided teacher assistance to enhance the representation capability of the binary neural network in object detection tasks. In the dynamic diversity scheme, the learning channel-wise bias (LCB) layer adjusts the magnitude of the parameters so that their sensitivity to an arbitrary quantization method is reduced, thereby improving the diversity of the feature representations. In the joint-guided strategy, single-precision implicit knowledge from the guiding teacher at multiple levels is used to supervise and penalize the binary quantized model, enhancing the fit of its parameters. Extensive experiments on the PASCAL VOC, MS COCO, and VisDrone-DET datasets demonstrate that JDBNet outperforms state-of-the-art binary object detection networks in terms of mean Average Precision.
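The channel-wise idea can be illustrated with weight binarization through a straight-through estimator plus a learnable per-channel bias, as in the following PyTorch sketch; shapes and the bias placement are assumptions in the spirit of LCB, not the authors' implementation.

```python
# Binary convolution with a straight-through estimator (STE) and a
# learnable channel-wise bias added to the output feature maps.
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()  # clip grads outside [-1, 1]

class BinaryConv2d(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # learnable per-output-channel bias on the features (LCB-style, assumed)
        self.channel_bias = nn.Parameter(torch.zeros(self.out_channels, 1, 1))

    def forward(self, x):
        scale = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
        w_bin = BinarizeSTE.apply(self.weight) * scale  # {-s, +s} weights
        out = self._conv_forward(x, w_bin, self.bias)
        return out + self.channel_bias
```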
Federated learning is a distributed machine learning technique that ensures user privacy and enables multiple clients to jointly train a shared global model without transmitting local data. However, the frequent exchange of model parameters between numerous clients and the server causes heavy network delay and bandwidth limitations in federated learning. In view of this, we propose an efficient algorithm for federated learning using sparse ternary compression based on layer variation classification (LVC). First, layer variation is used as a metric to assess the significance of each layer of the model parameters; after client training, the model parameters are categorized into levels by layer variation and sensitivity analysis. Then, during the upstream and downstream transmission of model parameters, corresponding sparsity and ternary quantization ratios are assigned to the different levels, maximizing compression efficiency while preserving crucial parameters. Finally, on the server side, a majority-layer aggregation strategy is adopted to further reduce the communication cost. Experimental results from image classification tasks on the MNIST and Fashion-MNIST datasets demonstrate that the proposed LVC algorithm achieves high accuracy with minimal communication cost, striking an optimal balance between communication efficiency and accuracy.
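A minimal sketch of the sparse ternary step applied to one layer's update, assuming magnitude top-k selection and a shared magnitude mu for the kept entries; the per-level ratios assigned by LVC are omitted.

```python
# Sparse ternary compression of a model update: keep the top-k entries by
# magnitude and replace them with +/- mu, the mean magnitude of the kept set.
import numpy as np

def sparse_ternary(update, keep_frac=0.01):
    flat = update.ravel()
    k = max(1, int(flat.size * keep_frac))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k entries
    mu = np.abs(flat[idx]).mean()
    compressed = np.zeros_like(flat)
    compressed[idx] = np.sign(flat[idx]) * mu      # values in {-mu, 0, +mu}
    return compressed.reshape(update.shape)
```

Only the kept indices, their signs, and the single scalar mu need to be transmitted, which is where the communication savings come from.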
As neural networks become an increasingly popular technique in the field of aeromagnetic compensation, there is an increasing demand for hardware systems with more computing power. Compared with the linear regression method, applying a neural network to the task of real-time compensation is difficult because of insufficient computing resources on the unmanned aerial vehicle (UAV) flight detection platform. To perform real-time compensation with limited computing resources, we optimize the back-propagation neural network (OBPNN) through model compression and acceleration. In this study, we found that the most time-consuming part of network training is the iterative updating of the weights in the BPNN interference model. Using transfer learning, we replace the randomly initialized weights (RWs) with pretrained weights, thereby greatly reducing the number of iterations required. We also apply other model compression and acceleration algorithms. In a case study of the new technique, we implement fast training of the OBPNN on a Raspberry Pi 4B system. The network processes approximately 316 samples per 0.1 s, fast enough to complete aeromagnetic compensation in real time.
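The transfer-learning step amounts to a warm start: copying pretrained weights into the new network in place of random initialization. A minimal PyTorch sketch, with the two-network setup assumed for illustration:

```python
# Warm-start a network from pretrained weights instead of random init,
# so far fewer training iterations are needed on new data.
import copy
import torch.nn as nn

def warm_start(new_net: nn.Module, pretrained: nn.Module) -> nn.Module:
    """Replace the randomly initialized weights (RWs) with pretrained ones."""
    state = copy.deepcopy(pretrained.state_dict())
    new_net.load_state_dict(state, strict=False)
    return new_net  # layers absent from `state` keep their random init
```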
This paper introduces a novel source coding method for voltage and current signals, called the fundamental, harmonic, and transient coding method (FHTCM), which is a generalization of the enhanced disturbance compression method (EDCM). The proposed method uses the notch filtering-warped discrete Fourier transform (NF-WDFT) technique to estimate the parameters (amplitude, frequency, and phase) of the fundamental and harmonic components acquired from power lines, so that only the transient components are compressed with a wavelet transform (WT) coding technique. For the WT-based compression of the transient components, we formulate a minimum description length (MDL) criterion that accounts for the selection of wavelet bases from a dictionary, the wavelet decomposition structure, and quantization. Computational simulations have verified that the proposed method outperforms the EDCM as well as traditional WT-based compression techniques.
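The shape of the pipeline can be sketched as follows, with a plain least-squares sinusoid fit standing in for the NF-WDFT estimator and hard thresholding standing in for the MDL-driven wavelet coder; all parameters here are assumptions.

```python
# Estimate and subtract the fundamental, then wavelet-compress the residual.
import numpy as np
import pywt  # PyWavelets

def fundamental_fit(x, fs, f0=60.0):
    """Least-squares amplitude/phase fit of the fundamental at f0 Hz."""
    t = np.arange(len(x)) / fs
    basis = np.column_stack([np.cos(2 * np.pi * f0 * t),
                             np.sin(2 * np.pi * f0 * t)])
    (a, b), *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ np.array([a, b])        # reconstructed fundamental

def compress_residual(residual, wavelet="db4", level=4, thresh=1e-3):
    """Hard-threshold the wavelet coefficients of the transient residual."""
    coeffs = pywt.wavedec(residual, wavelet, level=level)
    coeffs = [np.where(np.abs(c) > thresh, c, 0.0) for c in coeffs]
    return coeffs  # sparse coefficients; invert with pywt.waverec
```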
ISBN (Print): 9781605603162
We present PocketSUMMIT, a small-footprint version of our SUMMIT continuous speech recognition system. With portable devices becoming smaller and more powerful, speech is increasingly becoming an important input modality on these devices. PocketSUMMIT is implemented as a variable-rate continuous density hidden Markov model with diphone context-dependent models. We explore various Gaussian parameter quantization schemes and find 8:1 compression or more is achievable with little reduction in accuracy. We also show how the quantized parameters can be used for rapid table lookup. We explore first-pass language model pruning in a finite-state transducer (FST) framework, as well as FST and n-gram weight quantization and bit packing, to further reduce memory usage. PocketSUMMIT is currently able to run a moderate vocabulary conversational speech recognition system in real time in a few MB on current PDAs and smart phones.
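A sketch of the scalar-quantization-plus-lookup idea, assuming 4-bit per-dimension codebooks (8:1 versus float32) and a per-frame table of log-likelihood terms; this is an illustration of the general technique, not PocketSUMMIT's actual scheme.

```python
# Quantize Gaussian means with per-dimension codebooks, then score frames
# by table lookup instead of per-Gaussian subtract-square-multiply.
import numpy as np

def build_codebook(values, n_bits=4):
    """Uniform scalar codebook over the observed range of one dimension."""
    levels = 2 ** n_bits
    edges = np.linspace(values.min(), values.max(), levels)
    codes = np.abs(values[:, None] - edges[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), edges   # 4-bit indices + codeword table

def score_table(x_d, edges, inv_var_d):
    """Per-frame table: entry c holds -0.5 * (x_d - edges[c])^2 * inv_var_d."""
    return -0.5 * (x_d - edges) ** 2 * inv_var_d

# At decode time, a Gaussian's contribution for dimension d is
# table[codes[g]] -- one lookup per Gaussian per dimension.
```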