Finding a quantizing scheme with a suitable mixed-precision policy is key to compressing deep neural networks (DNNs) efficiently and accurately. This exploration imposes a heavy workload on domain experts, so an automatic compression method is needed. However, the huge search space of an automatic method incurs a large computing budget, which makes the automatic process challenging to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN for automatically quantizing different layers with different schemes and bitwidths without any human labor. AutoQNN efficiently seeks desirable quantizing schemes and mixed-precision policies for mainstream DNN models through three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS introduces five quantizing schemes and defines three new schemes as a candidate set for scheme search, and then uses the differentiable neural architecture search (DNAS) algorithm to seek the layer- or model-desired scheme from the set. To the best of our knowledge, QPL is the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. QPL efficiently optimizes both the classification loss and the precision loss of DNNs and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG is designed to convert arbitrary architectures into corresponding quantized ones without manual intervention, to facilitate end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization methods. For 2-bit weights and activations of AlexNet and ResNet18, AutoQNN achieves accuracies of 59.75% and 68.86%, respectively, with accuracy improvements of up to 1.65% and 1.74%, respectively, compared ...
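As a rough illustration of what a per-layer bitwidth means in a mixed-precision policy, the sketch below applies a generic symmetric uniform quantizer and totals the weight storage for a given bitwidth assignment. It is not one of AutoQNN's schemes and does not use its API; the function names `quantize_uniform` and `model_size_bits` and the example layer sizes are assumptions made only for this example.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization of a list of floats (illustrative only).

    Assumption: a generic scheme, not one of the schemes in the AutoQNN paper.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1           # 1 for 2 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                # all zeros: nothing to quantize
    scale = max_abs / levels               # width of one quantization step
    quantized = []
    for v in values:
        q = round(v / scale)               # nearest integer level
        q = max(-levels, min(levels, q))   # clamp to the representable range
        quantized.append(q * scale)        # map back to the real-valued grid
    return quantized


def model_size_bits(layer_sizes, layer_bits):
    """Total weight storage in bits for one bitwidth per layer."""
    return sum(n * b for n, b in zip(layer_sizes, layer_bits))


if __name__ == "__main__":
    weights = [-1.0, 0.2, 0.6, 1.0]
    print(quantize_uniform(weights, 2))
    # Hypothetical two-layer model: 100 weights at 2 bits, 200 weights at 8 bits.
    print(model_size_bits([100, 200], [2, 8]))
```

With 2 bits this quantizer keeps only three levels (negative maximum, zero, positive maximum), which shows why the choice of scheme and bitwidth per layer has a large effect on accuracy.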
ISBN: 9798331515966 (digital)
ISBN: 9798331515973 (print)
Amid global warming and escalating extreme weather events, the impact of indoor environmental quality on human health and public hygiene is gaining prominence. Environmental parameters exist essentially as fields, which are characterized by high dimensionality, density, and complexity, and which carry massive amounts of spatial information. To facilitate visualization and analysis of indoor environmental fields, we design and implement BuildEnVR, an immersive virtual-reality analysis system that enables remote analysis of real-time and historical environmental field data. Grounded in user needs and cognitive psychology, it offers three visualization modes: the Virtual Sensor mode lets users access perceptual data in real time at any 3D coordinates in the ambient space; the 4D Heatmap mode visualizes spatial variations and trends over time in environmental field data; and the Synaesthesia mode fuses the display of multi-dimensional environmental field data, allowing users to quickly understand the overall condition of the indoor environment with a low cognitive load. Extensive user surveys validate BuildEnVR’s intuitiveness and precision and indicate that it is suitable for both experts and general users.
The robust generalization of deep learning models in the presence of inherent noise remains a significant challenge, especially when labels are subjective and noise is indiscernible in natural settings. This problem i...
In Number Theoretic Transform (NTT) operations, more than half of the active energy consumption stems from memory accesses. Here, we propose a generalized design method to improve the energy efficiency of NTT operations by considering the effect of processing element (PE) geometry and memory organization on the data flow between PEs and memory. To decrease the number of data bits that must be accessed from memory, a two-dimensional (2-D) PE array architecture is used. A pair of ping-pong buffers is proposed to perform a transposed swap of the coefficients, so that a single bank of memory can be used with the 2-D PE array, reducing the average memory bit access energy without compromising throughput. Our experimental results show that this design method can produce NTT accelerators with up to 69.8% savings in average energy consumption compared with existing designs based on multi-bank SRAM and one-bank SRAM with a one-dimensional PE array, given the same number of PEs and total memory size.
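To make the memory-access pattern concrete, the sketch below is a plain software radix-2 NTT that counts coefficient reads and writes. It is a textbook in-place algorithm, not the accelerator design from the abstract; the access counter, the modulus 17, and the root 2 are assumptions chosen to keep the example small.

```python
def bit_reverse_permute(a):
    """Reorder the list in place into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def ntt(coeffs, root, modulus):
    """Radix-2 NTT; len(coeffs) must be a power of two and `root` must be a
    primitive len(coeffs)-th root of unity modulo `modulus`.

    Returns (transformed list, number of coefficient reads and writes made
    by the butterflies). The counter is a hypothetical stand-in for memory
    traffic and ignores the initial permutation.
    """
    a = list(coeffs)
    n = len(a)
    bit_reverse_permute(a)
    accesses = 0
    length = 2
    while length <= n:
        w_len = pow(root, n // length, modulus)   # twiddle step for this stage
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]                           # read
                v = a[start + k + half] * w % modulus      # read
                a[start + k] = (u + v) % modulus           # write
                a[start + k + half] = (u - v) % modulus    # write
                w = w * w_len % modulus
                accesses += 4
        length *= 2
    return a, accesses


if __name__ == "__main__":
    # 2 has multiplicative order 8 modulo 17, so it serves as an 8th root.
    spectrum, accesses = ntt([1, 2, 3, 4, 0, 0, 0, 0], root=2, modulus=17)
    print(spectrum, accesses)
```

Every one of the log2(n) stages touches all n coefficients twice (one read, one write), which is the traffic that a PE array and buffer organization would aim to reduce.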
Stencil computations are a class of computations commonly found in scientific and engineering applications. They have relatively low arithmetic intensity. Therefore, their performance is greatly affected by memory ...
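A small sketch can show why stencils tend to be memory-bound. The example below is a generic 5-point Jacobi stencil, not the method of the truncated abstract above; the function names and the assumption of 8-byte values with no cache reuse are chosen only for illustration.

```python
def jacobi_step(grid):
    """One sweep of a 5-point Jacobi stencil; boundary cells are left unchanged."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # 3 additions + 1 multiplication per interior point
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations per byte of memory traffic."""
    return flops / bytes_moved


if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 1.0]]
    print(jacobi_step(grid)[1][1])
    # Assumption: 4 loads + 1 store of 8-byte values per point, no cache reuse.
    print(arithmetic_intensity(4, 5 * 8))
```

Under these assumptions each point performs 4 floating-point operations for 40 bytes of traffic, about 0.1 operations per byte, so memory bandwidth rather than arithmetic throughput usually limits performance.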
Radio modulation recognition is the key link of modern electronic warfare. This paper applies the idea of deep learning to radio modulation recognition. Since the modulation type is the most important information abou...
Background Exploring correspondences across multiview images is the basis of various computer vision ***, most existing methods have limited accuracy under challenging *** To learn more robust and accurate correspondences, we propose DSD-MatchingNet for local feature matching in this ***, we develop a deformable feature extraction module to obtain multilevel feature maps, which harvest contextual information from dynamic receptive *** dynamic receptive fields provided by the deformable convolution network ensure that our method obtains dense and robust ***, we utilize sparse-to-dense matching with symmetry of correspondence to implement accurate pixel-level matching, which enables our method to produce more accurate *** Experiments show that our proposed DSD-MatchingNet achieves a better performance on the image matching benchmark, as well as on the visual localization ***, our method achieved 91.3% mean matching accuracy on the HPatches dataset and 99.3% visual localization recalls on the Aachen Day-Night dataset.
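One common way to enforce symmetry of correspondence is a mutual nearest-neighbor check: a pair is kept only if each descriptor is the other's best match. The sketch below shows that generic check on toy descriptors; it is an assumption for illustration and not DSD-MatchingNet's sparse-to-dense procedure, and the names `nearest` and `mutual_matches` are hypothetical.

```python
def nearest(query, candidates):
    """Index of the candidate closest to `query` (squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")
    for index, cand in enumerate(candidates):
        dist = sum((q - c) ** 2 for q, c in zip(query, cand))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def mutual_matches(desc_a, desc_b):
    """Pairs (i, j) where b[j] is nearest to a[i] and a[i] is nearest to b[j]."""
    if not desc_a or not desc_b:
        return []
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


if __name__ == "__main__":
    desc_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    desc_b = [[0.1, 0.0], [5.0, 5.1]]
    print(mutual_matches(desc_a, desc_b))
```

In this toy case the second descriptor of the first image also points at the first descriptor of the second image, but that match is rejected because the reverse lookup prefers a closer descriptor.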
In distributed quantum computing (DQC), quantum hardware design mainly focuses on providing as many high-quality inter-chip connections as possible. Meanwhile, quantum software tries its best to reduce the required number of remote quantum gates between chips. However, this “hardware first, software follows” methodology may not fully exploit the potential of DQC. Inspired by classical software–hardware co-design, this paper explores the design space of application-specific DQC architectures. More specifically, we propose AutoArch, an automated quantum chip network (QCN) structure design tool. With qubit grouping followed by a customized QCN design, AutoArch can generate a near-optimal DQC architecture suitable for target quantum algorithms. Experimental results show that the DQC architecture generated by AutoArch can outperform other general QCN architectures when executing target quantum algorithms.
Cryo-electron microscopy (cryo-EM) single particle analysis (SPA) has been an indispensable technology to reconstruct three-dimensional (3D) structures of biomolecules at near-atomic resolution. Tens of thousands of p...
The twin-field class quantum key distribution (TF-class QKD) has experimentally demonstrated the ability to surpass the fundamental rate-distance limit without requiring a quantum repeater, as a revolutionary milestone...