The coronavirus disease (COVID-19) changed the world’s lifestyle, shifting many technology-based services to remote delivery instead of the usual direct physical interaction between people. This study focused on university s...
Exploring the expected quantizing scheme with a suitable mixed-precision policy is the key to compressing deep neural networks (DNNs) with high efficiency and accuracy. This exploration imposes a heavy workload on domain experts, so an automatic compression method is needed. However, the huge search space of an automatic method introduces a substantial computing budget that makes the automatic process challenging to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN for automatically quantizing different layers with different schemes and bitwidths without any human labor. AutoQNN can efficiently find desirable quantizing schemes and mixed-precision policies for mainstream DNN models by combining three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS introduces five quantizing schemes and defines three new schemes as a candidate set for scheme search, and then uses the differentiable neural architecture search (DNAS) algorithm to find the layer- or model-wise desired scheme from the set. QPL is, to the best of our knowledge, the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. QPL efficiently optimizes both the classification loss and the precision loss of DNNs and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG is designed to convert arbitrary architectures into corresponding quantized ones without manual intervention, to facilitate end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization. For 2-bit weights and activations of AlexNet and ResNet18, AutoQNN achieves accuracies of 59.75% and 68.86%, respectively, and obtains accuracy improvements of up to 1.65% and 1.74%, respectively, compared
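As a rough illustration of the quantizing-precision-learning idea summarized above, the sketch below reparameterizes a layer's bitwidth as trainable logits over a candidate set, relaxes the quantizer to a softmax-weighted mixture so that the bitwidth choice stays differentiable, and exposes a precision loss proportional to the expected storage cost. The candidate bitwidths, the uniform quantizer, and all names are illustrative assumptions, not AutoQNN's actual implementation.

```python
# Hypothetical sketch of learning per-layer precision via bitwidth
# reparameterization: trainable logits over candidate bitwidths, an
# expected (mixture) quantizer, and a precision loss on expected bits.
# Not AutoQNN's implementation; an assumption-driven illustration only.
import numpy as np

CANDIDATE_BITS = np.array([2, 4, 8])  # assumed search space for per-layer precision

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def uniform_quantize(w, bits):
    """Symmetric uniform quantizer used as a stand-in quantizing scheme."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

class LayerPrecision:
    def __init__(self, num_params):
        self.logits = np.zeros_like(CANDIDATE_BITS, dtype=np.float64)  # trainable
        self.num_params = num_params

    def quantize(self, w):
        # Relaxed quantization: softmax-weighted mixture of quantizers keeps
        # the bitwidth selection differentiable with respect to the logits.
        p = softmax(self.logits)
        return sum(pi * uniform_quantize(w, b) for pi, b in zip(p, CANDIDATE_BITS))

    def precision_loss(self):
        # Expected storage cost of this layer in bits.
        return self.num_params * float(softmax(self.logits) @ CANDIDATE_BITS)

# Schematically, the total objective would be
#   task_loss + lambda * sum(layer.precision_loss() for layer in model),
# optimized jointly over weights and per-layer bitwidth logits.
if __name__ == "__main__":
    layer = LayerPrecision(num_params=4096)
    w = np.random.randn(64, 64)
    wq = layer.quantize(w)
    print("expected bits per weight:", layer.precision_loss() / layer.num_params)
```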
ISBN (print): 9781450397339
Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph algorithms and machine learning applications. The sparsity of the input and output vectors makes its floating-point efficiency generally lower than that of sparse matrix-vector multiplication (SpMV) and sparse matrix-matrix multiplication (SpGEMM). Existing parallel SpMSpV methods have focused on various row- and column-wise storage formats and merging operations. However, the data locality and sparsity pattern of the input matrix and vector are largely ignored. In this paper, we propose TileSpMSpV, a tiled algorithm for accelerating SpMSpV on GPUs. First, tile-wise storage structures are developed to quickly locate groups of nonzeros in the matrix and vectors. Then, we develop the TileSpMSpV algorithm on top of these storage structures. In addition, to accelerate direction-optimizing breadth-first search (BFS) using TileSpMSpV, we propose a TileBFS algorithm comprising three kernels called Push-CSC, Push-CSR and Pull-CSC. In experiments on a high-end NVIDIA GPU with 2757 sparse matrices, the TileSpMSpV algorithm outperforms TileSpMV, cuSPARSE and CombBLAS by average factors of 1.83, 17.18 and 17.20 (up to 7.68, 1050.02 and 235.90), respectively. Moreover, our TileBFS algorithm outperforms Gunrock and GSwitch by average factors of 2.88 and 4.52 (up to 21.35 and 1000.85), respectively.
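For readers unfamiliar with the primitive itself, the following is a minimal CPU reference of SpMSpV over a CSC-stored matrix: only the columns selected by the input vector's nonzeros are visited, and the output is accumulated sparsely. It is a didactic sketch of the operation under an assumed SciPy setup, not the tiled GPU algorithm (TileSpMSpV) proposed in the paper.

```python
# Reference SpMSpV (y = A * x with sparse x) over CSC storage.
# Didactic sketch only; the paper's contribution is a tiled GPU algorithm.
from collections import defaultdict
import numpy as np
from scipy.sparse import random as sparse_random

def spmspv_csc(A_csc, x_idx, x_val):
    """A_csc: SciPy CSC matrix; (x_idx, x_val): nonzeros of the sparse input vector."""
    acc = defaultdict(float)                     # sparse accumulator for y
    indptr, indices, data = A_csc.indptr, A_csc.indices, A_csc.data
    for j, xv in zip(x_idx, x_val):              # visit only the active columns
        for p in range(indptr[j], indptr[j + 1]):
            acc[indices[p]] += data[p] * xv      # scatter column j scaled by x[j]
    rows = np.fromiter(acc.keys(), dtype=np.int64)
    vals = np.fromiter(acc.values(), dtype=np.float64)
    order = np.argsort(rows)
    return rows[order], vals[order]

if __name__ == "__main__":
    A = sparse_random(1000, 1000, density=0.01, format="csc", random_state=0)
    x_idx = np.array([3, 42, 500])
    x_val = np.array([1.0, -2.0, 0.5])
    rows, vals = spmspv_csc(A, x_idx, x_val)
    # Check against a dense SpMV of the same vector.
    dense_ref = A @ np.bincount(x_idx, weights=x_val, minlength=1000)
    assert np.allclose(vals, dense_ref[rows])
```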
Selection of the insulation material and determination of its thickness are the two most important parameters for preventing heat loss. Excessive thickness increases cost and complicates use. The low coefficient of thermal conduc...
Digital twins have major potential to become a significant part of urban management in emergency planning, as they allow more efficient design of escape routes, better orientation in exceptional situations, and...
Probabilistic graphical models, such as Markov random fields (MRFs), are useful for describing high-dimensional distributions in terms of local dependence structures. Probabilistic inference is a fundamental probl...
ISBN (digital): 9798331530013
ISBN (print): 9798331530020
This research study examines the evolving ecosystem of network applications for enhancing connectivity and performance in 5G and beyond (B5G) networks. The objective is to streamline large-scale deployment of vertical applications through a middleware layer, facilitating interactions among network operators and third parties based on varying trust levels. The proposed model addresses the complexity and computational demands of enabling adaptable, secure, and scalable applications for diverse 5G platforms. In particular, this study highlights the 5G-EPICENTRE project, which enables traffic management and dynamic control, especially for mission-critical Public Protection and Disaster Relief (PPDR) applications. By optimizing resource allocation, the model reduces computational costs while meeting the unique demands of PPDR services, such as high-quality video and data for emergency operations. The model's effectiveness will be evaluated through experiments leveraging 5G Core (5GC) control-plane capabilities, with a focus on quality of service (QoS) and latency. Practical limitations, including the integration challenges of multi-network APIs, are discussed in the conclusion.
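To make the middleware idea more concrete, the toy sketch below shows how a PPDR application's QoS needs (latency budget, bitrate, mission-critical flag) might be expressed and mapped onto a standardized 3GPP 5QI value before being handed to the 5GC control plane. The class, field names, and the simplified mapping are hypothetical and do not reflect the 5G-EPICENTRE APIs.

```python
# Purely illustrative: a hypothetical QoS requirement object for a PPDR
# vertical application and a simplified mapping to a 3GPP 5QI value.
from dataclasses import dataclass

@dataclass
class QoSRequirement:
    app_id: str
    max_latency_ms: int       # end-to-end latency budget
    min_bitrate_mbps: float   # e.g. high-quality video for emergency operations
    mission_critical: bool    # PPDR traffic gets priority handling

def to_5qi(req: QoSRequirement) -> int:
    """Map application-level needs onto a plausible standardized 5QI."""
    if req.mission_critical and req.max_latency_ms <= 100:
        return 65   # mission-critical push-to-talk class
    if req.min_bitrate_mbps >= 5:
        return 2    # conversational video (GBR)
    return 9        # default best-effort

if __name__ == "__main__":
    req = QoSRequirement("ppdr-video-01", max_latency_ms=80,
                         min_bitrate_mbps=8.0, mission_critical=True)
    print("requested 5QI:", to_5qi(req))
```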
This one-day hybrid workshop builds on previous feminist CSCW workshops to explore feminist theoretical and methodological approaches that have provided us with useful tools to see things differently and make space fo...
The article is devoted to developing means for recognizing the speaker's emotions based on neural-network analysis of fixed fragments of the voice signal. The possibility of improving recognition ...
Deep neural networks (DNNs) have drawn great attention as they achieve state-of-the-art results on many tasks. Compared to DNNs, spiking neural networks (SNNs), which are considered the new generation of neural networks, fail to achieve comparable performance, especially on tasks with large problem sizes. Previous work tried to close the gap between DNNs and SNNs but used small networks on simple datasets. This work proposes a simple but effective way to construct deep spiking neural networks (DSNNs) by transferring the learned ability of DNNs to SNNs. DSNNs achieve comparable accuracy on large networks and complex datasets.
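As a rough sketch of the general DNN-to-SNN transfer idea, the toy example below copies weights assumed to come from a trained ReLU network into integrate-and-fire neurons, rate-codes the input as Bernoulli spike trains, and reads the prediction out as spike counts over T timesteps. Layer sizes, thresholds, and the coding scheme are illustrative assumptions, not the construction proposed in this work.

```python
# Toy rate-based DNN-to-SNN transfer: integrate-and-fire neurons driven by
# weights assumed to come from a trained ReLU network. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
T = 100                                   # simulation timesteps
W1 = rng.normal(0, 0.5, (784, 128))       # assumed pre-trained DNN weights
W2 = rng.normal(0, 0.5, (128, 10))
V_TH = 1.0                                # firing threshold of the IF neurons

def snn_forward(x):
    """x: input in [0, 1], rate-coded into Bernoulli spike trains."""
    v1 = np.zeros(W1.shape[1])
    v2 = np.zeros(W2.shape[1])
    out_spikes = np.zeros(W2.shape[1])
    for _ in range(T):
        in_spikes = (rng.random(x.shape) < x).astype(float)  # rate coding
        v1 += in_spikes @ W1                  # integrate
        s1 = (v1 >= V_TH).astype(float)       # fire
        v1 -= s1 * V_TH                       # reset by subtraction
        v2 += s1 @ W2
        s2 = (v2 >= V_TH).astype(float)
        v2 -= s2 * V_TH
        out_spikes += s2
    return out_spikes / T                     # spike rates approximate ReLU activations

if __name__ == "__main__":
    x = rng.random(784)
    print("predicted class:", int(np.argmax(snn_forward(x))))
```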