ISBN (digital): 9798350383508
ISBN (print): 9798350383515
Federated learning (FL) enables distributed training by periodically synchronizing model updates among participants. Communication overhead becomes a dominant constraint of FL since participating clients usually suffer from limited bandwidth. To tackle this issue, top-k based gradient compression techniques have been broadly explored in the FL context, showing strong capabilities in reducing gradient volumes by picking significant entries. However, previous studies are primarily conducted on the raw gradients, where massive spatial redundancies exist and the positions of non-zero (top-k) entries vary greatly between gradients, both of which impede deeper compression. Top-k may also degrade the performance of trained models due to biased gradient estimations. Targeting the above issues, we propose FedTC, a novel transform coding based compression framework. FedTC transforms gradients into a new domain with more compact energy distributions, which facilitates reducing spatial redundancies and biases in subsequent sparsification. Furthermore, non-zero entries across clients from different rounds become highly aligned in the transform domain, motivating us to partition the gradients into smaller entry blocks with various alignment levels to better exploit these alignments. Lastly, positions and values of non-zero entries are independently compressed in a block-wise manner with our customized designs, through which a higher compression ratio is achieved. Theoretical analysis and extensive experiments consistently demonstrate the effectiveness of our approach.
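The energy-compaction argument in this abstract can be illustrated with a minimal sketch. The abstract does not say which transform FedTC uses, so the orthonormal DCT-II below is an assumption, as are the toy "gradient" (a smooth synthetic signal, not a real gradient) and the helper names `dct2`, `top_k`, and `energy`. The sketch compares how much energy top-k keeps in the raw domain and in the transform domain.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1D sequence (assumed stand-in transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def top_k(v, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def energy(v):
    """Sum of squared entries."""
    return sum(a * a for a in v)

# Hypothetical toy "gradient": smooth and spatially correlated.
grad = [math.sin(2 * math.pi * i / 64) + 0.5 * math.cos(2 * math.pi * 3 * i / 64)
        for i in range(64)]
coeffs = dct2(grad)

k = 4
raw_kept = energy(top_k(grad, k)) / energy(grad)      # fraction kept in raw domain
dct_kept = energy(top_k(coeffs, k)) / energy(coeffs)  # fraction kept after transform
print(f"raw top-{k}: {raw_kept:.3f}, transform-domain top-{k}: {dct_kept:.3f}")
```

Because the transform is orthonormal, total energy is unchanged, so the two fractions are directly comparable. How well this carries over to real gradients depends on how correlated they are.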
ISBN (digital): 9798350389838
ISBN (print): 9798350389845
The Enhanced Compression Model (ECM) serves as the software foundation for future video coding exploration, extending beyond the capabilities of the current Versatile Video Coding (VVC) standard. This paper conducts statistical analyses on ECM encoded videos, focusing particularly on 1D and 2D transformation types, as well as intra and inter prediction modes across videos from different classes with distinct resolutions. These analyses are performed at the decoder level, where the coding decisions have already been made by the encoder. Results reveal that the selection of transformation type and size, as well as prediction mode (intra or inter), depends on video characteristics such as motion and texture. This study represents a significant advancement in the development of intelligent algorithms based on video characteristics to expedite decision-making in the ECM encoding process.
ISBN (digital): 9781510646018
ISBN (print): 9781510646018
Many video encoders use DCT-based transform coding to compress video. For hardware implementation, the DCT is approximated by an integer matrix, which introduces deviations that accumulate and become noticeable in larger coding units. Our method constructs all DCT-related discrete orthogonal transforms in the required sizes (corresponding to the coding units supported by H.266/VVC). Instead of a direct integer approximation, we use a novel discrete orthogonal matrix generation method with determined DCT-II roots, and scale and round a regular DCT depending on the quantization parameter, to obtain an accurate integer DCT matrix. Experimental results show that this method not only improves video quality but also requires a lower bit rate.
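The deviation that integer approximation introduces can be made concrete with a small sketch. It is not the paper's matrix generation method: it scales an orthonormal DCT-II matrix by a constant and rounds it, then measures how far the result is from orthogonal. The function names and both scale values are hypothetical choices for illustration, and the paper's quantization-parameter-dependent scaling is not modeled.

```python
import math

def dct_matrix(n):
    """Rows of the orthonormal n x n DCT-II matrix."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def integer_dct(n, scale):
    """Scale-and-round integer approximation (illustrative only)."""
    return [[round(scale * v) for v in row] for row in dct_matrix(n)]

def orthogonality_error(m, scale):
    """Largest absolute entry of (M M^T) / scale^2 - I."""
    n = len(m)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(m[a][i] * m[b][i] for i in range(n)) / (scale * scale)
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(dot - target))
    return worst

# Hypothetical scales: a coarse one and a fine one.
err_coarse = orthogonality_error(integer_dct(8, 8), 8)
err_fine = orthogonality_error(integer_dct(8, 4096), 4096)
print(f"scale 8: {err_coarse:.4f}, scale 4096: {err_fine:.6f}")
```

A larger scale shrinks the rounding error but costs more bits per matrix entry, which is the trade-off that matters for a hardware implementation.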
Contemporary lossy image and video coding standards rely on transform coding, the process through which pixels are mapped to an alternative representation to facilitate efficient data compression. Despite the impressive performance of end-to-end optimized compression with deep neural networks, the high computational and space demands of these models have prevented them from superseding the relatively simple transform coding found in conventional video codecs. In this study, we propose learned transforms and entropy coding that may serve either as (non)linear drop-in replacements or as enhancements for linear transforms in existing codecs. These transforms can be multi-rate, allowing a single model to operate along the entire rate-distortion curve. To demonstrate the utility of our framework, we augmented the DCT with learned quantization matrices and adaptive entropy coding to compress intra-frame AV1 block prediction residuals. We report substantial BD-rate and perceptual quality improvements over more complex nonlinear transforms at a fraction of the computational cost.
In this paper, we describe a video coding design that enables a higher coding efficiency than the HEVC standard. The proposed video codec follows the design of block-based hybrid video coding, but includes a number of advanced coding tools. Some of the incorporated advanced concepts were developed by the Joint Video Exploration Team, while others are newly proposed. The key aspects of these newly proposed tools are the following. A video frame is subdivided into rectangles of variable size using a binary partitioning with variable split ratios. Three new approaches for generating spatial intra prediction signals are supported: a line-wise application of conventional intra prediction modes, coupled with a mode-dependent processing order; a region-based template matching prediction method; and intra prediction modes based on neural networks. For motion-compensated prediction, a multi-hypothesis mode with more than two motion hypotheses can be used. In transform coding, mode-dependent combinations of primary and secondary transforms are applied. Moreover, scalar quantization is replaced by trellis-coded quantization and the entropy coding of the quantized transform coefficients is improved. The intra and inter prediction signals can be filtered using an edge-preserving diffusion filter or a non-linear DCT-based thresholding operation. The video codec includes an adaptive in-loop filter for which one of three classifiers can be chosen on a picture basis. We also incorporated an optional encoder control, which adjusts the quantization parameters based on a perceptually motivated distortion measure. In a random access scenario, our proposed video codec achieves luma BD-rate savings between 32.5% for HDR HLG UHD and 39.6% for SDR UHD over the HEVC (HM software) anchor for different categories of test sequences.
ISBN (print): 9798350344868; 9798350344851
This paper proposes a new method for pre-echo reduction in transform-based audio coding by controlling the temporal envelope of the waveform. The proposed method comprises two operating modes: temporal envelope flattening and temporal envelope correction of a target signal. The proposed method estimates signal levels with a low temporal resolution from side information using machine learning and converts them into a signal to be applied to the target signal to flatten and correct the temporal envelope. It also adjusts the signals to maintain signal continuity between the non-transient and transient frames. The proposed method differs from conventional methods in that it directly modifies the waveform before encoding and after decoding, which makes it useful as a new coding tool for legacy codecs. A subjective performance evaluation confirms that the proposed method uses fewer bits to provide sound quality equivalent to that of the short-window transform.
The development of real-time 3D sensing devices and algorithms (e.g., multiview capturing systems, Time-of-Flight depth cameras, LIDAR sensors), as well as the widespread adoption of enhanced user applications processing 3D data, has motivated the investigation of innovative and effective coding strategies for 3D point clouds. Several compression algorithms, as well as some standardization efforts, have been proposed in order to achieve high compression ratios and flexibility at a reasonable computational cost. This paper presents a transform-based coding strategy for dynamic point clouds that combines a non-linear transform for geometric data with a linear transform for color data; both operations are region-adaptive in order to fit the characteristics of the input 3D data. Temporal redundancy is exploited both in the adaptation of the designed transform and in predicting the attributes at the current instant from the previous ones. Experimental results showed that the proposed solution obtained a significant bit rate reduction in lossless geometry coding and an improved rate-distortion performance in the lossy coding of color components with respect to state-of-the-art strategies.
This paper introduces the GBT-NN, a novel class of Graph-based transform within the context of block-based predictive transform coding using intra-prediction. The GBT-NN is constructed by learning a mapping function to map a graph Laplacian representing the covariance matrix of the current block. Our objective of learning such a mapping function is to design a GBT that performs as well as the KLT without requiring to explicitly compute the covariance matrix for each residual block to be transformed. To avoid signalling any additional information required to compute the inverse GBT-NN, we also introduce a coding framework that uses a template-based prediction to predict residuals at the decoder. Evaluation results on several video frames and medical images, in terms of the percentage of preserved energy and mean square error, show that the GBT-NN can outperform the DST and DCT.
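As background for the comparison with the DCT, the following sketch checks a standard fact about graph-based transforms: for an unweighted path graph, the eigenvectors of the graph Laplacian are the DCT-II basis vectors, with eigenvalues 2 - 2cos(πk/n). It says nothing about how the GBT-NN is learned, and the function names are hypothetical.

```python
import math

def path_laplacian(n):
    """Combinatorial Laplacian of an unweighted path graph with n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def dct_basis_vector(n, k):
    """Unnormalized k-th DCT-II basis vector."""
    return [math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]

def eigen_residual(n, k):
    """Largest entry of |L u - lambda u| for the k-th DCT-II vector."""
    lap = path_laplacian(n)
    u = dct_basis_vector(n, k)
    lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
    lap_u = [sum(lap[i][j] * u[j] for j in range(n)) for i in range(n)]
    return max(abs(lap_u[i] - lam * u[i]) for i in range(n))

residuals = [eigen_residual(8, k) for k in range(8)]
print(max(residuals))  # close to zero, up to floating-point error
```

A graph whose edge weights follow the block's actual statistics gives a different eigenbasis, which is where a GBT can depart from the DCT.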
ISBN (print): 9781538680889
In this work we analyse the capacity of the distributed MIMO uplink when transform coding is applied locally at each remote radio head (RRH) to compress fronthaul traffic. Assuming the use of optimal scalar compression, we derive a closed form capacity expression for the distributed MIMO uplink under Gaussian signalling, which is shown to be a function of both local and global channel eigendecompositions. We then outline two rate allocation schemes for efficiently allocating the available fronthaul to the compressed scalars, based on either local or global channel state information (CSI). Numerical results under Rayleigh fading conditions are presented which show that transform coding can provide a significant compression gain relative to direct signal quantisation, which grows as the number of antennas deployed at each RRH increases. Results also show that allocating fronthaul based on global CSI significantly improves performance, especially as the number of RRHs deployed increases.
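For readers unfamiliar with rate allocation in transform coding, here is a sketch of the textbook reverse water-filling rule for independent Gaussian components. It is a stand-in, not either of the CSI-based schemes the abstract outlines. The variances and bit budget are made-up values, and `reverse_water_filling` is a hypothetical helper name.

```python
import math

def reverse_water_filling(variances, total_bits, iters=100):
    """Allocate bits b_i = max(0, 0.5 * log2(var_i / theta)) so they sum to total_bits.

    theta (the "water level") is found by bisection in the log domain.
    Assumes positive variances and a budget reachable with theta >= 1e-12.
    """
    def rate(theta):
        return sum(max(0.0, 0.5 * math.log2(v / theta)) for v in variances)

    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # geometric midpoint
        if rate(mid) > total_bits:
            lo = mid               # water level too low, too many bits spent
        else:
            hi = mid
    theta = hi
    return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

# Made-up component variances after a decorrelating transform.
variances = [4.0, 1.0, 0.25, 0.01]
bits = reverse_water_filling(variances, total_bits=4.0)
print([round(b, 3) for b in bits])
```

Components whose variance falls below the water level receive no bits, which is why a decorrelating transform that concentrates variance in a few scalars helps under a fixed budget.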
ISBN (print): 9781479970612
With the advent of virtual and augmented reality applications, 3D and free-viewpoint representations have evolved towards solid scene models using meshes and point clouds. Recent works have addressed point cloud compression via octree-based hierarchical strategies in order to enable multi-resolution coding and visualization at a reasonable computational cost. This paper presents a voxelized dynamic point cloud coding scheme that combines a Cellular Automata block reversible transform for geometric data with a region adaptive transform for color data. Temporal redundancy is removed using a low-complexity prediction scheme to minimize the computational complexity and reduce the coded bit rate. Experimental results showed that the proposed solution obtained a significant bit rate reduction in lossless geometry coding and an improved rate-distortion performance in the lossy coding of color components with respect to state-of-the-art strategies.