This paper presents a method that allows learned video encoders to apply arbitrary latent refinement strategies to serve as Rate-Distortion Optimization (RDO) at encoding time. To do so, a latent domain search is a...
Traditional block-based spatially scalable video coding has been studied for over twenty years. While significant advancements have been made, the scope for further improvement in compression performance is limited. Inspired by the success of learned video coding, we propose an end-to-end learned spatially scalable video coding scheme, LSSVC, which provides a new solution for scalable video coding. In LSSVC, we propose to use the motion, texture, and latent information of the base layer (BL) as interlayer information for compressing the enhancement layer (EL). To reduce interlayer redundancy, we design three modules that leverage the upsampled interlayer information. First, we design a contextual motion vector (MV) encoder-decoder, which utilizes the upsampled BL motion information to help compress high-resolution MVs. Second, we design a hybrid temporal-layer context mining module to learn more accurate contexts from the EL temporal features and the upsampled BL texture information. Third, we use the upsampled BL latent information as an interlayer prior for the entropy model, yielding more accurate probability distribution parameters for the high-resolution latents. Experimental results show that our scheme surpasses the H.265/SHVC reference software by a large margin. Our code is available at https://***/EsakaK/LSSVC.
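As one concrete reading of the third module, the sketch below shows how an upsampled BL latent could condition the EL entropy model. The class name, channel count, and fusion network are illustrative assumptions, not the LSSVC implementation.

```python
# Hedged sketch (not the authors' code): an upsampled base-layer (BL) latent
# acting as an interlayer prior for the enhancement-layer (EL) entropy model,
# assuming a PyTorch codec with Gaussian-parameterized latents.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterlayerEntropyPrior(nn.Module):
    """Predicts (mean, scale) for EL latents from hyperprior features and the BL latent."""
    def __init__(self, channels: int = 192):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, 3, padding=1),  # -> mean, scale
        )

    def forward(self, hyper_params: torch.Tensor, bl_latent: torch.Tensor):
        # Upsample the low-resolution BL latent to the EL latent resolution.
        bl_up = F.interpolate(bl_latent, size=hyper_params.shape[-2:],
                              mode="bilinear", align_corners=False)
        mean, scale = self.fuse(torch.cat([hyper_params, bl_up], dim=1)).chunk(2, dim=1)
        return mean, F.softplus(scale)  # keep the scale strictly positive

# Toy usage with random tensors standing in for real features.
prior = InterlayerEntropyPrior(192)
mean, scale = prior(torch.randn(1, 192, 32, 32), torch.randn(1, 192, 16, 16))
```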
The rapid development of Big Data and network technology demands more secure and efficient video transmission for surveillance and video analysis applications. Classical video transmission relies on spatial-frequency transforms for lossy compression, but with limited coding efficiency. Deep learning-based approaches exceed such limitations. In this work, we push the limit further by proposing an implicit spatial transform parameter method, which models interframe redundancy to efficiently provide information for frame compression. Specifically, our method comprises a transform estimation module, which estimates the conversion from the decoded frame to the current frame, and a context generator; transform compensation and the context generator together produce a condensed, high-dimensional context. Furthermore, we propose a P-frame CoDec for more efficient frame compression by removing interframe redundancy. The proposed framework is extensible through a flexible context module. We demonstrate experimentally that our method outperforms previous methods by a large margin: it achieves 34.817% bit-rate savings over H.265/HEVC, and delivers 17.500% additional bit-rate savings and a 0.490 dB gain in peak signal-to-noise ratio (PSNR) compared with the current state-of-the-art learning-based method proposed by Liu et al. (2022).
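To make the described pipeline concrete, here is a minimal, hedged sketch of a transform estimation module, warping-based compensation, and a context generator. All names, network shapes, and the flow-based transform are illustrative assumptions rather than the paper's actual architecture.

```python
# Hedged sketch: estimate a dense transform from the decoded reference to the
# current frame, compensate by warping, and turn the result into a context
# that would condition a P-frame encoder/decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(ref: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `ref` (B,3,H,W) with a displacement field `flow` (B,2,H,W)."""
    b, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(ref.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # add motion
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0          # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(ref, grid_norm, align_corners=True)

class TransformEstimator(nn.Module):
    """Estimates a dense (optical-flow-like) transform from ref to cur."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, ref, cur):
        return self.net(torch.cat([ref, cur], dim=1))

class ContextGenerator(nn.Module):
    """Turns the motion-compensated reference into a condensed context."""
    def __init__(self, context_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, context_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(context_channels, context_channels, 3, padding=1),
        )

    def forward(self, warped_ref):
        return self.net(warped_ref)

# Toy usage: the resulting context would condition the P-frame CoDec.
ref, cur = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
flow = TransformEstimator()(ref, cur)
context = ContextGenerator()(warp(ref, flow))
```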
ISBN:
(Print) 9798350330991; 9798350331004
In response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2024, this paper proposes a learned hierarchical B-frame coding scheme. Most learned video codecs concentrate on P-frame coding for RGB content, while B-frame coding for YUV420 content remains largely under-explored. Some early works explore Conditional Augmented Normalizing Flows (CANF) for B-frame coding. However, they suffer from high computational complexity because they stack multiple variational autoencoders (VAE) and use separate Y and UV codecs. This work aims to develop a lightweight VAE-based B-frame codec in a conditional coding framework. It features (1) extracting multi-scale features for conditional motion and inter-frame coding, (2) performing frame-type adaptive coding for better bit allocation, and (3) a lightweight conditional VAE backbone that encodes YUV420 content through a simple conversion into YUV444 content for joint Y and UV coding. Experimental results confirm its superior compression performance to the CANF-based B-frame codec from last year's challenge while having much reduced complexity.
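The YUV420-to-YUV444 conversion in point (3) can be illustrated with a short, hedged sketch; the bilinear upsampling and average-pooling filters here are plain assumptions, not necessarily the filters the codec uses.

```python
# Hedged sketch: upsample the chroma planes so a single VAE backbone can code
# Y and UV jointly as YUV444, then subsample back to YUV420 after decoding.
import torch
import torch.nn.functional as F

def yuv420_to_yuv444(y: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """y: (B,1,H,W); u, v: (B,1,H/2,W/2) -> stacked (B,3,H,W) YUV444 tensor."""
    u_up = F.interpolate(u, size=y.shape[-2:], mode="bilinear", align_corners=False)
    v_up = F.interpolate(v, size=y.shape[-2:], mode="bilinear", align_corners=False)
    return torch.cat([y, u_up, v_up], dim=1)

def yuv444_to_yuv420(yuv: torch.Tensor):
    """Inverse step after decoding: keep Y, subsample U and V by 2x."""
    y, u, v = yuv[:, 0:1], yuv[:, 1:2], yuv[:, 2:3]
    return y, F.avg_pool2d(u, kernel_size=2), F.avg_pool2d(v, kernel_size=2)

# Toy round trip on a 64x64 frame.
y, u, v = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32)
rec_y, rec_u, rec_v = yuv444_to_yuv420(yuv420_to_yuv444(y, u, v))
```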
ISBN:
(Print) 9798350349405; 9798350349399
Learned hierarchical B-frame coding aims to leverage bidirectional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOPs) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle large motion at test time, which negatively impacts compression performance. To mitigate the domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. OMRA is an online, inference-time technique: it does not require re-training the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bidirectional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.
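A hedged sketch of how such a per-frame resolution search might look at inference time is given below. The candidate scales, the rate-distortion criterion, and the `codec` interface are assumptions, since the abstract does not specify how OMRA selects the coding resolution.

```python
# Hedged sketch: code a B-frame at several candidate resolutions with a frozen
# codec and keep the one minimizing an assumed cost J = D + lambda * R.
import torch
import torch.nn.functional as F

def encode_bframe_with_resolution_search(codec, frame, ref_past, ref_future,
                                         scales=(1.0, 0.75, 0.5), lam=0.01):
    """`codec(frame, ref_past, ref_future)` is assumed to return (reconstruction, bits)."""
    best = None
    h, w = frame.shape[-2:]
    for s in scales:
        size = (max(int(h * s), 1), max(int(w * s), 1))
        # Downsample the current frame and both references consistently.
        f, rp, rf = (F.interpolate(x, size=size, mode="bilinear", align_corners=False)
                     for x in (frame, ref_past, ref_future))
        rec, bits = codec(f, rp, rf)
        # Evaluate distortion at the original resolution.
        rec_full = F.interpolate(rec, size=(h, w), mode="bilinear", align_corners=False)
        cost = F.mse_loss(rec_full, frame) + lam * bits
        if best is None or cost < best[0]:
            best = (cost, s, rec_full, bits)
    return best  # (cost, chosen scale, reconstruction, bits)

# Smoke test with a toy stand-in codec: identity reconstruction, fixed "bits".
dummy_codec = lambda f, rp, rf: (f, torch.tensor(1000.0))
frame = torch.rand(1, 3, 64, 64)
cost, scale, rec, bits = encode_bframe_with_resolution_search(dummy_codec, frame, frame, frame)
```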