This paper presents a method that allows learned video encoders to apply arbitrary latent refinement strategies to serve as Rate-Distortion Optimization (RDO) at encoding time. To do so, a latent domain search is a...
Traditional block-based spatially scalable video coding has been studied for over twenty years. While significant advancements have been made, the scope for further improvement in compression performance is limited. Inspired by the success of learned video coding, we propose an end-to-end learned spatially scalable video coding scheme, LSSVC, which provides a new solution for scalable video coding. In LSSVC, we propose to use the motion, texture, and latent information of the base layer (BL) as interlayer information for compressing the enhancement layer (EL). To reduce interlayer redundancy, we design three modules that leverage the upsampled interlayer information. First, we design a contextual motion vector (MV) encoder-decoder, which utilizes the upsampled BL motion information to help compress high-resolution MVs. Second, we design a hybrid temporal-layer context mining module to learn more accurate contexts from the EL temporal features and the upsampled BL texture information. Third, we use the upsampled BL latent information as an interlayer prior for the entropy model, yielding more accurate probability distribution parameters for the high-resolution latents. Experimental results show that our scheme surpasses the H.265/SHVC reference software by a large margin. Our code is available at https://***/EsakaK/LSSVC.
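As one concrete reading of the third module, the sketch below shows how an upsampled BL latent could condition the EL entropy model. The class name, channel count, and fusion network are illustrative assumptions, not the LSSVC implementation.

```python
# Hedged sketch (not the authors' code): an upsampled base-layer (BL) latent
# acting as an interlayer prior for the enhancement-layer (EL) entropy model,
# assuming a PyTorch codec with Gaussian-parameterized latents.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterlayerEntropyPrior(nn.Module):
    """Predicts (mean, scale) for EL latents from hyperprior features and the BL latent."""
    def __init__(self, channels: int = 192):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, 3, padding=1),  # -> mean, scale
        )

    def forward(self, hyper_params: torch.Tensor, bl_latent: torch.Tensor):
        # Upsample the low-resolution BL latent to the EL latent resolution.
        bl_up = F.interpolate(bl_latent, size=hyper_params.shape[-2:],
                              mode="bilinear", align_corners=False)
        mean, scale = self.fuse(torch.cat([hyper_params, bl_up], dim=1)).chunk(2, dim=1)
        return mean, F.softplus(scale)  # keep the scale strictly positive

# Toy usage with random tensors standing in for real features.
prior = InterlayerEntropyPrior(192)
mean, scale = prior(torch.randn(1, 192, 32, 32), torch.randn(1, 192, 16, 16))
```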
The rapid development of Big Data and network technology demands more secure and efficient video transmission for surveillance and video analysis applications. Classical video transmission relies on spatial-frequency transforms for lossy compression, but with limited coding efficiency. Deep learning-based approaches exceed such limitations. In this work, we push the limit further by proposing an implicit spatial transform parameter method, which models interframe redundancy to efficiently provide information for frame compression. Specifically, our method comprises a transform estimation module, which estimates the conversion from the decoded frame to the current frame, and a context generator; transform compensation and the context generator together produce a condensed, high-dimensional context. Furthermore, we propose a P-frame CoDec for more efficient frame compression by removing interframe redundancy. The proposed framework is extensible through a flexible context module. We demonstrate experimentally that our method outperforms previous methods by a large margin: it achieves 34.817% bit-rate savings over H.265/HEVC, and delivers 17.500% additional bit-rate savings and a 0.490 dB gain in peak signal-to-noise ratio (PSNR) compared with the current state-of-the-art learning-based method proposed by Liu et al. (2022).
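To make the described pipeline concrete, here is a minimal, hedged sketch of a transform estimation module, warping-based compensation, and a context generator. All names, network shapes, and the flow-based transform are illustrative assumptions rather than the paper's actual architecture.

```python
# Hedged sketch: estimate a dense transform from the decoded reference to the
# current frame, compensate by warping, and turn the result into a context
# that would condition a P-frame encoder/decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(ref: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `ref` (B,3,H,W) with a displacement field `flow` (B,2,H,W)."""
    b, _, h, w = ref.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(ref.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # add motion
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0          # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(ref, grid_norm, align_corners=True)

class TransformEstimator(nn.Module):
    """Estimates a dense (optical-flow-like) transform from ref to cur."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, ref, cur):
        return self.net(torch.cat([ref, cur], dim=1))

class ContextGenerator(nn.Module):
    """Turns the motion-compensated reference into a condensed context."""
    def __init__(self, context_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, context_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(context_channels, context_channels, 3, padding=1),
        )

    def forward(self, warped_ref):
        return self.net(warped_ref)

# Toy usage: the resulting context would condition the P-frame CoDec.
ref, cur = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
flow = TransformEstimator()(ref, cur)
context = ContextGenerator()(warp(ref, flow))
```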
ISBN:
(Print) 9798350330991; 9798350331004
In response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2024, this paper proposes a learned hierarchical B-frame coding scheme. Most learned video codecs concentrate on P-frame coding for RGB content, while B-frame coding for YUV420 content remains largely under-explored. Some early works explore Conditional Augmented Normalizing Flows (CANF) for B-frame coding. However, they suffer from high computational complexity because they stack multiple variational autoencoders (VAE) and use separate Y and UV codecs. This work aims to develop a lightweight VAE-based B-frame codec in a conditional coding framework. It features (1) extracting multi-scale features for conditional motion and inter-frame coding, (2) performing frame-type adaptive coding for better bit allocation, and (3) a lightweight conditional VAE backbone that encodes YUV420 content through a simple conversion into YUV444 content for joint Y and UV coding. Experimental results confirm its superior compression performance to the CANF-based B-frame codec from last year's challenge while having much reduced complexity.
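The YUV420-to-YUV444 conversion in point (3) can be illustrated with a short, hedged sketch; the bilinear upsampling and average-pooling filters here are plain assumptions, not necessarily the filters the codec uses.

```python
# Hedged sketch: upsample the chroma planes so a single VAE backbone can code
# Y and UV jointly as YUV444, then subsample back to YUV420 after decoding.
import torch
import torch.nn.functional as F

def yuv420_to_yuv444(y: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """y: (B,1,H,W); u, v: (B,1,H/2,W/2) -> stacked (B,3,H,W) YUV444 tensor."""
    u_up = F.interpolate(u, size=y.shape[-2:], mode="bilinear", align_corners=False)
    v_up = F.interpolate(v, size=y.shape[-2:], mode="bilinear", align_corners=False)
    return torch.cat([y, u_up, v_up], dim=1)

def yuv444_to_yuv420(yuv: torch.Tensor):
    """Inverse step after decoding: keep Y, subsample U and V by 2x."""
    y, u, v = yuv[:, 0:1], yuv[:, 1:2], yuv[:, 2:3]
    return y, F.avg_pool2d(u, kernel_size=2), F.avg_pool2d(v, kernel_size=2)

# Toy round trip on a 64x64 frame.
y, u, v = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32)
rec_y, rec_u, rec_v = yuv444_to_yuv420(yuv420_to_yuv444(y, u, v))
```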
ISBN:
(Print) 9798350349405; 9798350349399
Learned hierarchical B-frame coding aims to leverage bidirectional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOPs) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle large motion at test time, which negatively impacts compression performance. To mitigate the domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. OMRA is an online, inference-time technique: it does not require re-training the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bidirectional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.
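A hedged sketch of how such a per-frame resolution search might look at inference time is given below. The candidate scales, the rate-distortion criterion, and the `codec` interface are assumptions, since the abstract does not specify how OMRA selects the coding resolution.

```python
# Hedged sketch: code a B-frame at several candidate resolutions with a frozen
# codec and keep the one minimizing an assumed cost J = D + lambda * R.
import torch
import torch.nn.functional as F

def encode_bframe_with_resolution_search(codec, frame, ref_past, ref_future,
                                         scales=(1.0, 0.75, 0.5), lam=0.01):
    """`codec(frame, ref_past, ref_future)` is assumed to return (reconstruction, bits)."""
    best = None
    h, w = frame.shape[-2:]
    for s in scales:
        size = (max(int(h * s), 1), max(int(w * s), 1))
        # Downsample the current frame and both references consistently.
        f, rp, rf = (F.interpolate(x, size=size, mode="bilinear", align_corners=False)
                     for x in (frame, ref_past, ref_future))
        rec, bits = codec(f, rp, rf)
        # Evaluate distortion at the original resolution.
        rec_full = F.interpolate(rec, size=(h, w), mode="bilinear", align_corners=False)
        cost = F.mse_loss(rec_full, frame) + lam * bits
        if best is None or cost < best[0]:
            best = (cost, s, rec_full, bits)
    return best  # (cost, chosen scale, reconstruction, bits)

# Smoke test with a toy stand-in codec: identity reconstruction, fixed "bits".
dummy_codec = lambda f, rp, rf: (f, torch.tensor(1000.0))
frame = torch.rand(1, 3, 64, 64)
cost, scale, rec, bits = encode_bframe_with_resolution_search(dummy_codec, frame, frame, frame)
```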