ISBN (Print): 9798400701085
End-to-end video coding for both machine and human vision has become an emerging research topic. In complicated systems such as the large-scale Internet of Video Things (IoVT), feature streams and video streams can be separately encoded and delivered for machine judgement and human viewing. In this paper, we propose a deep scalable video codec (deepSVC) to support three-layer scalability from machine to human vision. First, we design a semantic layer that encodes semantic features extracted from the captured video for machine analysis. This layer employs a conditional semantic compression (CSC) method to remove redundancies between semantic features. Second, we design a structure layer that can be combined with the semantic layer to predict the captured video at low quality. This layer effectively estimates video frames from the semantic layer with an interlayer frame prediction (IFP) network. Third, we design a texture layer that can be combined with the above two layers to reconstruct high-quality video signals. This layer also exploits the IFP network to improve its coding efficiency. In large-scale IoVT systems, deepSVC can deliver the semantic layer for regular use and transmit the other layers on demand. Experimental results indicate that the proposed deepSVC outperforms popular codecs for machine and human vision. Compared with the scalable extension of H.265/HEVC (SHVC), deepSVC reduces the average bit-per-pixel (bpp) by 25.51%/27.63%/59.87% at the same mAP/PSNR/MS-SSIM. Source code is available at: https://***/LHB116/deepSVC.
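To make the three-layer decoding path concrete, below is a minimal PyTorch sketch of how a scalable decoder of this kind could be organized. All module names and architectures (the semantic decoder, the IFP predictor, and the structure/texture refinements) are illustrative assumptions, not the paper's actual networks; only the layering logic follows the abstract.

```python
# Hypothetical sketch of a three-layer scalable decode path (machine -> human vision).
# Module internals are placeholders; only the layered control flow mirrors deepSVC.
import torch
import torch.nn as nn

class ScalableDecoderSketch(nn.Module):
    def __init__(self, feat_ch=64):
        super().__init__()
        # Semantic layer: recovers semantic features for machine analysis.
        self.semantic_dec = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        # Interlayer frame prediction (IFP): predicts pixels from semantic features.
        self.ifp = nn.Conv2d(feat_ch, 3, 3, padding=1)
        # Structure/texture layers refine the prediction using coded residual info.
        self.structure_dec = nn.Conv2d(3 + feat_ch, 3, 3, padding=1)
        self.texture_dec = nn.Conv2d(3 + feat_ch, 3, 3, padding=1)

    def forward(self, sem_latent, struct_latent=None, tex_latent=None):
        # Layer 1 (always decoded): semantic features for machine vision.
        sem_feat = self.semantic_dec(sem_latent)
        out = {"semantic": sem_feat}
        if struct_latent is not None:
            # Layer 2: low-quality video predicted from the semantic layer via IFP.
            pred = self.ifp(sem_feat)
            low_q = pred + self.structure_dec(
                torch.cat([pred, struct_latent], dim=1))
            out["low_quality"] = low_q
            if tex_latent is not None:
                # Layer 3: high-quality reconstruction on top of the first two layers.
                out["high_quality"] = low_q + self.texture_dec(
                    torch.cat([low_q, tex_latent], dim=1))
        return out

# Decode progressively: semantic only, then +structure, then +texture on demand.
dec = ScalableDecoderSketch()
sem = torch.randn(1, 64, 32, 32)  # stand-in for an entropy-decoded latent
out = dec(sem, torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(sorted(out))  # ['high_quality', 'low_quality', 'semantic']
```

The point of the structure is that each dictionary entry only requires the bitstreams up to its layer, which is what lets an IoVT system ship the semantic layer continuously and the pixel layers only when a human needs to view the video.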
Deep-learning-based video coding methods have demonstrated superior performance compared to classical video coding standards in recent years. The vast majority of existing deep video coding (DVC) networks are based on convolutional neural networks (CNNs), whose main drawback is that, limited by the size of the receptive field, they cannot effectively handle long-range dependencies or recover local detail. How to better capture and process the overall structure as well as the local texture information is therefore a core issue in video coding. Notably, the transformer employs a self-attention mechanism that captures dependencies between any two positions in the input sequence without being constrained by distance. This is an effective solution to the problem described above. In this paper, we propose end-to-end transformer-based adaptive video coding (TAVC). First, we compress the motion vectors and residuals through a compression network built on the vision transformer (ViT) and design a ViT-based motion compensation network. Second, because video coding must adapt to inputs of different resolutions, we introduce a position encoding generator (PEG) as adaptive position encoding (APE) to maintain translation invariance across video coding tasks at different resolutions. Experiments show that, under the multiscale structural similarity (MS-SSIM) metric, this method achieves significant performance gains over conventional engineering codecs such as x264, x265, and VTM-15.2, as well as a clear improvement over CNN-based DVC methods. Under the peak signal-to-noise ratio (PSNR) metric, TAVC also achieves good performance.
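The PEG/APE idea can be illustrated with a short sketch. The common position encoding generator design replaces fixed-size learned position embeddings with a depthwise convolution over the 2-D token grid, so the positional signal is computed from local neighborhoods and transfers to any input resolution. The sketch below assumes that standard design; the paper's exact PEG configuration may differ.

```python
# Sketch of a position encoding generator (PEG): a depthwise 3x3 conv over the
# token grid injects position information that is translation-invariant and
# resolution-agnostic, so no fixed-length embedding table is needed.
import torch
import torch.nn as nn

class PEG(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Depthwise conv: the positional signal depends only on a local
        # neighborhood, so it generalizes across token-grid sizes.
        self.proj = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, tokens, h, w):
        # tokens: (B, N, C) with N == h * w
        b, n, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.proj(x) + x  # conditional positional encoding, residual form
        return x.flatten(2).transpose(1, 2)

# The same module handles latents from videos of different resolutions:
peg = PEG(dim=96)
for h, w in [(30, 52), (68, 120)]:
    out = peg(torch.randn(2, h * w, 96), h, w)
    assert out.shape == (2, h * w, 96)
```

Because the conv weights are shared across positions, inserting such a module between transformer blocks gives each token position-dependent context without tying the codec to one training resolution.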
ISBN (Print): 9781450394789
Deep-learning-based video compression techniques have been growing rapidly in recent years. This paper adopts the Conditional Augmented Normalizing Flow video codec (CANF-VC) [8] as our base system. To improve the quality of the condition signal (image) for CANF, we propose a two-layer structure learning-based video codec. At a low cost in extra bit rate, the low-resolution base layer provides side information that improves the quality of the motion-compensated reference frame through a super-resolution module with a merge-net. In addition, the base layer provides information to the skip-mask generator; the skip-mask guides the coding mechanism to reduce the number of transmitted samples for the high-resolution enhancement layer. Experimental results indicate that the proposed two-layer coding scheme provides 22.19% PSNR BD-rate savings and 49.59% MS-SSIM BD-rate savings over H.265 (HM 16.20) on the UVG test sequences.
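As a rough illustration of the skip-mask mechanism, the sketch below upsamples the base layer, derives a binary mask from it, and reconstructs masked positions from the upsampled base layer alone, while unmasked positions add the coded enhancement residual. The tiny mask network, the 0.5 threshold, and plain bilinear upsampling (standing in for the paper's super-resolution module with merge-net) are assumptions for illustration, not the authors' design.

```python
# Hedged sketch: base-layer-driven skip mask for a two-layer codec.
# Where the mask says "skip", the decoder reuses the upsampled base layer and
# no enhancement-layer residual needs to be transmitted for those samples.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipMaskGenerator(nn.Module):
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, base_up):
        # Soft mask in [0, 1]; thresholded to a hard skip decision at inference.
        return self.net(base_up)

def reconstruct_enhancement(base_frame, enh_residual, gen, scale=2):
    # Upsample the base layer to enhancement resolution (a super-resolution
    # module with merge-net plays this role in the paper).
    base_up = F.interpolate(base_frame, scale_factor=scale,
                            mode="bilinear", align_corners=False)
    skip = (gen(base_up) > 0.5).float()
    # Skipped positions reuse the base layer; the rest add the coded residual.
    return skip * base_up + (1 - skip) * (base_up + enh_residual)

gen = SkipMaskGenerator()
hq = reconstruct_enhancement(torch.randn(1, 3, 64, 64),   # low-res base frame
                             torch.randn(1, 3, 128, 128), # enhancement residual
                             gen)
print(hq.shape)  # torch.Size([1, 3, 128, 128])
```

The rate saving comes from the skipped fraction of samples: the enhancement layer only spends bits where the base layer's prediction is judged insufficient.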