Search Results - Inner Mongolia University Library

GAN-based multi-view video coding with spatio-temporal EPI reconstruction

SIGNAL PROCESSING-IMAGE COMMUNICATION, 2025, vol. 132

Authors: Lan, Chengdong; Yan, Hao; Luo, Cheng; Zhao, Tiesong. Affiliation: Fuzhou Univ Fujian Key Lab Intelligent Proc & Wireless Transmi Fuzhou 350108 Peoples R China

The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SInfo). Typically, depth maps are used to construct SInfo. However, these methods suffer from reconstruction inaccuracies and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SInfo. Additionally, we consider incorporating information from adjacent temporal and spatial viewpoints to further reduce SInfo redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize a convolutional network to extract the latent code of a GAN as SInfo. At the decoder, we combine the SInfo and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint for reconstruction cost and SInfo entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate the significant improvement in Rate-Distortion (RD) performance compared to state-of-the-art methods.

Keywords: Multi-view video coding; Generative adversarial network; Latent code learning; Epipolar plane image
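
The abstract above mentions building a spatio-temporal Epipolar Plane Image (EPI) at the encoder but gives no construction details. As a minimal sketch only, assuming horizontally arranged cameras and a hypothetical `video[t][v]` layout (frame of view `v` at time `t`), an EPI can be formed by fixing one scanline and stacking it across views; stacking the per-time EPIs vertically is an assumption made here for illustration, not the authors' stated procedure.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame as rows of pixel values


def spatial_epi(views: List[Frame], row: int) -> List[List[int]]:
    """Stack the same scanline from every viewpoint at one time instant."""
    return [list(view[row]) for view in views]


def spatio_temporal_epi(video: List[List[Frame]], row: int) -> List[List[int]]:
    """Concatenate the spatial EPIs of successive time instants.

    video[t][v] is assumed to hold view v at time t (hypothetical layout).
    """
    epi: List[List[int]] = []
    for views_at_t in video:
        epi.extend(spatial_epi(views_at_t, row))
    return epi


# Toy input: 2 time instants, 3 views, frames of 2 rows x 4 columns.
# Each pixel encodes its (time, view, row) so slices are easy to check.
toy_video = [
    [[[100 * t + 10 * v + r] * 4 for r in range(2)] for v in range(3)]
    for t in range(2)
]
toy_epi = spatio_temporal_epi(toy_video, row=1)
```

Here `toy_epi` has one line per (time, view) pair, so its height is the number of time instants times the number of views, and its width equals the frame width.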

Fast inter-frame prediction in multi-view video coding based on perceptual distortion threshold model

SIGNAL PROCESSING-IMAGE COMMUNICATION, 2019, vol. 70, pp. 199-209

Authors: Jiang, Gangyi; Du, Baozhen; Fang, Shuqing; Yu, Mei; Shao, Feng; Peng, Zongju; Chen, Fen. Affiliations: Ningbo Univ Fac Informat Sci & Engn Ningbo Zhejiang Peoples R China; Ningbo Polytech Elect & Informat Coll Ningbo Zhejiang Peoples R China

Multi-view video coding (MVC) utilizes a hierarchical B picture prediction structure and adopts many coding techniques to remove spatiotemporal and inter-view redundancies at the cost of high computational complexity. In this paper, a novel perceptual distortion threshold model (PDTM) is proposed to reveal the relationship between the mode selection of inter-frame prediction and the coding distortion threshold. Based on the proposed PDTM, a new fast inter-frame prediction algorithm for MVC is developed, aimed at minimizing computational complexity for dependent view coding. The fast MVC algorithm is then incorporated into the Multi-View High Efficiency Video Coding (MV-HEVC) software to improve MVC coding efficiency. In practical coding, the mode selection for inter-frame prediction of dependent views may be terminated early based on the thresholds derived from the PDTM, thereby reducing the coding time complexity. Experimental results demonstrate that the proposed algorithm can reduce the computational complexity of the dependent views by 52.9% compared with the HTM14.1 algorithm under the coding structure of hierarchical B pictures. Moreover, the bitrate is increased by 0.9% under the same subjective quality and by only 1.0% under the same objective quality measured by peak signal-to-noise ratio (PSNR). Compared with the state-of-the-art fast algorithm, the proposed algorithm can save more coding time, while the bitrate under the same PSNR increases slightly.

Keywords: Multi-view video coding; Perceptual distortion threshold model; Binocular just noticeable difference; Fast mode decision
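
The early termination described in the abstract above can be illustrated generically. The sketch below is not the PDTM itself: the threshold is passed in as a plain number, and the mode names, the `rd_cost` callback and the cost values are hypothetical. It only shows the control flow of stopping the mode search once the best cost so far falls under a threshold.

```python
from typing import Callable, List, Tuple


def choose_mode(
    candidate_modes: List[str],
    rd_cost: Callable[[str], float],
    threshold: float,
) -> Tuple[str, int]:
    """Return (best_mode, modes_evaluated), stopping early when possible.

    The search ends as soon as the best cost seen so far is at or below
    `threshold`, so the remaining candidate modes are never evaluated.
    """
    if not candidate_modes:
        raise ValueError("at least one candidate mode is required")
    best_mode = candidate_modes[0]
    best_cost = float("inf")
    evaluated = 0
    for mode in candidate_modes:
        cost = rd_cost(mode)
        evaluated += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost <= threshold:
            break  # early termination: good enough, skip the rest
    return best_mode, evaluated


# Hypothetical costs for one block, listed in the order they are tried.
example_costs = {"SKIP": 120.0, "INTER_2Nx2N": 80.0, "INTER_2NxN": 75.0, "INTRA": 200.0}
modes = list(example_costs)
early = choose_mode(modes, example_costs.__getitem__, threshold=90.0)
exhaustive = choose_mode(modes, example_costs.__getitem__, threshold=0.0)
```

With the looser threshold the search stops after two modes and accepts a slightly worse mode than the exhaustive search finds, which mirrors the trade-off reported in the abstract: less coding time for a small bitrate increase.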

Prediction architecture based on block matching statistics for mixed spatial-resolution multi-view video coding

EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2017, vol. 2017, no. 1, p. 1

Authors: Said, Hany; Moniri, Mansour; Chibelushi, Claude C. Affiliations: Arab Acad Sci Technol & Maritime Transport Coll Engn Alexandria Egypt; Univ East London Sch Architecture Comp & Engn London England; Staffordshire Univ Fac Comp Engn & Sci Stoke On Trent Staffs England

The use of mixed spatial resolutions in multi-view video coding is a promising approach for coding videos efficiently at low bitrates. It can achieve a perceived quality, which is close to the view with the highest quality, according to the suppression theory of binocular vision. The aim of the work reported in this paper is to develop a new multi-view video coding technique suitable for low bitrate applications in terms of coding efficiency, computational and memory complexity, when coding videos, which contain either a single or multiple scenes. The paper proposes a new prediction architecture that addresses deficiencies of prediction architectures for multi-view video coding based on H.264/AVC. The prediction architectures which are used in mixed spatial-resolution multi-view video coding (MSR-MVC) are afflicted with significant computational complexity and require significant memory size, with regards to coding time and to the minimum number of reference frames. The architecture proposed herein is based on a set of investigations, which explore the effect of different inter-view prediction directions on the coding efficiency of multi-view video coding, conduct a comparative study of different decimation and interpolation methods, in addition to analyzing block matching statistics. The proposed prediction architecture has been integrated with an adaptive reference frame ordering algorithm, to provide an efficient coding solution for multi-view videos with hard scene changes. The paper includes a comparative performance assessment of the proposed architecture against an extended architecture based on the 3D digital multimedia broadcast (3D-DMB) and the Hierarchical B-Picture (HBP) architecture, which are two most widely used architectures for MSR-MVC. The assessment experiments show that the proposed architecture needs less bitrate by on average 13.1 Kbps, less coding time by 14% and less memory consumption by 31.6%, compared to a corresponding codec, which deploy

Keywords: H.264/AVC; Mixed spatial-resolution multi-view video coding; Prediction architecture

Error-Resilient Multi-View Video Coding Based on End-to-End Rate-Distortion Optimization

Chinese Journal of Electronics, 2016, vol. 25, no. 2, pp. 277-283

Authors: GAO Pan; PENG Qiang; WANG Qionghua. Affiliations: School of Information Science and Technology, Southwest Jiaotong University; School of Mechanical and Electrical Engineering, University of Southern Queensland; School of Electronics and Information Engineering, Sichuan University

Video transmission over packet-switched networks usually suffers from packet losses. The use of the prediction loop in video coding causes these errors to propagate to subsequent frames, and thus significantly degrades the received video quality. With the increasing number of cameras used to capture the scene, robustly delivering multi-view video over error-prone channels becomes a rather challenging task. A rate-distortion optimization algorithm is proposed to improve error resilience for multi-view video transmission. A recursive model to estimate the end-to-end distortion is developed for multiview video coding, in which the distortion model explicitly takes into consideration the inherent error resilience property of the hierarchical bi-prediction structure. Based on the proposed distortion model, an end-to-end rate-distortion optimization criterion is employed to perform coding mode switching. Extensive experimental results demonstrate that significant performance gains can be achieved for multi-view video communication against transmission errors.

Keywords: Multi-view video coding; Error-resilient; Distortion estimation; Mode switching

Light Field Multi-View Video Coding With Two-Directional Parallel Inter-View Prediction

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, vol. 25, no. 11, pp. 5104-5117

Authors: Wang, Gengkun; Xiang, Wei; Pickering, Mark; Chen, Chang Wen. Affiliations: Univ Southern Queensland Sch Mech & Elect Engn Toowoomba Qld 4350 Australia; James Cook Univ Coll Sci Technol & Engn Cairns Qld 4870 Australia; UNSW Sch Engn & Informat Technol Canberra ACT 2600 Australia; SUNY Buffalo Dept Comp Sci & Engn Buffalo NY 14260 USA

Light field (LF) technology has been popularly adopted by a wide range of conventional industries. However, one problem when dealing with LFs is the sheer size of the data volume. Many multi-view video coding (MVC)-based LF video coding methods have been reported in the literature, aiming at finding the best prediction structure for LF video coding. It is clear that the number of possible prediction structures is unlimited, and it is also observed that the coding bit-rate can be reduced by increasing the number of bi-directionally encoded views in the prediction structure. However, no work has been conducted to analyze the relationship of the prediction structure with its coding performance. In light of this observation, we first design a new LF-MVC prediction structure by extending the inter-view prediction into a two-directional parallel structure. Analytical models for source coding rate and encoding time are developed to analyze their relationships with the prediction structure, and are proven to be well-matched to our experimental results. Experimental evaluation of two LF video sequences demonstrates that the proposed LF-MVC prediction structure can achieve a factor of 26% bit-rate reduction against the conventional MVC prediction structure for an LF video with 5 x 5 views, and a further 34% bit-rate reduction for an LF video with a larger 10 x 10 views. Compared with the state-of-the-art MVC-based LF video coding prediction structures in the literature, LF-MVC can achieve the best coding performance, and with its high encoding efficiency, is well suited for deployment in practical LF-based 3D systems.

Keywords: Light field video; multi-view video coding; prediction structure; source coding rate; encoding time

Error-resilient multi-view video coding using Wyner-Ziv techniques

MULTIMEDIA TOOLS AND APPLICATIONS, 2015, vol. 74, no. 18, pp. 7957-7982

Authors: Gao, Pan; Peng, Qiang; Xiang, Wei. Affiliations: Southwest Jiaotong Univ Sch Informat Sci & Technol Chengdu 610031 Sichuan Peoples R China; Univ So Queensland Sch Mech & Elect Engn Toowoomba Qld 4350 Australia

In this paper, a Wyner-Ziv (WZ) coding based error-resilient scheme is proposed for multi-view video transmission over error-prone channels. At the encoder, the key frames of the odd views are protected by WZ encoding to generate the auxiliary bit-stream alongside the multi-view video coded bit-stream. At the decoder, error-concealed multi-view decoded frames are used as the side information (SI) for WZ decoding. Based on the study on the characteristics of multi-view video coding (MVC) and the propagating behavior of channel errors, a recursive model to estimate the transmission distortion is developed in the transform domain, in which the channel-induced distortion takes into consideration both motion and disparity compensation. With the proposed model, we propose a rate control strategy for WZ encoding to infer the minimum bit rate so as to correct the SI errors. The WZ bit rate estimation method exploits the correlation between the original bit-planes and the SI bit-planes as well as the bit-plane interdependency. Extensive experimental results show that the proposed error-resilient scheme outperforms Reed Solomon based forward error correction method by about 1.1 dB and outperforms the adaptive intra refresh algorithm by approximately 1.6 dB at the packet loss rate 10 %.

Keywords: Multi-view video coding; Wyner-Ziv coding; Error resilience; Transmission distortion estimation; Bit rate estimation

Performance Improvement of Multi-View Video Coding Based on Geometric Prediction and Human Visual System

INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2015, vol. 25, no. 1, pp. 41-49

Authors: Li, Mian-Shiuan; Chen, Mei-Juan; Yeh, Chia-Hung; Tai, Kuang-Han. Affiliations: Natl Dong Hwa Univ Dept Elect Engn Shoufeng Township Taiwan; Natl Sun Yat Sen Univ Dept Elect Engn Kaohsiung Taiwan

This article proposed an accurate disparity vector prediction (DVP) algorithm for multi-view video coding. Differing from traditional DVP, which uses the motion vector information of neighboring blocks, this algorithm utilizes the geometry of the camera positions to calculate the parallax between viewpoints, and this parallax is the foundation of the DVP. We jointly applied the Just-Noticeable-Difference human visual model to the DVP. After filtering with a Gaussian function, the geometric DVP was obtained. Experimental results showed that the proposed method achieved significant data reduction and subjective/objective quality enhancement. (c) 2015 Wiley Periodicals, Inc.

Keywords: multi-view video coding; disparity vector prediction; geometric prediction; parallax; Gaussian

Color correction algorithm based on camera characteristics for multi-view video coding

SIGNAL IMAGE AND VIDEO PROCESSING, 2014, vol. 8, no. 5, pp. 955-966

Authors: Jung, Jae-Il; Ho, Yo-Sung. Affiliation: Gwangju Inst Sci & Technol Dept Informat & Commun C 412 Kwangju 500712 South Korea

Various types of multi-view camera systems have been proposed for capturing three dimensional scenes. Yet, color distributions among multi-view images remain inconsistent in most cases, degrading multi-view video coding performance. In this paper, we propose a color correction algorithm based on the camera characteristics to effectively solve such a problem. Initially, we model camera characteristics and estimate their coefficients by means of correspondences between views. To consider occlusion in multi-view images, correspondences are extracted via feature-based matching. During coefficient estimation with nonlinear regression, we remove outliers in the extracted correspondences. Consecutively, we generate lookup tables for each camera using the model and estimated coefficients. Such tables are employed for fast color converting in the final color correction process. The experimental results show that our algorithm enhances coding efficiency with gains of up to 0.9 and 0.8 dB for luminance and chrominance components, respectively. Further, the method also improves subjective viewing quality and reduces color distance between views.

Keywords: Camera characteristic curve; Color correction; Color inconsistency problem; Multi-view video coding
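
The abstract above describes generating a per-camera lookup table from a fitted camera model and then using it for fast color conversion. The sketch below illustrates only that last step. The power-law model with `gain`, `gamma` and `offset` coefficients is an assumption chosen for illustration; the abstract does not state the actual camera characteristic model or how its coefficients are estimated.

```python
from typing import List


def build_lut(gain: float, gamma: float, offset: float) -> List[int]:
    """Precompute the corrected value for each of the 256 input levels.

    Assumed (hypothetical) model: out = gain * 255 * (in / 255) ** gamma + offset,
    clamped to the valid 8-bit range.
    """
    lut = []
    for level in range(256):
        value = gain * 255.0 * (level / 255.0) ** gamma + offset
        lut.append(max(0, min(255, int(round(value)))))
    return lut


def apply_lut(channel: List[List[int]], lut: List[int]) -> List[List[int]]:
    """Correct one 8-bit color channel with a single table lookup per pixel."""
    return [[lut[pixel] for pixel in row] for row in channel]


identity_lut = build_lut(gain=1.0, gamma=1.0, offset=0.0)
brighter_lut = build_lut(gain=1.0, gamma=1.0, offset=10.0)
corrected = apply_lut([[0, 100, 255]], brighter_lut)
```

Because the model is evaluated only 256 times per camera and channel, the per-pixel cost of the correction is one array index, which is the reason a table is attractive as a preprocessing step before encoding.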

Adaptive Learning Based View Synthesis Prediction for Multi-View Video Coding

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2014, vol. 74, no. 1, pp. 115-126

Authors: Hu, Jinhui; Hu, Ruimin; Wang, Zhongyuan; Gao, Ge; Duan, Mang; Gong, Yan. Affiliation: Wuhan Univ Sch Comp Natl Engn Res Ctr Multimedia Software Wuhan 430079 Peoples R China

In the applications of Free View TV, pre-estimated depth information is available to synthesize the intermediate views as well as to assist multi-view video coding. Existing view synthesis prediction schemes generate the virtual view picture only from inter-view pictures. However, there are many types of signal mismatches caused by depth errors, camera heterogeneity or illumination differences across views, and these mismatches decrease the prediction capability of the virtual view picture. In this paper, we propose an adaptive learning based view synthesis prediction algorithm to enhance the prediction capability of the virtual view picture. This algorithm integrates least square prediction with backward warping to synthesize the virtual view picture, utilizing not only the adjacent-view information but also the temporal decoded information to adaptively learn the prediction coefficients. Experiments show that the proposed method reduces the bitrate by up to 18 % relative to the multi-view video coding standard, and by about 11 % relative to the conventional view synthesis prediction method.

Keywords: Multi-view video coding; Depth map; View synthesis prediction
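
The abstract above combines adjacent-view and temporal information through least-squares-learned prediction coefficients. As a rough sketch under strong simplifying assumptions (only two coefficients, one per source, fitted on a hypothetical window of already decoded samples), the weights can be obtained in closed form from the 2x2 normal equations. The function and variable names are illustrative and the actual algorithm in the paper is likely richer.

```python
from typing import List, Tuple


def fit_weights(
    view_pred: List[float], temp_pred: List[float], target: List[float]
) -> Tuple[float, float]:
    """Least-squares weights (w_view, w_temp) minimizing
    sum((target - w_view * view_pred - w_temp * temp_pred) ** 2)."""
    saa = sum(a * a for a in view_pred)
    sbb = sum(b * b for b in temp_pred)
    sab = sum(a * b for a, b in zip(view_pred, temp_pred))
    sat = sum(a * t for a, t in zip(view_pred, target))
    sbt = sum(b * t for b, t in zip(temp_pred, target))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-9:
        return 0.5, 0.5  # degenerate window: fall back to a plain average
    w_view = (sat * sbb - sab * sbt) / det  # Cramer's rule on the normal equations
    w_temp = (saa * sbt - sab * sat) / det
    return w_view, w_temp


def predict(view_px: float, temp_px: float, weights: Tuple[float, float]) -> float:
    """Blend the warped inter-view sample and the temporal sample."""
    return weights[0] * view_px + weights[1] * temp_px


# Training window whose targets were built with weights (0.25, 0.75).
window_view = [10.0, 20.0, 30.0, 40.0]
window_temp = [12.0, 18.0, 33.0, 41.0]
window_target = [0.25 * a + 0.75 * b for a, b in zip(window_view, window_temp)]
learned = fit_weights(window_view, window_temp, window_target)
```

Since the weights are fitted on decoded samples only, a decoder could repeat the same fit and would not need the coefficients to be transmitted; that is the usual motivation for this kind of backward-adaptive prediction.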

Efficient multi-view video coding using inter-view information

SIGNAL PROCESSING-IMAGE COMMUNICATION, 2014, vol. 29, no. 6, pp. 667-677

Authors: Huang, Xin-Xian; Chen, Mei-Juan; Yeh, Chia-Hung; Chi, Hao-Wen; Chen, Chia-Yen. Affiliations: Natl Dong Hwa Univ Dept Elect Engn Hualien 974 Taiwan; Natl Sun Yat Sen Univ Dept Elect Engn Kaohsiung 804 Taiwan; Natl Kaohsiung Univ Dept Comp Sci & Informat Engn Kaohsiung 811 Taiwan

Multi-view video coding (MVC) has been extended from H.264/AVC to improve the coding efficiency of multi-view video. This paper proposes a fast mode decision algorithm which can make an early decision on the correct mode partition to solve the issue of the enormous computational complexity. The best modes of the reference views are utilized to determine the complexity of the macroblock (MB) in the current view; the mode candidates that need to be evaluated can then be obtained according to the complexity. If the complexity is low or medium, the search range can be reduced. The threshold of the rate-distortion cost for the current MB is calculated using the co-located and neighboring MBs in the previously coded view and is utilized as the criterion for early termination. The motion vector difference in the reference view is applied to dynamically adjust the search range in the current MB. Experimental results show that the proposed algorithm achieves a time saving of 81.05% for a fast TZ search and 87.85% for full search, while still maintaining quality performance and bitrate. (C) 2014 Elsevier B.V. All rights reserved.

Keywords: Multi-view video coding; Disparity; Motion estimation; Mode decision; Rate-distortion cost; Search range
