The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods that skip intermediate viewpoints during compression and delivery and ultimately reconstruct them using Side Information (SInfo). Typically, depth maps are used to construct SInfo. However, these methods suffer from reconstruction inaccuracies and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of a Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SInfo. Additionally, we incorporate information from adjacent temporal and spatial viewpoints to further reduce SInfo redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and use a convolutional network to extract the latent code of a GAN as SInfo. At the decoder, we combine the SInfo and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint on reconstruction cost and SInfo entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate a significant improvement in Rate-Distortion (RD) performance compared to state-of-the-art methods.
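The joint constraint on reconstruction cost and SInfo entropy can be pictured as a Lagrangian cost of the form D + λR. The following is a minimal sketch under stated assumptions, not the authors' training objective: mean squared error stands in for the reconstruction cost, the empirical entropy of a quantized latent code stands in for the SInfo rate, and the weight `lam` and all function names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of a sequence of quantized symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mse(original, reconstructed):
    """Mean squared error between two equal-length sample sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rd_cost(original, reconstructed, latent_symbols, lam):
    """Hypothetical joint cost: distortion plus lam times the estimated rate.

    The rate is approximated as (entropy per symbol) x (number of symbols),
    which is an assumption made for illustration only.
    """
    rate_bits = empirical_entropy_bits(latent_symbols) * len(latent_symbols)
    return mse(original, reconstructed) + lam * rate_bits
```

A larger `lam` penalizes rate more heavily, which favors shorter or more compressible latent codes at the expense of reconstruction quality.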
Multi-view video coding (MVC) uses a hierarchical B-picture prediction structure and adopts many coding techniques to remove spatiotemporal and inter-view redundancies, at the cost of high computational complexity. In this paper, a novel perceptual distortion threshold model (PDTM) is proposed to reveal the relationship between the mode selection of inter-frame prediction and the coding distortion threshold. Based on the proposed PDTM, a new fast inter-frame prediction algorithm for MVC is developed, aimed at minimizing the computational complexity of dependent view coding. The fast MVC algorithm is then incorporated into the multi-view High Efficiency Video Coding (MV-HEVC) software to improve MVC coding efficiency. In practical coding, the mode selection for inter-frame prediction of dependent views may be terminated early based on the thresholds derived from the PDTM, thereby reducing the coding time. Experimental results demonstrate that the proposed algorithm reduces the computational complexity of the dependent views by 52.9% compared with the HTM14.1 algorithm under the hierarchical B-picture coding structure. Moreover, the bitrate increases by 0.9% under the same subjective quality and by only 1.0% under the same objective quality, measured as peak signal-to-noise ratio (PSNR). Compared with the state-of-the-art fast algorithm, the proposed algorithm saves more coding time, while the bitrate under the same PSNR increases slightly.
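For reference, the objective quality measure mentioned above, PSNR, follows the standard definition 10·log10(MAX² / MSE). The sketch below assumes 8-bit samples (peak value 255) stored in flat Python lists; the function name and the list-based representation are illustrative choices, not part of the paper.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        # Identical signals: PSNR is unbounded.
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```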
The use of mixed spatial resolutions in multi-view video coding is a promising approach for coding videos efficiently at low bitrates. According to the suppression theory of binocular vision, it can achieve a perceived quality close to that of the view with the highest quality. The aim of the work reported in this paper is to develop a new multi-view video coding technique suitable for low-bitrate applications in terms of coding efficiency and computational and memory complexity, when coding videos that contain either a single scene or multiple scenes. The paper proposes a new prediction architecture that addresses deficiencies of prediction architectures for multi-view video coding based on H.264/AVC. The prediction architectures used in mixed spatial-resolution multi-view video coding (MSR-MVC) suffer from significant computational complexity and require significant memory, with regard to coding time and the minimum number of reference frames. The architecture proposed herein is based on a set of investigations that explore the effect of different inter-view prediction directions on the coding efficiency of multi-view video coding, compare different decimation and interpolation methods, and analyze block matching statistics. The proposed prediction architecture has been integrated with an adaptive reference frame ordering algorithm to provide an efficient coding solution for multi-view videos with hard scene changes. The paper includes a comparative performance assessment of the proposed architecture against an extended architecture based on 3D digital multimedia broadcast (3D-DMB) and the Hierarchical B-Picture (HBP) architecture, which are the two most widely used architectures for MSR-MVC. The assessment experiments show that the proposed architecture needs on average 13.1 Kbps less bitrate, 14% less coding time and 31.6% less memory, compared to a corresponding codec, which deploy
Video transmission over packet-switched networks usually suffers from packet losses. The prediction loop used in video coding causes these errors to propagate to subsequent frames and thus significantly degrades the received video quality. As the number of cameras capturing the scene increases, robustly delivering multi-view video over error-prone channels becomes a rather challenging task. A rate-distortion optimization algorithm is proposed to improve error resilience for multi-view video transmission. A recursive model to estimate the end-to-end distortion is developed for multi-view video coding, in which the distortion model explicitly takes into consideration the inherent error resilience of the hierarchical bi-prediction structure. Based on the proposed distortion model, an end-to-end rate-distortion optimization criterion is employed to perform coding mode switching. Extensive experimental results demonstrate that significant performance gains can be achieved for multi-view video communication in the presence of transmission errors.
Light field (LF) technology has been widely adopted across a range of conventional industries. However, one problem when dealing with LFs is the sheer volume of data. Many LF video coding methods based on multi-view video coding (MVC) have been reported in the literature, aiming to find the best prediction structure for LF video coding. It is clear that the number of possible prediction structures is unlimited, and it has also been observed that the coding bit-rate can be reduced by increasing the number of bi-directionally encoded views in the prediction structure. However, no work has been conducted to analyze the relationship between the prediction structure and its coding performance. In light of this observation, we first design a new LF-MVC prediction structure by extending the inter-view prediction into a two-directional parallel structure. Analytical models for source coding rate and encoding time are developed to analyze their relationships with the prediction structure, and are shown to match our experimental results well. Experimental evaluation on two LF video sequences demonstrates that the proposed LF-MVC prediction structure achieves a 26% bit-rate reduction against the conventional MVC prediction structure for an LF video with 5 × 5 views, and a further 34% bit-rate reduction for a larger LF video with 10 × 10 views. Compared with the state-of-the-art MVC-based LF video coding prediction structures in the literature, LF-MVC achieves the best coding performance and, with its high encoding efficiency, is well suited for deployment in practical LF-based 3D systems.
In this paper, a Wyner-Ziv (WZ) coding based error-resilient scheme is proposed for multi-view video transmission over error-prone channels. At the encoder, the key frames of the odd views are protected by WZ encoding to generate an auxiliary bit-stream alongside the multi-view video coded bit-stream. At the decoder, error-concealed multi-view decoded frames are used as the side information (SI) for WZ decoding. Based on a study of the characteristics of multi-view video coding (MVC) and the propagation behavior of channel errors, a recursive model to estimate the transmission distortion is developed in the transform domain, in which the channel-induced distortion takes into consideration both motion and disparity compensation. With the proposed model, we propose a rate control strategy for WZ encoding that infers the minimum bit rate needed to correct the SI errors. The WZ bit rate estimation method exploits the correlation between the original bit-planes and the SI bit-planes as well as the bit-plane interdependency. Extensive experimental results show that the proposed error-resilient scheme outperforms the Reed-Solomon based forward error correction method by about 1.1 dB and the adaptive intra refresh algorithm by approximately 1.6 dB at a packet loss rate of 10%.
This article proposes an accurate disparity vector prediction (DVP) algorithm for multi-view video coding. Unlike traditional DVP, which uses the motion vectors of neighboring blocks, this algorithm uses the geometry of the camera positions to calculate the parallax between viewpoints, and this parallax is the foundation of the DVP. We jointly apply the Just-Noticeable-Difference human visual model to the DVP. After filtering with a Gaussian function, the geometric DVP is obtained. Experimental results show that the proposed method achieves significant data reduction and subjective/objective quality enhancement. (c) 2015 Wiley Periodicals, Inc.
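As a rough illustration of how camera geometry yields parallax, the textbook relation for two rectified, parallel cameras is d = f·B / Z. The sketch below assumes that setup, which the abstract does not state: `focal_length_px` is the focal length in pixels, and `baseline` and `depth` share the same length unit. The function name is hypothetical, and the Just-Noticeable-Difference weighting and Gaussian filtering steps are not modeled here.

```python
def disparity_pixels(focal_length_px, baseline, depth):
    """Horizontal disparity (pixels) for rectified parallel cameras: d = f * B / Z."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * baseline / depth
```

Nearer objects (smaller `depth`) produce larger disparities, which is why a geometry-derived estimate can serve as a starting point for a disparity vector predictor.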
Various types of multi-view camera systems have been proposed for capturing three-dimensional scenes. Yet color distributions among multi-view images remain inconsistent in most cases, degrading multi-view video coding performance. In this paper, we propose a color correction algorithm based on camera characteristics to solve this problem effectively. Initially, we model the camera characteristics and estimate their coefficients by means of correspondences between views. To account for occlusion in multi-view images, correspondences are extracted via feature-based matching. During coefficient estimation with nonlinear regression, we remove outliers from the extracted correspondences. Subsequently, we generate lookup tables for each camera using the model and the estimated coefficients. These tables are employed for fast color conversion in the final color correction process. The experimental results show that our algorithm enhances coding efficiency with gains of up to 0.9 and 0.8 dB for the luminance and chrominance components, respectively. Further, the method also improves subjective viewing quality and reduces the color distance between views.
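The lookup-table step can be sketched as follows. The abstract does not specify the camera model, so this example assumes a simple gain/gamma/offset curve on 8-bit values purely for illustration; `build_lut`, `apply_lut`, and the three coefficients are hypothetical names, and the coefficient estimation by nonlinear regression is not shown.

```python
def build_lut(gain, gamma, offset):
    """Precompute a 256-entry table for an assumed gain/gamma/offset model."""
    lut = []
    for v in range(256):
        corrected = gain * 255.0 * (v / 255.0) ** gamma + offset
        # Clip to the valid 8-bit range after rounding.
        lut.append(min(255, max(0, round(corrected))))
    return lut


def apply_lut(channel, lut):
    """Correct one color channel (a list of 8-bit values) by table lookup."""
    return [lut[v] for v in channel]
```

Building the table once per camera and channel replaces a per-pixel power computation with a single index operation, which is the source of the speed-up a lookup table offers.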
In free-viewpoint TV applications, pre-estimated depth information is available to synthesize the intermediate views as well as to assist multi-view video coding. Existing view synthesis prediction schemes generate the virtual view picture only from inter-view pictures. However, many types of signal mismatch are caused by depth errors, camera heterogeneity or illumination differences across views, and these mismatches decrease the prediction capability of the virtual view picture. In this paper, we propose an adaptive learning based view synthesis prediction algorithm to enhance the prediction capability of the virtual view picture. This algorithm integrates least-squares prediction with backward warping to synthesize the virtual view picture, using not only the information from adjacent views but also temporally decoded information to adaptively learn the prediction coefficients. Experiments show that the proposed method reduces the bitrate by up to 18% relative to the multi-view video coding standard, and by about 11% relative to the conventional view synthesis prediction method.
Multi-view video coding (MVC) has been extended from H.264/AVC to improve the coding efficiency of multi-view video. This paper proposes a fast mode decision algorithm that makes an early decision on the correct mode partition to address the enormous computational complexity. The best modes of the reference views are used to determine the complexity of the macroblock (MB) in the current view; the mode candidates that need to be evaluated are then obtained according to that complexity. If the complexity is low or medium, the search range can be reduced. The threshold of the rate-distortion cost for the current MB is calculated using the co-located and neighboring MBs in the previously coded view and is used as the criterion for early termination. The motion vector difference in the reference view is applied to dynamically adjust the search range in the current MB. Experimental results show that the proposed algorithm achieves a time saving of 81.05% for a fast TZ search and 87.85% for full search, while maintaining quality and bitrate. (C) 2014 Elsevier B.V. All rights reserved.
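The early-termination idea can be sketched as a loop that stops evaluating modes once the best rate-distortion cost falls below a threshold derived from already coded MBs. This is a simplified illustration under assumptions: the threshold is taken as a scaled mean of the reference costs, and the names `early_termination_threshold`, `choose_mode`, and `scale` are hypothetical. The paper's actual threshold formula and candidate ordering are not given in the abstract.

```python
def early_termination_threshold(reference_costs, scale=1.0):
    """Assumed threshold: scaled mean RD cost of co-located and neighboring MBs."""
    return scale * sum(reference_costs) / len(reference_costs)


def choose_mode(candidate_modes, rd_cost_fn, threshold):
    """Evaluate modes in order and stop once the best cost is within the threshold.

    Returns (best_mode, best_cost, number_of_modes_evaluated).
    """
    best_mode, best_cost = None, float("inf")
    evaluated = 0
    for mode in candidate_modes:
        cost = rd_cost_fn(mode)
        evaluated += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost <= threshold:
            break  # good enough: skip the remaining, more expensive modes
    return best_mode, best_cost, evaluated
```

The time saving comes from the modes that are never evaluated; the price is that a slightly better mode later in the list may be missed, which is consistent with the small bitrate changes such algorithms typically report.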