The multiview video coding (MVC) extension of H.264/MPEG-4 AVC [1] is one of the most promising visual encoders for three-dimensional television and free viewpoint video applications. In this paper, we propose a joint dense motion/disparity estimation algorithm, designed to replace the classical temporal/inter-view unit within MVC, which uses a block-based motion/disparity estimation. The motion vector fields and the disparity vector fields are simultaneously derived using the stereo-motion consistency constraint in a set theoretic convex optimization framework. The obtained displacement vector fields are then jointly segmented by minimizing a rate-distortion cost function, in line with the multiple reference frame strategy used in H.264/MPEG-4 AVC. Experimental results demonstrate the benefits of the proposed method compared to the separated dense estimation scheme or the block-based estimation technique. (C) 2009 Elsevier Inc. All rights reserved.
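The rate-distortion selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Lagrangian cost J = D + lambda * R; the candidate labels, distortion values, and bit counts below are hypothetical and are not taken from the paper, whose actual segmentation procedure is not described here.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_vector(candidates, lam):
    """Return the (label, distortion, rate_bits) tuple with the lowest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates for one region: a temporal (motion) vector and
# an inter-view (disparity) vector, each with a distortion and a bit cost.
candidates = [
    ("motion", 120.0, 40),
    ("disparity", 100.0, 70),
]

best_low = choose_vector(candidates, lam=0.5)   # rate is cheap
best_high = choose_vector(candidates, lam=1.0)  # rate is expensive
```

With a small multiplier the lower-distortion candidate wins; as the multiplier grows, the cheaper-to-code candidate is preferred.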
Occupying the most significant portion of global data traffic, video is being generated in almost every aspect of our life. Because of its huge volume, we are depending much more heavily on machine intelligence based analysis. In the meantime, video coding technology has been continuously improved for better compression efficiency. However, the state-of-the-art video coding standards, such as H.265/HEVC and versatile video coding (VVC), are still designed assuming that the compressed video will be watched by a human later. Such a design is not optimal when the compressed video will be used by computer vision applications. While the human visual system (HVS) is consistently sensitive to the content with high contrast, the impact of pixels on computer vision algorithms is task driven. For example, because of the different categories of objects used to train detection algorithms, the influence of the same image content on those detectors also varies. Therefore, human-oriented video coding strategies may not be optimal when the compressed signal is further processed by algorithms, as the encoder is unaware of the task-specific information. In this article, taking object detection as an example, we propose a novel video coding strategy for computer vision. By protecting the information according to its importance for an object detector rather than for the human visual system, our proposed method has the potential to achieve a better object detection performance with the same bandwidth. The main contributions of our paper are: 1) the modeling of the relationship between object detection accuracy and bit rate; 2) a back propagation based method to analyze the influence of each pixel on the detection of target objects; 3) an object detection oriented bit allocation and codec control parameter determination scheme; 4) an evaluation metric to compare the impact of video coding strategies on a given object detector over a predefined range of bit rate. Experimental results demonst
Image and video coding is an optimization problem. A successful image and video coding algorithm delivers a good tradeoff between visual quality and other coding performance measures, such as compression, complexity, scalability, robustness, and security. In this paper, we follow two recent trends in image and video coding research. One is to incorporate human visual system (HVS) models to improve the current state-of-the-art of image and video coding algorithms by better exploiting the properties of the intended receiver. The other is to design rate scalable image and video codecs, which allow the extraction of coded visual information at continuously varying bit rates from a single compressed bitstream. Specifically, we propose a foveation scalable video coding (FSVC) algorithm which supplies good quality-compression performance as well as effective rate scalability. The key idea is to organize the encoded bitstream to provide the best decoded video at an arbitrary bit rate in terms of foveated visual quality measurement. A foveation-based HVS model plays an important role in the algorithm. The algorithm is adaptable to different applications, such as knowledge-based video coding and video communications over time-varying, multiuser and interactive networks.
In this paper, we propose an enhanced bi-prediction scheme based on the convolutional neural network (CNN) to improve the rate-distortion performance in video compression. In contrast to the traditional bi-prediction strategy which computes the linear superposition as the predictive signals with pixel-to-pixel correspondence, the proposed scheme employs CNN to directly infer the predictive signals in a data-driven manner. As such, the predicted blocks are fused in a nonlinear fashion to improve the coding performance. Moreover, the patch-to-patch inference strategy with CNN also improves the prediction accuracy since the patch-level information for the prediction of each individual pixel can be exploited. The proposed enhanced bi-prediction scheme is further incorporated into the high-efficiency video coding standard, and the experimental results exhibit a significant performance improvement under different coding configurations.
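For contrast with the CNN-based fusion, the traditional linear bi-prediction baseline referred to above can be sketched as a pixel-wise equal-weight average of two prediction blocks. This is a minimal illustration with hypothetical sample values; it assumes equal weights with integer rounding and does not reproduce the proposed network.

```python
def bi_predict_average(block0, block1):
    """Pixel-to-pixel equal-weight average with rounding: (p0 + p1 + 1) >> 1."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]


# Hypothetical 2x2 prediction blocks from two reference pictures.
p0 = [[100, 102], [98, 97]]
p1 = [[104, 101], [98, 100]]

pred = bi_predict_average(p0, p1)
# pred == [[102, 102], [98, 99]]
```

Each output pixel depends only on the two co-located input pixels, which is the limitation the patch-to-patch inference in the abstract is meant to address.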
A general multiple description video coding (MDVC) framework based on hierarchical B pictures is proposed in this paper. Two or more descriptions are generated by employing the hierarchical B pictures of H.264/AVC scalable extension, where temporal-level-based key pictures are selected in a staggered way among different descriptions. Based on this hierarchical and staggered structure, inter-description redundancy control is studied to achieve a good central/side-distortion-rate tradeoff. Moreover, to better exploit multiple complementary descriptions, a linear combination of received descriptions is employed to optimize decoding results. This proposed MDVC framework is H.264/AVC-compliant for each temporal scalable description. Some existing temporal-splitting MDVC techniques can be considered as a degraded case in the proposed structure. Experimental results validate the effectiveness of the proposed design for MDVC.
In this paper, we review the recent advances in the pipeline of omnidirectional video processing, including projection and evaluation. Unlike traditional video, omnidirectional video, also called panoramic video or 360-degree video, lies in the spherical domain, so specialized tools are necessary. For this type of video, each picture must be projected to a 2-D plane for encoding and decoding so that it fits the input of existing video coding systems. The influence of the projection on coding and the accuracy of the evaluation method are therefore very important in this pipeline. Recent advances, such as different projection methods benefiting video coding, specialized video quality evaluation metrics and optimized methods for transmission, are all presented and classified in this paper. In addition, the coding performances under different projection methods are specified. The future trends of omnidirectional video processing are also discussed. (C) 2018 Elsevier B.V. All rights reserved.
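As one concrete instance of projecting the sphere to a 2-D plane, the sketch below maps between spherical coordinates and an equirectangular picture. The equirectangular format is chosen here only as a familiar example; the abstract does not say which projections the survey covers in detail, and the function names are hypothetical.

```python
import math


def sphere_to_erp(lon, lat, width, height):
    """Map longitude in [-pi, pi) and latitude in [-pi/2, pi/2]
    to continuous pixel coordinates (u, v) of an equirectangular picture."""
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


def erp_to_sphere(u, v, width, height):
    """Inverse mapping from equirectangular pixel coordinates to (lon, lat)."""
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    return lon, lat


# The point straight ahead on the equator lands at the picture centre.
center = sphere_to_erp(0.0, 0.0, 3840, 1920)
```

Because every picture row covers the full 360 degrees of longitude, rows near the poles are heavily oversampled, which is one reason projection choice affects coding efficiency.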
The application of different downsampling filters in video coding directly models visual information at lower resolutions and influences the compression performance of a chosen coding system. In wavelet-based scalable video coding, spatial scalability is achieved by applying wavelets as downsampling filters. However, the characteristics of different wavelets influence the performance at the targeted spatio-temporal decoding points. An analysis of different downsampling filters in popular wavelet-based scalable video coding schemes is presented. Evaluation is performed for both intra- and inter-coding schemes using wavelets and standard downsampling strategies. On the basis of the obtained results, a new concept of inter-resolution prediction is proposed, which maximises the average performance using a combination of standard downsampling filters and wavelet-based coding.
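To make the role of the downsampling filter concrete, the sketch below halves the resolution of one sample row in two ways: plain decimation and a Haar-style low-pass. The Haar filter is used only as the simplest wavelet example and is normalized here as a pairwise average (the orthonormal form scales by 1/sqrt(2) instead); the abstract does not state which wavelets the paper evaluates, and the sample values are hypothetical.

```python
def decimate(signal):
    """Keep every second sample without any filtering."""
    return signal[::2]


def haar_lowpass(signal):
    """Average adjacent pairs: the Haar scaling filter, mean-preserving form."""
    return [
        (signal[i] + signal[i + 1]) / 2.0
        for i in range(0, len(signal) - 1, 2)
    ]


# Hypothetical row of luma samples.
row = [10, 12, 50, 54, 20, 20, 0, 8]

low_res_decimated = decimate(row)   # [10, 50, 20, 0]
low_res_haar = haar_lowpass(row)    # [11.0, 52.0, 20.0, 4.0]
```

The two low-resolution rows differ, so any prediction built from the lower layer also differs, which is why filter choice affects performance at each decoding point.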
Inter-frame temporal prediction in video coding causes distortion propagation among adjacent frames. As a result, frame-level bit allocation and quantization control should inherently be optimized with dependent rate-distortion optimization (RDO). Taking inter-frame distortion propagation into consideration, quantization parameter cascading (QPC) is an efficient technique for frame-level quantization control in terms of dependent RDO. This paper proposes a temporal distortion propagation model by quantitatively evaluating the temporal distortion dependency. The block-level temporal trajectory is tracked via inter-prediction on down-sampled original frames to construct the temporal analysis chain, and then a tree-style dependency analysis is carried out along the trajectory. The amounts of equivalent distortion propagated from the temporally adjacent frames are quantitatively measured. Then, a new distortion model is developed to facilitate measuring the rate-distortion (RD) cost in terms of dependent RDO. Finally, a simplified trellis is constructed from the candidate quantization parameters (Qp) of the frames within one GOP, and the optimal Qp is searched via simplified dynamic programming to achieve global optimization. The simulation results verify that the frame-level QPC algorithm achieves a 3.46% BD-rate saving on average, which is attributed to efficient bit allocation along the temporal trajectory.
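The trellis search over per-frame Qp candidates can be sketched as a generic Viterbi-style dynamic program. This is only an illustration under stated assumptions: the cost functions, target values, and candidate sets below are hypothetical stand-ins, and the paper's distortion propagation model and its trellis simplifications are not reproduced.

```python
def search_qp_path(candidates_per_frame, node_cost, transition_cost):
    """Viterbi-style search: return (total_cost, qp_path) with minimum cost.

    node_cost(t, qp) is the cost of coding frame t with qp;
    transition_cost(prev_qp, qp) models the dependency between adjacent frames.
    """
    # best[qp] = (cost, path) of the cheapest path ending at qp in the current frame
    best = {qp: (node_cost(0, qp), [qp]) for qp in candidates_per_frame[0]}
    for t in range(1, len(candidates_per_frame)):
        new_best = {}
        for qp in candidates_per_frame[t]:
            prev_qp, (prev_cost, prev_path) = min(
                best.items(),
                key=lambda item: item[1][0] + transition_cost(item[0], qp),
            )
            cost = prev_cost + transition_cost(prev_qp, qp) + node_cost(t, qp)
            new_best[qp] = (cost, prev_path + [qp])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical toy costs, NOT the paper's model.
targets = [30, 35, 34]
candidates = [[30, 32], [32, 34, 36], [34, 36]]


def toy_node_cost(t, qp):
    return (qp - targets[t]) ** 2


def toy_transition_cost(prev_qp, qp):
    return abs(qp - prev_qp)


total_cost, qp_path = search_qp_path(candidates, toy_node_cost, toy_transition_cost)
# total_cost == 5, qp_path == [30, 34, 34]
```

The dynamic program evaluates each frame-to-frame transition once, so its work grows linearly with the number of frames rather than exponentially as in an exhaustive search over all Qp combinations.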
In order to improve 3D video coding efficiency, we propose methods to estimate rendered view distortion in synthesized views as a function of the depth map quantization error. Our approach starts by calculating the geometric error caused by the depth map error based on the camera parameters. Then, we estimate the rendered view distortion based on the local video characteristics. The estimated rendered view distortion is used in the rate-distortion optimized mode selection for depth map coding. A Lagrange multiplier is derived using the proposed distortion metric, which is estimated based on an autoregressive model. Experimental results show the efficiency of the proposed methods, with average savings of 43% in depth map bitrate as compared with encoding the depth maps using the same coding tools but with the rate-distortion optimization based on the conventional distortion metric.
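The first step, turning a depth map error into a geometric error from the camera parameters, can be sketched under two assumptions that the abstract does not state: rectified parallel cameras, where disparity is focal length times baseline divided by depth, and the common 8-bit inverse-depth quantization between a near and a far plane. The parameter names and values are hypothetical.

```python
def disparity_error(depth_error_levels, focal_px, baseline, z_near, z_far, levels=255):
    """Horizontal position error (in pixels) in the rendered view caused by an
    error of `depth_error_levels` quantization levels in the depth map.

    Assumes 1/Z = (v / levels) * (1/z_near - 1/z_far) + 1/z_far
    and disparity d = focal_px * baseline / Z (parallel camera setup).
    """
    return (
        focal_px
        * baseline
        * (depth_error_levels / levels)
        * (1.0 / z_near - 1.0 / z_far)
    )


# Hypothetical setup: 1000-pixel focal length, 5 cm baseline, scene between 1 m and 2 m.
shift = disparity_error(51, focal_px=1000.0, baseline=0.05, z_near=1.0, z_far=2.0)
# A 51-level depth error shifts the rendered pixel by about 5 pixels here.
```

Under these assumptions the geometric error is linear in the depth error, so the same depth quantization error matters more for wide baselines and for scenes with a large depth range.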
ISBN: 9781424442904 (print)
This paper proposes an efficient video coding method based on audio-visual attention, which is motivated by the fact that cross-modal interaction significantly affects humans' perception of multimedia content. First, we propose an audio-visual source localization method to locate the sound source in a video sequence. Then, its result is used for applying spatial blurring to video frames in order to reduce redundant high-frequency information and achieve coding efficiency. We demonstrate the effectiveness of the proposed method for H.264/AVC coding along with the results of a subjective evaluation.