We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as the reference. Our method introduces the use of the previous multiple ...
ISBN:
(print) 9781538644591; 9781538644584
One key challenge in learning-based image compression is that adaptive bit allocation is crucial for compression effectiveness but can hardly be trained into a neural network. In this work, we present an end-to-end trainable image compression framework, named Multi-scale Progressive Network (MPN), to achieve spatially variant bit allocation and rate control under the guidance of a novel learnable just noticeable distortion (JND) map. Specifically, MPN's encoder achieves multi-scale feature representation through a three-branched structure. Each branch employs an independent feature extraction strategy for a specific receptive field, and the branches are merged progressively under the guidance of the corresponding learnable JND maps generated by our proposed Bit-Allocation sub-Network (BAN). This makes MPN focus on the areas that attract the human visual system (HVS) and preserve more image texture during compression. Finally, a hybrid objective function is introduced to further improve MPN's efficiency and mimic the discriminative characteristics of the HVS. Experiments show that MPN significantly outperforms traditional JPEG, JPEG 2000, and several state-of-the-art learning-based methods in terms of the multi-scale structural similarity (MS-SSIM) index, and produces much better visual results with rich textures, sharp edges, and fewer artifacts.
ISBN:
(print) 9781538644591; 9781538644584
Inspired by the progress of image and video super-resolution (SR) achieved by convolutional neural networks (CNN), we propose a CNN-based residue SR method for video coding. Different from previous works that operate in the pixel domain, i.e. down- and up-sampling of the image or video frame, we propose to perform down- and up-sampling in the residue domain. Specifically, for each block, we perform motion estimation and compensation to obtain the residual signal at the original resolution, then we down-sample the residue and compress it at low resolution, and perform residue SR using a trained CNN model. We design a new CNN for residue SR with the help of the motion compensated prediction signal. We integrate the residue SR method into the High Efficiency Video Coding (HEVC) scheme, with mode decision at the coding tree unit level. Experimental results show that our method achieves on average 4.0% and 2.8% BD-rate reduction under low-delay P and low-delay B configurations, respectively.
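The residue-domain pipeline described in this abstract can be sketched in a few lines. In this minimal numpy illustration, the trained CNN SR step is replaced by nearest-neighbour up-sampling for simplicity, and all helper names are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def downsample2x(res):
    """2x down-sampling of a residue block by averaging 2x2 cells."""
    h, w = res.shape
    return res.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(res_lo):
    """2x up-sampling of the decoded low-resolution residue. In the paper
    this step is a trained CNN that also uses the motion-compensated
    prediction; nearest-neighbour repetition stands in for it here."""
    return np.repeat(np.repeat(res_lo, 2, axis=0), 2, axis=1)

def code_block(orig, pred):
    """Residue-domain pipeline: residue -> down-sample -> (codec) -> SR.
    The actual low-resolution compression step is omitted (lossless here)."""
    residue = orig - pred            # motion-compensated residue
    res_lo = downsample2x(residue)   # coded at low resolution
    res_hi = upsample2x(res_lo)      # residue super-resolution
    return pred + res_hi             # reconstructed block
```

For a block whose residue is smooth (e.g. constant), this round trip is nearly lossless, which is why the scheme decides per coding tree unit whether the residue-SR mode pays off.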
ISBN:
(print) 9781509003556
Visual Quality Assessment of 3D/stereoscopic video (3D VQA) is significant for both quality monitoring and optimization of existing 3D video services. In this paper, we build a 3D video database based on the latest 3D-HEVC video coding standard, to investigate the relationship among video quality, depth quality, and overall quality of experience (QoE) of 3D/stereoscopic video. We also analyze the pivotal factors affecting video and depth quality. Moreover, we develop a No-Reference 3D-HEVC bitstream-level objective video quality assessment model, which utilizes key features extracted from the 3D video bitstreams to assess the perceived quality of the stereoscopic video. The model is verified to be effective on our database as compared with widely used 2D Full-Reference quality metrics as well as a state-of-the-art 3D FR pixel-level video quality metric.
In this paper, a SAR image simulation code for 3D complex targets, named CASpatch, is introduced. This code is based on the high-frequency technique of shooting and bouncing rays (SBR). The original purpose to design the...
ISBN:
(print) 9781479983407
Dynamic Adaptive Streaming over HTTP (DASH) enables bitrate adaptation through different representations of the same content. It is common to encode random access point (RAP) pictures at segment boundaries to support representation switching. As an open group of pictures (GOP) results in a temporary discontinuity of the video playback, due to the inability to decode some pictures when switching representations, closed GOP prediction structures are normally used in DASH. This paper proposes two similar methods for using the open GOP prediction structure in DASH representations while maintaining the full picture rate during representation switching. The first method is enabled with straightforward changes in the decoding of the High Efficiency Video Coding (H.265/HEVC) standard, whereas the second method utilizes the adaptive resolution change feature of the scalable (SHVC) extension of H.265/HEVC. Experiments show that the proposed methods outperform the use of closed GOPs by 5.6% on average in terms of Bjontegaard delta bitrate (BD-rate).
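Several abstracts on this page report gains as Bjontegaard delta bitrate. The standard way to compute it from rate-PSNR points of an anchor and a test codec is a cubic fit of log-rate as a function of PSNR, integrated over the overlapping quality range; a common numpy sketch (not taken from any of these papers) is:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta bitrate: average percentage rate change of the
    test codec versus the anchor at equal quality. Negative = bit saving."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # cubic fit of log-rate as a function of PSNR for each codec
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # integrate over the overlapping PSNR interval
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyint(p_a)
    int_t = np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # back from log domain to a percentage rate difference
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```

For instance, a test codec that needs exactly 90% of the anchor's rate at every quality point yields a BD-rate of -10%.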
ISBN:
(digital) 9781728180687
ISBN:
(print) 9781728180694
In the hybrid video coding framework, transform is adopted to exploit the dependency within the input signal. In this paper, we propose a deep learning-based nonlinear transform for intra coding. Specifically, we incorporate the directional information into the residual domain. Then, a convolutional neural network model is designed to achieve better decorrelation and energy compaction than the conventional discrete cosine transform. This work has two main contributions. First, we propose to use the intra prediction signal to reduce the directionality in the residual. Second, we present a novel loss function to characterize the efficiency of the transform during the training. To evaluate the compression performance of the proposed transform, we implement it into the High Efficiency Video Coding reference software. Experimental results demonstrate that the proposed method achieves up to 1.79% BD-rate reduction for natural videos.
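The abstract does not state the novel loss function, but a rate-distortion style objective is a common choice when training a transform to beat the DCT on decorrelation and energy compaction. This hypothetical numpy sketch, which is not the paper's actual loss, combines reconstruction MSE with an L1 sparsity term on the coefficients as a proxy for coding cost:

```python
import numpy as np

def transform_loss(coeffs, recon, target, lam=0.01):
    """Hypothetical training objective for a learned transform:
    distortion (MSE of the reconstructed residual block) plus an L1
    penalty on the transform coefficients, which rewards energy
    compaction into few significant coefficients."""
    distortion = np.mean((recon - target) ** 2)
    rate_proxy = np.mean(np.abs(coeffs))
    return distortion + lam * rate_proxy
```

The trade-off weight `lam` is an assumption; in practice it would be swept to trace out an operational rate-distortion curve.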
ISBN:
(print) 9781509060689
Vehicle re-identification (re-id) plays an important role in the automatic analysis of the drastically increasing volume of urban surveillance video. Similar to other image retrieval problems, vehicle re-id suffers from difficulties caused by various vehicle poses, diversified illumination, and complicated environments. Triplet-wise training of a convolutional neural network (CNN) has been studied to address these challenges, where the CNN automates feature extraction from images, and the training adopts triplets of (query, positive example, negative example) to capture the relative similarity between them and learn representative features. Traditional triplet-wise training is weakly constrained and thus fails to achieve satisfactory results. We propose to improve triplet-wise training in two aspects: first, a stronger constraint, namely a classification-oriented loss, is augmented with the original triplet loss; second, a new triplet sampling method based on pairwise images is designed. Our experimental results demonstrate the effectiveness of the proposed methods, which outperform the state-of-the-art on two vehicle re-id datasets derived from real-world urban surveillance videos.
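The augmented objective, a triplet loss plus a classification-oriented term, can be sketched for a single triplet as follows. This is a minimal numpy version; the paper's exact margin and weighting are not given in the abstract, so `margin` and `weight` are assumptions:

```python
import numpy as np

def combined_loss(anchor, pos, neg, logits, label, margin=1.0, weight=1.0):
    """Triplet loss with an added classification (cross-entropy) term,
    acting as a stronger constraint than the triplet loss alone.
    anchor/pos/neg are embedding vectors; logits are the anchor's
    class scores and label its ground-truth identity index."""
    d_pos = np.sum((anchor - pos) ** 2)   # anchor-positive distance
    d_neg = np.sum((anchor - neg) ** 2)   # anchor-negative distance
    triplet = max(0.0, d_pos - d_neg + margin)
    # numerically stable softmax cross-entropy on the class logits
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    return triplet + weight * ce
```

A well-separated triplet with a confidently correct classification drives the loss toward zero, while a violated triplet (positive farther than negative) is penalized by the margin term.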
Light field, as a new data representation format in multimedia, has the ability to capture both intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Cl...
ISBN:
(digital) 9781728180687
ISBN:
(print) 9781728180694
In recent years, deep learning has achieved promising success in multimedia quality assessment, especially image quality assessment (IQA). However, since videos exhibit more complex temporal characteristics, very little work has been done on video quality assessment (VQA) by exploiting powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In the proposed DeepSTQ, we first extract local and global spatiotemporal features using pre-trained deep learning models, without fine-tuning or training from scratch. The composited features consider distorted video frames as well as frame difference maps from both global and local views. Then, feature aggregation is conducted by a regression model to predict the perceptual video quality. Finally, experimental results demonstrate that our proposed DeepSTQ outperforms state-of-the-art quality assessment algorithms.
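The extract-then-regress pipeline described in this abstract can be illustrated with simple global statistics of frames and frame-difference maps standing in for the pre-trained DCNN features. All helper names here are hypothetical, and least-squares regression stands in for whatever regressor the paper actually uses:

```python
import numpy as np

def spatiotemporal_features(frames):
    """Global spatiotemporal statistics for a video (T x H x W array):
    mean/std of the frames and of the absolute frame-difference maps,
    a crude stand-in for pooled DCNN features."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))   # temporal variation maps
    return np.array([frames.mean(), frames.std(),
                     diffs.mean(), diffs.std()])

def fit_quality_regressor(feature_rows, mos):
    """Least-squares regression (with bias) mapping feature vectors to
    subjective quality scores."""
    X = np.hstack([feature_rows, np.ones((len(feature_rows), 1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(mos, dtype=float), rcond=None)
    return w

def predict_quality(w, features):
    """Predicted perceptual quality for one feature vector."""
    return float(np.append(features, 1.0) @ w)
```

With real DCNN features in place of the statistics, the same aggregate-then-regress structure gives a no-reference predictor.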