ISBN (digital): 9781728180687
ISBN (print): 9781728180694
We propose to improve neural network-based compression artifact reduction by transmitting side information for the neural network. The side information consists of artifact descriptors that are obtained by analyzing the original and compressed images in the encoder. In the decoder, the received descriptors are used as additional input to a conditional post-processing neural network. To reduce the transmission overhead, the entire model is optimized under a rate-distortion constraint via end-to-end learning. Experimental results show that introducing the side information substantially improves the artifact-reduction capability of the post-processing neural network and improves the rate-distortion performance.
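The abstract does not state the training loss. As a sketch only, a rate-distortion constraint of this kind is commonly written as a Lagrangian objective. All symbols below are assumptions for illustration: x is the original image, \hat{x} the post-processed output, s the transmitted artifact descriptors, D a distortion measure, R the estimated bit cost of s, and \lambda the trade-off weight.

```latex
\mathcal{L} = D(x, \hat{x}) + \lambda \, R(s)
```

Under this form, a larger \lambda penalizes the side-information rate more heavily, which would push the encoder toward more compact descriptors.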
In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNN), named Bidirectional Context Disentanglement Network (BCD-Net). For learning hierarchical represent...
ISBN (print): 9781509053179
In all existing block-based image and video coding standards, blocks are processed in a fixed scan order. Consequently, in HEVC intra coding, intra prediction is always based on the top and/or left neighboring reconstructed pixels, which yields less accurate prediction for blocks whose spatial correlation does not run along the top-left-to-bottom-right direction. To obtain better intra prediction, we propose to flexibly determine the coding order of blocks in HEVC intra coding. Complying with the hierarchical quadtree structure in HEVC, our flexible block ordering (FBO) technique recursively decides the coding order of the four sub-blocks when splitting a block. Moreover, we propose new methods to perform interpolation/extrapolation for intra prediction so as to fully utilize the neighboring reconstructed pixels, which are no longer always the top/left ones. Experimental results show that the proposed FBO technique achieves on average a 2.9% BD-rate reduction compared with the HEVC baseline.
ISBN (print): 9781479934331
High Efficiency Video Coding (HEVC) with the transform bypass mode is simple but inefficient for lossless coding. We therefore propose a novel transform to further eliminate the redundancy between residues of different blocks in intra prediction. The proposed transform depends on the intra prediction mode, so it adapts to the correlations of the residues produced by different modes. To obtain the parameters of the transform matrix accurately, an approach similar to the Wiener filtering method is adopted. Experimental results show that, on top of the lossless coding mode in HEVC, our method achieves an average bit-rate reduction of 7.4% for the All Intra Main configuration. Compared with other representative algorithms, our proposal still improves the compression ratio without substantially increasing the computational complexity of the encoder or decoder.
Rapidly growing intelligent applications require optimized bit allocation in image/video coding to support specific task-driven scenarios such as detection, classification, segmentation, etc. Some learning-based framewo...
In this paper, we propose a novel deep architecture with multiple classifiers for continuous sign language recognition. Representing the sign video with a 3D convolutional residual network and a bidirectional LSTM, we formulate continuous sign language recognition as a grammatical-rule-based classification problem. We first split a text sentence of sign language into isolated words and n-grams, where an n-gram is a sequence of n consecutive words in a sentence. Then, we propose a word-independent classifiers (WIC) module and an n-gram classifier (NGC) module to identify the words and n-grams in a sentence, respectively. A greedy decoding algorithm is employed to integrate the words and n-grams into the sentence based on the confidence scores provided by both modules. Our method is evaluated on a Chinese continuous sign language recognition benchmark, and the experimental results demonstrate its effectiveness and superiority.
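The word and n-gram split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name split_words_and_ngrams is hypothetical, and whitespace tokenization of the sentence is an assumption (a real sign language gloss sequence may need a different tokenizer).

```python
from typing import List, Tuple


def split_words_and_ngrams(sentence: str, n: int) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Split a sentence into isolated words and its n-grams.

    Assumption: words are separated by whitespace.
    An n-gram is a tuple of n consecutive words.
    """
    words = sentence.split()
    # Slide a window of length n over the word list; a sentence with fewer
    # than n words yields no n-grams.
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return words, ngrams


words, bigrams = split_words_and_ngrams("I want to drink water", 2)
print(words)
print(bigrams)
```

The two lists would then serve as the label sets for the word-level and n-gram-level classifiers, respectively.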
ISBN (print): 9781479947607
Query difficulty estimation (QDE) attempts to automatically predict the performance of the search results returned for a given query. QDE has been widely investigated in text document retrieval for many years. However, few works have explored it in image retrieval. State-of-the-art QDE methods in image retrieval mainly investigate the statistical characteristics (coherence, robustness, etc.) of the returned images to derive a value that indicates the degree of query difficulty. To the best of our knowledge, little research has been done to directly estimate the real retrieval performance of the search results, such as average precision, rather than only an indicator. In this paper, we propose a novel query difficulty estimation approach that automatically estimates the average precision of the image search results. Specifically, we first select a set of query-relevant and query-irrelevant images for each query via pseudo relevance feedback. Then an efficient and effective voting scheme is proposed to estimate the relevance label of each image in the search results. Based on the images' relevance labels, the average precision of the search results returned for the given query is derived. The experimental results on a benchmark image search dataset demonstrate the effectiveness of the proposed method.
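The final step, deriving average precision from per-image relevance labels, can be sketched as below. This is a minimal illustration under stated assumptions: the labels are binary, the list is in ranked order, and the sum is normalized by the number of relevant images found in the returned list. The abstract does not say which normalization the authors use, and the voting scheme that produces the labels is not modelled here.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision of a ranked list of binary relevance labels.

    Assumption: normalization is by the number of relevant items
    that appear in the list itself.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Estimated labels for a ranked result list (1 = relevant, 0 = irrelevant).
print(average_precision([1, 0, 1, 0]))
```

For the list [1, 0, 1, 0], the precisions at the two relevant ranks are 1/1 and 2/3, so the result is their mean, 5/6.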
Objective quality assessment of stereoscopic panoramic images has become a challenging problem owing to the rapid growth of 360-degree content. Unlike traditional 2D image quality assessment (IQA), 3D omnidirectional IQA involves more complex aspects, especially the unlimited field of view (FoV) and the extra depth perception, which make it difficult to evaluate the quality of experience (QoE) of 3D omnidirectional images. In this paper, we propose a multi-viewport based full-reference stereo 360 IQA model. Because viewports change freely when browsing in a head-mounted display, our proposed approach processes the image inside the FoV rather than a projected one such as the equirectangular projection (ERP). In addition, since the overall QoE depends on both image quality and depth perception, we utilize features estimated from the difference map between the left and right views, which can reflect disparity. The depth perception features, along with the binocular image qualities, are employed to further predict the overall QoE of 3D 360 images. The experimental results on our public Stereoscopic OmnidirectionaL Image quality assessment Database (SOLID) show that the proposed method achieves a significant improvement over some well-known IQA metrics and can accurately reflect the overall QoE of perceived images.
ISBN (print): 9781538644591; 9781538644584
Motion estimation and motion compensation are fundamental in video coding to remove the temporal redundancy between video frames. Current video coding schemes usually adopt block-based motion estimation and compensation using simple translational or affine motion models, which cannot efficiently characterize complex motions in natural video signals. In this paper, we propose a frame extrapolation method for motion estimation and compensation. Specifically, based on several previous frames, our method directly extrapolates the current frame using a trained deep network model. The deep network we adopt is a redesigned Video Coding oriented LAplacian Pyramid of Generative Adversarial Networks (VC-LAPGAN). The extrapolated frame is then used as an additional reference frame. Experimental results show that the VC-LAPGAN is capable of estimating and compensating for complex motions, and of extrapolating frames with high visual quality. Using the VC-LAPGAN, our method achieves on average a 2.0% BD-rate reduction compared with High Efficiency Video Coding (HEVC) under the low-delay P configuration.
ISBN (digital): 9781728123455
ISBN (print): 9781728123462
Semantic segmentation is a fundamental task in indoor scene understanding. Most previous supervised approaches rely on densely annotated image data sets. Because few images come with segmentation labels, the performance of existing networks is greatly limited. In this paper, we exploit temporal correlation in video frames to improve the performance and robustness of segmentation networks. Two effective learning strategies are proposed to propagate the information from a few labeled frames to their immediate neighbor frames. First, we scale up the training dataset for supervised semantic segmentation networks by generating pseudo ground truth for neighboring frames from a labeled frame using a filtered homography transformation. Furthermore, we introduce a self-supervised loss function to ensure temporal consistency between the segmentation results of adjacent frames. The experimental results demonstrate that our proposed method outperforms state-of-the-art techniques for semantic segmentation on the NYU-Depth V2 dataset.
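The pseudo ground-truth step can be illustrated by warping a label map with a homography. This is a rough sketch under several assumptions, not the authors' pipeline: the function warp_labels and the IGNORE value are hypothetical, the 3x3 homography is taken as given (its estimation is not shown), nearest-neighbour sampling is used so that class IDs are never blended, and the filtering step mentioned in the abstract is not modelled.

```python
from typing import List, Sequence

IGNORE = 255  # hypothetical "no label" value for pixels with no source pixel


def warp_labels(labels: Sequence[Sequence[int]],
                h_inv: Sequence[Sequence[float]],
                ignore: int = IGNORE) -> List[List[int]]:
    """Warp a label map into a neighbouring frame.

    h_inv is a 3x3 homography mapping pixel (x, y) of the neighbouring
    frame back to the labelled frame (inverse mapping).
    """
    height, width = len(labels), len(labels[0])
    out = [[ignore] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u = h_inv[0][0] * x + h_inv[0][1] * y + h_inv[0][2]
            v = h_inv[1][0] * x + h_inv[1][1] * y + h_inv[1][2]
            w = h_inv[2][0] * x + h_inv[2][1] * y + h_inv[2][2]
            if w == 0:
                continue  # point at infinity, leave as ignore
            sx, sy = round(u / w), round(v / w)  # nearest source pixel
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = labels[sy][sx]
    return out


labels = [[1, 1, 2],
          [1, 2, 2]]
# Pure translation: neighbour pixel (x, y) comes from source pixel (x + 1, y).
shift = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
print(warp_labels(labels, shift))
```

Pixels that map outside the labelled frame keep the ignore value, so a training loss could skip them rather than learn from a guessed label.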