ISBN (print): 9781538644591; 9781538644584
Inspired by the progress in image and video super-resolution (SR) achieved by convolutional neural networks (CNNs), we propose a CNN-based residue SR method for video coding. Unlike previous works that operate in the pixel domain, i.e., down- and up-sampling of an image or video frame, we propose to perform down- and up-sampling in the residue domain. Specifically, for each block, we perform motion estimation and compensation to obtain the residual signal at the original resolution, then down-sample the residue, compress it at low resolution, and perform residue SR using a trained CNN model. We design a new CNN for residue SR that is aided by the motion-compensated prediction signal. We integrate the residue SR method into the High Efficiency Video Coding (HEVC) scheme, providing mode decision at the coding tree unit level. Experimental results show that our method achieves on average 4.0% and 2.8% BD-rate reduction under the low-delay P and low-delay B configurations, respectively.
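The residue-domain pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: 2x2 averaging stands in for the down-sampling filter, uniform quantization stands in for HEVC transform coding, and nearest-neighbour up-sampling is a placeholder for the trained CNN. All function names and the `qstep` parameter are hypothetical.

```python
import numpy as np

def downsample2x(block):
    """Average each 2x2 neighbourhood (assumed stand-in for the down-sampling filter)."""
    h, w = block.shape
    return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(block):
    """Nearest-neighbour up-sampling; a placeholder for the trained residue-SR CNN."""
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)

def code_block_residue_domain(original, prediction, qstep=4.0):
    """Reconstruct a block by coding its residue at half resolution.

    original, prediction: 2-D arrays of equal shape with even dimensions.
    qstep: hypothetical quantization step replacing real transform coding.
    """
    prediction = prediction.astype(np.float64)
    # Residue at the original resolution (after motion-compensated prediction).
    residue = original.astype(np.float64) - prediction
    # Down-sample the residue and "compress" it at low resolution.
    low_res = downsample2x(residue)
    quantized = np.round(low_res / qstep) * qstep
    # Restore the residue to full resolution and add it back to the prediction.
    restored = upsample2x(quantized)
    return prediction + restored
```

In a real codec, this low-resolution path would compete against ordinary full-resolution residue coding, with the cheaper option chosen per coding tree unit.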
Existing generalization theories analyze generalization performance mainly in terms of model complexity and the training process. The ignorance of the task properties, which results from the widely used IID assumpt...
In this paper, we propose a learned scalable/progressive image compression scheme based on deep neural networks (DNN), named Bidirectional Context Disentanglement Network (BCD-Net). For learning hierarchical represent...
ISBN (print): 9781538644591; 9781538644584
One key challenge for learning-based image compression is that adaptive bit allocation is crucial for compression effectiveness but can hardly be trained into a neural network. In this work, we present an end-to-end trainable image compression framework, named Multi-scale Progressive Network (MPN), that achieves spatially variant bit allocation and rate control under the guidance of a novel learnable just noticeable distortion (JND) map. Specifically, MPN's encoder achieves a multi-scale feature representation through a three-branch structure. Each branch employs an independent feature extraction strategy for its specific receptive field, and the branches are merged progressively under the guidance of the corresponding learnable JND maps generated by our proposed Bit-Allocation sub-Network (BAN). This makes MPN focus on the areas that attract the human visual system (HVS) and preserve more image texture during compression. Finally, a hybrid objective function is introduced to make MPN more efficient and to mimic the discriminative characteristics of the HVS. Experiments show that MPN significantly outperforms traditional JPEG, JPEG 2000, and a few state-of-the-art learning-based methods in terms of the multi-scale structural similarity (MS-SSIM) index, and produces much better visual results with rich textures, sharp edges, and fewer artifacts.
ISBN (digital): 9781728129129
ISBN (print): 9781728129136
TOPSAR is an earth-imaging technique that can provide wide-swath coverage. This paper presents a TOPSAR focusing and calibration experiment based on TOPSAR data acquired by Gaofen3 (GF3). We first derive the processor calibration factors under the requirement that signal energy be kept invariant. We then analyze in full the impact of antenna electronic steering on TOPSAR products. Calibration methods for the processor and for electronic steering are proposed, intended for the TOPSAR-mode processing system of a SAR satellite, the next generation of GF3.
ISBN (print): 9781538644591; 9781538644584
Motion estimation and motion compensation are fundamental in video coding to remove the temporal redundancy between video frames. Current video coding schemes usually adopt block-based motion estimation and compensation using simple translational or affine motion models, which cannot efficiently characterize complex motions in natural video signals. In this paper, we propose a frame extrapolation method for motion estimation and compensation. Specifically, based on several previous frames, our method directly extrapolates the current frame using a trained deep network model. The deep network we adopt is a redesigned Video Coding oriented LAplacian Pyramid of Generative Adversarial Networks (VC-LAPGAN). The extrapolated frame is then used as an additional reference frame. Experimental results show that the VC-LAPGAN is capable of estimating and compensating for complex motions, and of extrapolating frames with high visual quality. Using the VC-LAPGAN, our method achieves on average 2.0% BD-rate reduction relative to High Efficiency Video Coding (HEVC) under the low-delay P configuration.
Surveillance and security scenarios usually require highly efficient facial image compression schemes for face recognition and identification. While either traditional general image codecs or special facial image compres...
ISBN (print): 9781538644591; 9781538644584
Highly efficient facial image compression is broadly required and challenging in surveillance and security scenarios, yet both traditional general-purpose image codecs and specialized facial image compression schemes only heuristically refine the codec in isolation according to a face verification accuracy metric. We propose an End-to-End Facial Image Compression (E2EFIC) framework with a novel variable-block-size Regionally Adaptive Pooling (RAP) module whose parameters can be automatically optimized according to gradient feedback from an integrated semantic distortion metric, including a successful exploration of applying a Generative Adversarial Network (GAN) directly as a metric in an image compression scheme. The experimental results verify the framework's efficiency, demonstrating bitrate savings of 71.41%, 48.28%, and 52.67% over JPEG2000, WebP, and neural network-based codecs, respectively, under the same face verification accuracy distortion metric. We also evaluate E2EFIC's performance gain over the latest specialized facial image codecs.
ISBN (print): 9781538644591; 9781538644584
Lossy compression of images and video yields visually annoying artifacts including blocking, blurring, and ringing, especially at low bit rates. In-loop filtering techniques can reduce these artifacts, improve quality, and achieve coding gain accordingly. In this paper, we present a convolutional neural network (CNN) based in-loop filter for High Efficiency Video Coding (HEVC). First, we design a new CNN structure, namely VRCNN-ext, that is composed of multiple Variable-filter-size Residue-learning blocks for artifact reduction. VRCNN-ext is trained on natural images as well as their compressed versions at different quality levels. Second, we investigate a new in-loop filter based on the trained VRCNN-ext models. Specifically, we observe that using VRCNN-ext directly on inter pictures is not effective. To solve this problem, we further train a classifier to decide whether to use VRCNN-ext for each coding unit (CU). The classifier makes its decision based on the compressed information, thus avoiding the overhead bits needed to switch the CNN-based filter on and off at the CU level. Experimental results show that our scheme achieves significant bit savings over the HEVC anchor, leading to on average 9.2%, 9.6%, and 7.4% BD-rate reduction on the HEVC test sequences under the all-intra, low-delay B, and random-access configurations, respectively.
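Several of the abstracts above report gains as BD-rate reduction. As a rough illustration of how such a figure is commonly obtained, the sketch below follows the widely used cubic-fit form of the Bjøntegaard delta rate: fit log-rate as a cubic polynomial of PSNR for each codec, then average the gap between the two fits over the shared PSNR range. The function name and arguments are hypothetical, and the evaluation scripts used in the papers may differ (for example, by using piecewise cubic interpolation).

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec vs. the anchor at equal PSNR.

    Each codec is described by four (rate, PSNR) points. Negative values
    mean the test codec needs fewer bits than the anchor.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval the two curves share.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Mean log-rate difference converted back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```

Under this definition, a test codec that needs exactly 10% fewer bits than the anchor at every quality level yields a BD-rate of about -10%.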
Person re-identification is a crucial task of identifying pedestrians of interest across multiple surveillance camera views. For person re-identification, a pedestrian is usually represented with features extracted fr...