ISBN (print): 9781728185514
This paper addresses image rescaling, the task of downscaling an input image and subsequently upscaling it, for the purposes of transmission, storage, or playback on heterogeneous devices. The state-of-the-art image rescaling network (known as IRN) tackles image downscaling and upscaling as mutually invertible tasks using invertible affine coupling layers. In particular, for upscaling, IRN models the missing high-frequency component by input-independent (case-agnostic) Gaussian noise. In this work, we take one step further and predict a case-specific high-frequency component from textures embedded in the downscaled image. Moreover, we adopt integer coupling layers to avoid quantizing the downscaled image. When tested on commonly used datasets, the proposed method, termed DIRECT, improves high-resolution reconstruction quality both subjectively and objectively, while maintaining visually pleasing downscaled images.
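The invertibility that IRN-style rescaling relies on comes from the affine coupling layer itself, which can be inverted exactly regardless of the conditioning networks. A minimal sketch of this mechanism (my own illustration with toy conditioning functions, not the paper's architecture):

```python
import numpy as np

def coupling_forward(x1, x2, scale_net, shift_net):
    # x1 passes through unchanged; x2 is scaled and shifted conditioned on x1
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return x1, y2

def coupling_inverse(y1, y2, scale_net, shift_net):
    # exact inverse: undo the shift, then the scale, using the same nets
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return y1, x2
```

Because `scale_net` and `shift_net` are only ever evaluated on the pass-through half, they can be arbitrary (non-invertible) networks and the round trip is still exact.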
ISBN (print): 9781728185514
Deep learning-based single image super-resolution (SR) consistently shows superior performance compared to traditional SR methods. However, most of these methods assume that the blur kernel used to generate the low-resolution (LR) image is known and fixed (e.g. bicubic). Since blur kernels involved in real-life scenarios are complex and unknown, the performance of these SR methods is greatly reduced on real blurry images. Reconstruction of high-resolution (HR) images from randomly blurred and noisy LR images remains a challenging task. Typical blind SR approaches involve two sequential stages: i) kernel estimation; ii) SR image reconstruction based on the estimated kernel. However, due to the ill-posed nature of this problem, iterative refinement can benefit both the kernel and SR image estimates. With this observation, in this paper, we propose an image SR method based on deep learning with iterative kernel estimation and image reconstruction. Simulation results show that the proposed method outperforms the state of the art in blind image SR and produces visually superior results as well.
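The alternating structure described above can be sketched as a simple loop; the two callables below are placeholders standing in for the paper's estimation and reconstruction networks, and their interfaces are my assumption:

```python
def blind_sr(lr_image, estimate_kernel, reconstruct, n_iters=4):
    """Alternate kernel estimation and SR reconstruction.

    `estimate_kernel` and `reconstruct` are hypothetical callables standing
    in for the paper's two networks.
    """
    kernel = estimate_kernel(lr_image, sr_image=None)    # initial kernel guess
    sr = None
    for _ in range(n_iters):
        sr = reconstruct(lr_image, kernel)               # SR given current kernel
        kernel = estimate_kernel(lr_image, sr_image=sr)  # refine kernel given SR
    return sr, kernel
```

The point of the loop is that each refined kernel improves the next reconstruction, and each reconstruction gives the estimator a cleaner reference for the next kernel update.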
ISBN (print): 9781728185514
As an emerging media format, virtual reality (VR) has attracted the attention of researchers. 6-DoF VR can reconstruct the surrounding environment with the help of the depth information of the scene, so as to provide users with an immersive experience. However, due to the lack of depth information in panoramic images, converting a panorama to 6-DoF VR remains a challenge. In this paper, we propose a new depth estimation method, SPCNet, based on spherical convolution to solve the problem of depth information restoration for panoramic images. In particular, spherical convolution is introduced to improve depth estimation accuracy by reducing the distortion introduced by Equi-Rectangular Projection (ERP). The experimental results show that SPCNet outperforms other advanced networks on many metrics; for example, its RMSE is 0.419 lower than UResNet's. Moreover, the threshold accuracy of depth estimation is also improved.
ISBN (print): 9781728185514
Stereo image super-resolution (SR) has achieved great progress in recent years. However, the two major problems of existing methods are that parallax correction is insufficient and that cross-view information fusion only occurs at the beginning of the network. To address these problems, we propose a two-stage parallax correction and a multi-stage cross-view fusion network for better stereo image SR results. Specifically, the two-stage parallax correction module consists of horizontal parallax correction and refined parallax correction. The first stage corrects horizontal parallax by parallax attention. The second stage is based on deformable convolution to refine horizontal parallax and correct vertical parallax simultaneously. Then, multiple cascaded enhanced residual spatial feature transform blocks are developed to fuse cross-view information at multiple stages. Extensive experiments show that our method achieves state-of-the-art performance on the KITTI2012, KITTI2015, Middlebury and Flickr1024 datasets.
ISBN (print): 0819452114
Leaky prediction layered video coding (LPLC) partially includes the enhancement layer in the motion compensated prediction loop, using a leaky factor between 0 and 1 to balance coding efficiency and error resilience. In this paper, rate-distortion functions are derived for LPLC from rate-distortion theory. Closed-form expressions are obtained for two scenarios of LPLC: one where the enhancement layer stays intact, and the other where the enhancement layer suffers from data rate truncation. The rate-distortion performance of LPLC is then evaluated with respect to different choices of the leaky factor, demonstrating that the theoretical analysis conforms well with the operational results.
ISBN (print): 9781665475921
To speed up the image classification process, which conventionally takes reconstructed images as input, compressed-domain methods instead use compressed images, without decompression, as input. Correspondingly, this incurs a certain decline in accuracy. Our goal in this paper is to raise the accuracy of compressed-domain classification using compressed images output by NN-based image compression networks. First, we design a hybrid objective loss function that contains a reconstruction loss on the deep feature map. Second, an image reconstruction layer is integrated into the classification network to up-sample the compressed representation. These methods greatly increase compressed-domain image classification accuracy and require no extra computational complexity. Experimental results on the ImageNet benchmark show that our design outperforms the latest work, ResNet-41, with a large accuracy gain of about 4.49% in top-1 classification accuracy. Besides, the accuracy gap behind methods using reconstructed images is reduced to 0.47%. Moreover, our classification network has the lowest computational complexity and model complexity.
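A hybrid loss of this shape combines a classification term with a feature-map reconstruction term. The sketch below is my own illustration of that structure; the weight `alpha` and the choice of feature layer are assumptions, not values from the paper:

```python
import numpy as np

def hybrid_loss(logits, label, feat_compressed, feat_reference, alpha=0.1):
    # classification term: softmax cross-entropy on the compressed-domain logits
    z = logits - logits.max()                      # stabilized softmax
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    # reconstruction term: MSE between the deep feature map of the
    # compressed-domain branch and a reference feature map
    rec = np.mean((feat_compressed - feat_reference) ** 2)
    return ce + alpha * rec
```

When the two feature maps agree, the reconstruction term vanishes and the loss reduces to plain cross-entropy; otherwise the extra term pushes the compressed-domain features toward the reference.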
ISBN (print): 9798331529543; 9798331529550
Lookup tables (LUTs) are commonly used to speed up image processing by handling complex mathematical functions like sine and exponential calculations. They are used in various applications such as camera image processing, high-dynamic-range imaging, and edge-preserving filtering. However, due to the increasing gap between computing and input/output performance, LUTs are becoming less effective. Even though specific circuits like SIMD can improve LUT efficiency, they still cannot fully bridge the performance gap, which makes it difficult to choose between direct numerical calculation and LUT lookup. For this problem, a register-LUT method with nearest-neighbor lookup was proposed; however, it is limited to functions with narrow-range values approaching zero. In this paper, we propose a method for using register LUTs to process images efficiently over a wide range of values. Our contributions include a register-LUT with linear interpolation for efficient computation, the use of a smaller data type for further efficiency, and an efficient data-retrieval method.
ISBN (print): 9781665475921
Learned image compression (LIC) has shown superior compression ability. Quantization is an inevitable stage in generating the quantized latent for entropy coding. To solve the non-differentiability of quantization in the training phase, many differentiable approximations have been proposed. However, in most previous methods, the derivative of the quantized latent with respect to the non-quantized latent is set to one. As a result, the quantization error between the non-quantized and quantized latents is not taken into consideration during gradient descent. To address this issue, we exploit a gradient scaling method to scale the gradient of the non-quantized latent in back-propagation. The experimental results show that our method outperforms recent LIC quantization methods.
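The contrast with the plain straight-through estimator (STE) can be sketched as a manual forward/backward pair. The abstract does not give the paper's exact scaling rule, so the rule below (`2 ** |error|`) is a placeholder chosen only to show a gradient that grows with quantization error:

```python
import numpy as np

def quantize_forward(y):
    # hard rounding, as used at inference time
    return np.round(y)

def quantize_backward(y, grad_out):
    # plain STE would return grad_out unchanged (derivative set to one);
    # gradient scaling instead modulates it by the quantization error
    err = np.abs(y - np.round(y))      # per-element error in [0, 0.5]
    scale = 2.0 ** err                 # hypothetical scaling rule, NOT the paper's
    return grad_out * scale
```

Elements far from an integer (large quantization error) thus receive a larger gradient than elements that round almost exactly, which is the effect the abstract describes.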
ISBN (print): 0819452114
This paper deals with the problem of estimating multiple motions at points where these motions are overlaid. We present a new approach that is based on block-matching and can deal with both transparent motions and occlusions. We derive a block-matching constraint for an arbitrary number of moving layers. We use this constraint to design a hierarchical algorithm that can distinguish between the occurrence of single, transparent, and occluded motions and can thus select the appropriate local motion model. The algorithm adapts to the amount of noise in the image sequence by use of a statistical confidence test. The algorithm is further extended to deal with very noisy images by using a regularization based on Markov Random Fields. Performance is demonstrated on image sequences synthesized from natural textures with high levels of additive dynamic noise.
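For two additively superimposed layers with per-frame shifts u and v, a block-matching constraint of this kind states that f(x, t+2) − f(x−u, t+1) − f(x−v, t+1) + f(x−u−v, t) = 0, and the true pair of motions can be found by minimizing its residual. The following 1-D toy check is my own illustration, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
g1, g2 = rng.random(64), rng.random(64)   # two texture layers
u_true, v_true = 2, 5                      # their integer (circular) shifts per frame

def frame(t):
    # additive superposition of two translating transparent layers
    return np.roll(g1, t * u_true) + np.roll(g2, t * v_true)

f0, f1, f2 = frame(0), frame(1), frame(2)

def residual(a, b):
    # two-layer constraint: vanishes exactly when {a, b} = {u_true, v_true}
    r = f2 - np.roll(f1, a) - np.roll(f1, b) + np.roll(f0, a + b)
    return float(np.sum(r * r))
```

Scanning `residual` over candidate shift pairs recovers the true motions; the paper's hierarchical algorithm additionally decides between the single-, transparent-, and occluded-motion models using a statistical confidence test.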
ISBN (print): 9781728185514
Pixel recovery with deep learning has proven very effective for a variety of low-level vision tasks like image super-resolution, denoising, and deblurring. Most existing works operate in the spatial domain, and few exploit the transform domain for image restoration tasks. In this paper, we present a transform-domain approach to image deblocking using a deep neural network called DCTResNet. Our application is compressed-video motion deblurring, where the input video frame has blocking artifacts that make the deblurring task very challenging. Specifically, we use a block-wise Discrete Cosine Transform (DCT) to decompose the image into its low- and high-frequency sub-band images and exploit strong sub-band-specific features for more effective deblocking. Since JPEG also uses the DCT for image compression, using DCT sub-band images for deblocking helps the network learn the JPEG compression prior and thus effectively correct the blocking artifacts. Our experimental results show that DCTResNet compares favorably with other state-of-the-art (SOTA) methods in both PSNR and SSIM, while being significantly faster at inference time.
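The block-wise DCT sub-band decomposition described above can be sketched directly. The 8×8 block size matches JPEG, but the particular low/high split rule (`u + v < cutoff`) is my assumption for illustration; the abstract does not specify DCTResNet's exact grouping:

```python
import numpy as np

def dct_matrix(n=8):
    # orthonormal DCT-II basis matrix: C[k, m] = s_k * cos(pi*(2m+1)k / (2n))
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1.0 / np.sqrt(n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def blockwise_subbands(img, cutoff=2):
    """Split an image (H, W multiples of 8) into low/high-frequency parts
    via per-block DCT, keeping coefficients with u + v < cutoff as 'low'."""
    C = dct_matrix(8)
    h, w = img.shape
    low = np.zeros_like(img, dtype=float)
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    mask = (u + v) < cutoff
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            coef = C @ img[i:i+8, j:j+8] @ C.T            # forward 2-D DCT
            low[i:i+8, j:j+8] = C.T @ (coef * mask) @ C   # inverse of low band only
    return low, img - low
```

Because the DCT is orthonormal, the two sub-bands sum back to the original image exactly, so a network can process them separately without losing information.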