ISBN: 9781665475921 (print)
Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.
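As a rough illustration of the frequency split described above, the sketch below applies one level of a 2D Haar transform (in its averaging form) to separate an image into a low-frequency band and three high-frequency bands. The choice of Haar, the function names, and the mapping of bands to "base" and "enhancement" data are assumptions made for illustration only; the abstract does not name the wavelet, and the actual method codes learned latent representations rather than raw sub-bands.

```python
def haar_1d(values):
    """One averaging-form Haar level on an even-length list: (low, high)."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transform_columns(block):
    """Apply haar_1d down every column; return (low_block, high_block)."""
    lows, highs = [], []
    for column in zip(*block):
        lo, hi = haar_1d(list(column))
        lows.append(lo)
        highs.append(hi)
    # Transpose back so that rows are rows again.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """Split a 2D list (even height and width) into base and enhancement bands."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = transform_columns(row_low)   # low-pass rows: LL and LH
    hl, hh = transform_columns(row_high)  # high-pass rows: HL and HH
    # Hypothetical mapping: LL feeds the base layer, the rest the enhancement.
    return ll, (lh, hl, hh)


base, enhancement = haar_2d([[1, 3], [5, 7]])
```

Here `base` holds the local averages, while `enhancement` holds the detail bands that a scalable decoder could skip or use selectively.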
ISBN: 9781479902880 (print)
One of the most striking properties of natural image statistics is their scale invariance. Intuitively, a natural image contains the same content at different scales and, dually, content of the same scale exists throughout the scales of the image. Unlike previous scale-invariance work that decomposes an image into its local band-pass filter components, this paper seeks a general model of the natural image paths distribution to describe scale invariance in the visual world. A novel strategy for high-fidelity image restoration is then presented that characterizes the nonlocal self-similarity of natural images throughout scales in a unified statistical manner. This offers a powerful mechanism for combining the scale invariance and nonlocal self-similarity of natural images to ensure a more reliable and robust estimation. Extensive experiments on image restoration from partial random samples show that the proposed algorithm achieves significant performance improvements over current state-of-the-art schemes.
ISBN: 9781728185514 (print)
Recently, network-based image Compressive Sensing (ICS) algorithms have shown superior reconstruction quality and speed, yet they are not interpretable. Herein, we propose an Adaptive Threshold-based Sparse Representation Reconstruction Network (ATSR-Net), composed of the Convolutional Sparse Representation subnet (CSR-subnet) and the truly Adaptive Threshold Generation subnet (ATG-subnet). The traditional iterations are unfolded into several CSR-subnets, which can fully exploit local and nonlocal similarities. The ATG-subnet automatically determines a threshold map based on the intrinsic characteristics of the image for flexible feature selection. Moreover, we present a three-level consistency loss, defined at the pixel level, measurement level, and feature level, to accelerate network convergence. Extensive experimental results demonstrate the superiority of the proposed network over existing state-of-the-art methods by large margins, both quantitatively and qualitatively.
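To make the idea of a per-element threshold map concrete, the sketch below applies soft thresholding with a separate threshold for each coefficient. This is only an assumed illustration: the abstract does not say that ATSR-Net uses soft thresholding, and the function name and the hand-written threshold values are hypothetical (in the described network the map would be produced by the ATG-subnet).

```python
import math


def soft_threshold(coeffs, thresholds):
    """Shrink each coefficient toward zero by its own threshold.

    Coefficients whose magnitude does not exceed the threshold become 0.0;
    the rest keep their sign and lose `threshold` in magnitude.
    """
    result = []
    for value, threshold in zip(coeffs, thresholds):
        magnitude = abs(value) - threshold
        result.append(math.copysign(magnitude, value) if magnitude > 0 else 0.0)
    return result


# Hypothetical feature values and threshold map, chosen by hand.
selected = soft_threshold([3.0, -2.0, 0.5], [1.0, 0.5, 1.0])
```

A larger threshold at a given position suppresses more of the feature there, which is one simple way a spatially varying map can act as feature selection.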
ISBN: 9781728180687 (print)
Pixel-wise image quality assessment (IQA) algorithms, such as mean square error (MSE), mean absolute error (MAE), and peak signal-to-noise ratio (PSNR), correlate well with perceptual quality for images that share the same distortion type, but not for images with different distortion types, which is inconsistent with the human visual system (HVS). Although a large number of metrics based on image error have been proposed, difficulties and limitations remain. To solve this problem, this paper proposes a full-reference image quality assessment (FR-IQA) method based on MAE. The metric divides the image error map (the difference between the distorted image and the reference image) into a smooth region and a texture-edge region, calculates the mean value of each, and then assigns them different weights to account for the masking effect. The key innovation of this paper is a distortion significance measurement, a visual quality coefficient that effectively indicates the influence of different distortion types on perceptual quality and unifies them with the HVS. The segmented image error maps are weighted by the distortion significance coefficient. Experimental results on the four largest benchmark databases show that most of the distortions are successfully evaluated and that the results are consistent with the HVS.
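For reference, the pixel-wise measures named above can be written in a few lines, followed by a sketch of a region-weighted MAE in the spirit of the smooth versus texture-edge split. The MSE, MAE, and PSNR definitions are standard; the `region_weighted_mae` function, its mask argument, and its weights are hypothetical placeholders, since the abstract does not give the segmentation rule, the weights, or the distortion significance coefficient.

```python
import math


def mse(ref, dist):
    """Mean square error between two equal-length pixel sequences."""
    return sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)


def mae(ref, dist):
    """Mean absolute error between two equal-length pixel sequences."""
    return sum(abs(r - d) for r, d in zip(ref, dist)) / len(ref)


def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(ref, dist)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def region_weighted_mae(ref, dist, is_texture, w_smooth, w_texture):
    """Weighted sum of per-region mean absolute errors.

    `is_texture` is an assumed caller-supplied mask (True = texture-edge
    pixel); an empty region contributes zero.
    """
    errors = [abs(r - d) for r, d in zip(ref, dist)]
    smooth = [e for e, t in zip(errors, is_texture) if not t]
    texture = [e for e, t in zip(errors, is_texture) if t]
    mean_smooth = sum(smooth) / len(smooth) if smooth else 0.0
    mean_texture = sum(texture) / len(texture) if texture else 0.0
    return w_smooth * mean_smooth + w_texture * mean_texture


# Hypothetical four-pixel example with hand-picked mask and weights.
score = region_weighted_mae(
    [0, 0, 0, 0], [2, 4, 10, 20], [False, False, True, True], 0.25, 0.75
)
```

Weighting the two regions differently lets the same absolute error count for less where texture would mask it, which plain MAE cannot express.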
ISBN: 9781479902880 (print)
With the explosive growth of web image data, image tag ranking for accurate retrieval from massive image collections is becoming an active research topic. However, existing ranking approaches are far from ideal and remain to be improved. This paper proposes a new image tag saliency ranking algorithm based on sparse representation. We first propagate labels from the image level to the region level via Multi-instance Learning driven by sparse representation: the target instance from a positive bag is reconstructed as a sparse linear combination of all the instances in the training set, and instances with nonzero reconstruction coefficients are considered similar to the target instance. A visual attention model is then used for tag saliency analysis. Compared with existing approaches, the proposed method achieves better performance.
ISBN: 9781728185514 (print)
Observers differ in their visual attention when viewing the same scene. Inter-observer visual congruency (IOVC) describes the dispersion between different people's visual attention areas when they observe the same stimulus. Research on the IOVC of video is interesting but lacking. In this paper, we first introduce a measurement for calculating the IOVC of video, and we conduct an eye-tracking experiment in a realistic movie-watching environment to establish a movie scene dataset. We then propose a method to predict the IOVC of video, which employs a dual-channel network to extract and integrate content and optical flow features. The effectiveness of the proposed prediction model is validated on our dataset. Finally, the correlation between inter-observer congruency and video emotion is analyzed.
ISBN: 9781665475921 (print)
This paper applies Time-Frequency Analysis (TFA) techniques from signal processing to computer vision tasks. The main idea is to build a simple network architecture that does not need two or more convolutional neural networks (CNNs): hidden features are analyzed with the Discrete Wavelet Transform (DWT) and sent into filters as weights by convolutions, transformers, or other methods. The network therefore does not need two or more stages. Instead, we apply TFA techniques directly to a CNN to build a one-stage network. Networks built this way not only keep their outstanding performance but also require fewer computing resources. In this paper, we mainly use the DWT on a CNN to solve image inpainting problems. The results show that our model works stably in the frequency domain to achieve free-form image inpainting.
ISBN: 9781509053162 (print)
Existing view interpolation methods such as DIBR require an accurate disparity map, which greatly limits their applications. To address this, we propose a novel mesh-based view interpolation algorithm capable of synthesizing visually coherent virtual views from a rough disparity map estimated by stereo matching algorithms. We adopt an edge-aware mesh cutting method to explicitly handle occlusion and preserve sharp depth discontinuities. Experiments on the Middlebury dataset and 3D-HEVC test sequences demonstrate that the proposed method outperforms DIBR and a state-of-the-art mesh-based view interpolation algorithm in terms of visual quality and PSNR.
ISBN: 9781479902880 (print)
Compressive sensing imaging (CSI) is a new framework for image coding that enables a scene to be acquired and compressed simultaneously. The CS encoder efficiently shifts the bulk of the system complexity to the decoder. Ideally, implementation of CSI provides lossless compression in image coding. In this paper, we consider the lossy compression of the CS measurements in a CSI system. We design a universal quantizer for the CS measurements of any input image. The proposed method first establishes a universal probability model for the CS measurements in advance, without any information about the input image. A fast quantizer is then designed based on this model. Simulation results demonstrate that the proposed method has nearly optimal rate-distortion (R-D) performance while maintaining very low computational complexity at the CS encoder.
ISBN: 9781479902880 (print)
In this paper, we propose an efficient multi-phase segmentation method for color images based on the piecewise constant multi-phase Vese-Chan model and the split Bregman method. The proposed model is first presented in a four-phase level set formulation and then extended to a multi-phase formulation. The four-phase and multi-phase energy functionals are defined, and the corresponding minimization problems of the proposed active contour model are presented. The split Bregman method is applied to minimize the multi-phase energy functional efficiently. The proposed model has been applied to synthetic and real color images with promising results, and numerical results demonstrate its advantages.