Neural compression has benefited from technological advances such as convolutional neural networks (CNNs) to achieve advanced bitrates, especially in image compression. In neural image compression, an encoder and a de...
详细信息
ISBN:
(纸本)9781728185514
Neural compression has benefited from technological advances such as convolutional neural networks (CNNs) to achieve advanced bitrates, especially in image compression. In neural image compression, an encoder and a decoder can run in parallel on a GPU, so the speed is relatively fast. However, the conventional entropy coding for neural image compression requires serialized iterations in which the probability distribution is estimated by multi-layer CNNs and entropy coding is processed on a CPU. Therefore, the total compression and decompression speed is slow. We propose a fast, practical, GPU-intensive entropy coding framework that consistently executes entropy coding on a GPU through highly parallelized tensor operations, as well as an encoder, decoder, and entropy estimator with an improved network architecture. We experimentally evaluated the speed and rate-distortion performance of the proposed framework and found that we could significantly increase the speed while maintaining the bitrate advantage of neural image compression.
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-bas...
详细信息
ISBN:
(纸本)9781665475921
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-based applications, we have established the first multi-source composite image (MSCI) database for image quality assessment (IQA). Our MSCI database contains 80 reference images and 1600 distorted images, generated by four advanced compression standards with five distortion levels. In particular, these five distortion levels are determined based on the first five just noticeable difference (JND) levels. Moreover, we verify the IQA performance of some representative methods on our MSCI database. The experimental results show that the performance of the existing methods on the MSCI database needs to be further improved.
In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields with millions of labeled data such as image recognition, only several thousand labeled images ar...
详细信息
ISBN:
(纸本)9781728185514
In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields with millions of labeled data such as image recognition, only several thousand labeled images are available in image quality assessment (IQA) field for deep learning, which heavily hinders the development and application for IQA. To tackle this problem, in this paper, we proposed an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA), which is based on deep learning. We employed an advanced full reference (FR) IQA method to expand databases and supervise the training of network. In addition, the network outputs of expanding images were used as proxy labels replacing errors between subjective scores and objective scores to achieve error self-learning. Two weights of error back propagation were designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method yielded comparative effect.
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. The current one-stage space-time video super-resolution (STVSR) methods are difficult to deal with la...
详细信息
ISBN:
(纸本)9781728185514
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. The current one-stage space-time video super-resolution (STVSR) methods are difficult to deal with large motion and complex scenes, and are time-consuming and memory intensive. We propose an efficient STVSR framework, which can correctly handle complicated scenes such as occlusion and large motion and generate results with clearer texture. In REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16fps on a NVIDIA GTX 1080 Ti GPU.
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which...
详细信息
ISBN:
(纸本)9781728185514
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which takes binocular fusion, binocular rivalry and binocular suppression into account to imitate the complex binocular visual mechanism in the human brain. Besides, to extract spatial saliency features of the left view, the right view, and the fusion view, saliency generating layers (SGLs) are applied in the network. The SGL apply multi-scale dilated convolution to emphasize essential spatial information of the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms the state-of-the-art SIQA methods on both symmetrical and asymmetrical distortion stereoscopic images.
Advances in media compression indicate significant potential to drive future media coding standards, e.g., Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and Joint Video Expe...
详细信息
ISBN:
(纸本)9781728185514
Advances in media compression indicate significant potential to drive future media coding standards, e.g., Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and Joint Video Experts Team's (JVET) deep neural networks (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper proposes an initial study on the effectiveness of traditional watermarking on two state-of-the-art learning based image coding. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic trails of various DNN architectures in the learning based codecs by proposing a residual noise based source identification algorithm that achieved 79% accuracy.
Depth image upsampling is an important issue in three-dimensional (3D) applications. However, edge blurring artifacts are still challenging problems in depth image upsampling, resulting in jagged artifacts in synthesi...
详细信息
ISBN:
(纸本)9781479902880
Depth image upsampling is an important issue in three-dimensional (3D) applications. However, edge blurring artifacts are still challenging problems in depth image upsampling, resulting in jagged artifacts in synthesized views which produce unpleasant visual perception. In this paper, an edge-preserving single depth image interpolation (ESDI) method is proposed. Specifically, local planar hypothesis (LPH) assuming that depth in natural scene are clustered as local planar planes is first explored. Then finite candidates generation (FCG) is proposed to generate limited discrete values satisfied with LPH to interpolated pixels. At last, the optimal combination of candidates is formulated as an energy minimization problem with a constraint in gradient domain, solved by iterated conditional modes (ICM) algorithm. Experiments demonstrate that ESDI achieves high resolution (HR) depth image with clear and sharp edges, and produces synthesized views with desirable quality.
The key task in image set compression is how to efficiently remove set redundancy among images and within a single image. In this paper, we propose the first multi-model prediction (MoP) method for image set compressi...
详细信息
ISBN:
(纸本)9781479902880
The key task in image set compression is how to efficiently remove set redundancy among images and within a single image. In this paper, we propose the first multi-model prediction (MoP) method for image set compression to significantly reduce inter image redundancy. Unlike the previous prediction methods, our MoP enhances the correlation between images using feature-based geometric multi-model fitting. Based on estimated geometric models, multiple deformed prediction images are generated to reduce geometric distortions in different image regions. The block-based adaptive motion compensation is then adopted to further eliminate local variances. Experimental results demonstrate the advantage of our approach, especially for images with complicated scenes and geometric relationships.
Depth for single image is a hot problem in computer vision, which is very important to 2D/3D image conversion. Generally, depth of the object in the scene varies with the amount of blur in the defocus images. So, dept...
详细信息
ISBN:
(纸本)9781479902880
Depth for single image is a hot problem in computer vision, which is very important to 2D/3D image conversion. Generally, depth of the object in the scene varies with the amount of blur in the defocus images. So, depth in the scene can be recovered by measuring the blur. In this paper, a new method for depth estimation based on focus/defocus cue is presented, where the entropy of high frequency subband of wavelet decomposition is regarded as the measure of blur. The proposed method, which is unnecessary to select threshold, can provide pixel-level depth map. The experimental results show that this method is effective and reliable.
Light field displays project hundreds of microparallax views for users to perceive 3D without wearing glasses. It results in gigantic bandwidth requirements if all views would be transmitted, even using conventional v...
详细信息
ISBN:
(纸本)9781728185514
Light field displays project hundreds of microparallax views for users to perceive 3D without wearing glasses. It results in gigantic bandwidth requirements if all views would be transmitted, even using conventional video compression per view. MPEG Immersive Video (MIV) follows a smarter strategy by transmitting only key images and some metadata to synthesize all the missing views. We developed (and will demonstrate) a real-time Depth image Based Rendering software that follows this approach for synthesizing all light field micro-parallax views from a couple of RGBD input views.
暂无评论