Neural compression has benefited from technological advances such as convolutional neural networks (CNNs) to achieve advanced bitrates, especially in image compression. In neural image compression, an encoder and a de...
详细信息
ISBN:
(纸本)9781728185514
Neural compression has benefited from technological advances such as convolutional neural networks (CNNs) to achieve advanced bitrates, especially in image compression. In neural image compression, an encoder and a decoder can run in parallel on a GPU, so the speed is relatively fast. However, the conventional entropy coding for neural image compression requires serialized iterations in which the probability distribution is estimated by multi-layer CNNs and entropy coding is processed on a CPU. Therefore, the total compression and decompression speed is slow. We propose a fast, practical, GPU-intensive entropy coding framework that consistently executes entropy coding on a GPU through highly parallelized tensor operations, as well as an encoder, decoder, and entropy estimator with an improved network architecture. We experimentally evaluated the speed and rate-distortion performance of the proposed framework and found that we could significantly increase the speed while maintaining the bitrate advantage of neural image compression.
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-bas...
详细信息
ISBN:
(纸本)9781665475921
With the rapid development of multi-sensor fusion technology in various industrial fields, many composite images closely related to human life have been produced. To meet the rapidly growing needs of various image-based applications, we have established the first multi-source composite image (MSCI) database for image quality assessment (IQA). Our MSCI database contains 80 reference images and 1600 distorted images, generated by four advanced compression standards with five distortion levels. In particular, these five distortion levels are determined based on the first five just noticeable difference (JND) levels. Moreover, we verify the IQA performance of some representative methods on our MSCI database. The experimental results show that the performance of the existing methods on the MSCI database needs to be further improved.
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. The current one-stage space-time video super-resolution (STVSR) methods are difficult to deal with la...
详细信息
ISBN:
(纸本)9781728185514
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. The current one-stage space-time video super-resolution (STVSR) methods are difficult to deal with large motion and complex scenes, and are time-consuming and memory intensive. We propose an efficient STVSR framework, which can correctly handle complicated scenes such as occlusion and large motion and generate results with clearer texture. In REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16fps on a NVIDIA GTX 1080 Ti GPU.
In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields with millions of labeled data such as image recognition, only several thousand labeled images ar...
详细信息
ISBN:
(纸本)9781728185514
In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields with millions of labeled data such as image recognition, only several thousand labeled images are available in image quality assessment (IQA) field for deep learning, which heavily hinders the development and application for IQA. To tackle this problem, in this paper, we proposed an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA), which is based on deep learning. We employed an advanced full reference (FR) IQA method to expand databases and supervise the training of network. In addition, the network outputs of expanding images were used as proxy labels replacing errors between subjective scores and objective scores to achieve error self-learning. Two weights of error back propagation were designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method yielded comparative effect.
Advances in media compression indicate significant potential to drive future media coding standards, e.g., Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and Joint Video Expe...
详细信息
ISBN:
(纸本)9781728185514
Advances in media compression indicate significant potential to drive future media coding standards, e.g., Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and Joint Video Experts Team's (JVET) deep neural networks (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper proposes an initial study on the effectiveness of traditional watermarking on two state-of-the-art learning based image coding. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic trails of various DNN architectures in the learning based codecs by proposing a residual noise based source identification algorithm that achieved 79% accuracy.
Saliency prediction can be treated as the activity of the human visual system (HVS). The most effective method should highly approximate the response of HVS to the perceived information. Motivated by that orientation ...
详细信息
ISBN:
(纸本)9781728180687
Saliency prediction can be treated as the activity of the human visual system (HVS). The most effective method should highly approximate the response of HVS to the perceived information. Motivated by that orientation selectivity (OS) mechanism occuring in primary visual cortex (PVC) tells us how the HVS extracts visual information for scene understanding, we propose a novel saliency model by combining an orientation selectivity based local feature called "excitement" map and a visual acuity based global feature called "acuity" map. Further, a saliency augmented operator based on visual error sensitivity is designed to enhance the saliency map. Experimental results on three benchmark databases demonstrate the superior performance of the proposed method compared to ten classical/state-of-the-art algorithms.
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which...
详细信息
ISBN:
(纸本)9781728185514
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which takes binocular fusion, binocular rivalry and binocular suppression into account to imitate the complex binocular visual mechanism in the human brain. Besides, to extract spatial saliency features of the left view, the right view, and the fusion view, saliency generating layers (SGLs) are applied in the network. The SGL apply multi-scale dilated convolution to emphasize essential spatial information of the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms the state-of-the-art SIQA methods on both symmetrical and asymmetrical distortion stereoscopic images.
Versatile Video Coding (VVC) introduces dependent quantization (DQ) as a key coding tool, but it also has a high computational complexity. Therefore, optimizing DQ is crucial for practical VVC encoder implementations....
详细信息
Early diagnosis of skin diseases is crucial for effective treatment, preventing spread, minimizing long-Term damage, managing chronic conditions, detecting underlying health issues, and promoting psychological well-be...
详细信息
Light field displays project hundreds of microparallax views for users to perceive 3D without wearing glasses. It results in gigantic bandwidth requirements if all views would be transmitted, even using conventional v...
详细信息
ISBN:
(纸本)9781728185514
Light field displays project hundreds of microparallax views for users to perceive 3D without wearing glasses. It results in gigantic bandwidth requirements if all views would be transmitted, even using conventional video compression per view. MPEG Immersive Video (MIV) follows a smarter strategy by transmitting only key images and some metadata to synthesize all the missing views. We developed (and will demonstrate) a real-time Depth image Based Rendering software that follows this approach for synthesizing all light field micro-parallax views from a couple of RGBD input views.
暂无评论