ISBN: (print) 9781728185514
Pre-processed video transcoding has recently attracted wide attention and is increasingly used in practical applications to improve the perceptual experience and save transmission resources. However, very few works have evaluated the performance of pre-processing methods. In this paper, we select source (SRC) videos and various pre-processing approaches to construct the first Pre-processed and Transcoded Video Database (PTVD). We then conduct a subjective experiment, which shows that, compared with video sent directly to the codec at the same bitrate, appropriate pre-processing methods do improve the perceptual quality. Finally, existing image/video quality metrics are evaluated on our database. The results indicate that the performance of existing image/video quality assessment (IQA/VQA) approaches still needs to be improved. We will make our database publicly available soon.
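The abstract does not say which criteria were used to evaluate the metrics. As a hedged illustration only, objective quality metrics are commonly compared with subjective scores through the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC). The sketch below assumes that setup; the function names and the score values are hypothetical and do not come from PTVD.

```python
import math


def pearson(x, y):
    """Pearson linear correlation (PLCC) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson applied to ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical subjective scores and metric outputs for five videos.
mos = [1.2, 2.5, 3.1, 4.0, 4.6]
metric = [0.30, 0.45, 0.50, 0.80, 0.95]
print(f"PLCC={pearson(mos, metric):.3f}  SROCC={spearman(mos, metric):.3f}")
```

A metric that orders the videos exactly as the viewers did reaches an SROCC of 1 even when its PLCC is lower, which is why the two figures are usually reported together.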
ISBN: (print) 9781728185514
Underwater images suffer from low contrast, color distortion and visibility degradation due to light scattering and attenuation. Over the past few years, the importance of underwater image enhancement has increased because of ocean engineering and underwater robotics. Existing underwater image enhancement methods are based on various assumptions. However, the diversity of underwater images makes it almost impossible to define assumptions that hold in general, so these methods are effective only for specific types of underwater images. Recently, underwater image enhancement algorithms using CNNs and GANs have been proposed, but they are not as advanced as other image processing methods because of the lack of suitable training datasets and the complexity of the problem. To address these problems, we propose a novel underwater image enhancement method that combines a residual feature attention block with a novel combination of multi-scale and multi-patch structures. The multi-patch network extracts local features to adapt to various underwater images, which are often non-homogeneous. In addition, our network includes a multi-scale network, which is often effective for image restoration. Experimental results show that our proposed method outperforms the conventional method for various types of images.
ISBN: (print) 9781728185514
The ever higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have been proposed recently. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information that is not available at test time; that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions show the proposed method to be a first step towards universal GAN image detection, with good robustness to common image impairments and good generalization to unseen architectures.
ISBN: (print) 9781728185514
Observers differ in their visual attention when viewing the same scene. Inter-observer visual congruency (IOVC) describes the dispersion between different people's visual attention areas when they observe the same stimulus. Research on the IOVC of video is interesting but lacking. In this paper, we first introduce a measurement for calculating the IOVC of video. An eye-tracking experiment is then conducted in a realistic movie-watching environment to establish a movie scene dataset. Next, we propose a method to predict the IOVC of video, which employs a dual-channel network to extract and integrate content and optical flow features. The effectiveness of the proposed prediction model is validated on our dataset. Finally, the correlation between inter-observer congruency and video emotion is analyzed.
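The abstract does not define its IOVC measurement. To make the idea of "dispersion between observers" concrete, the sketch below uses a simple stand-in, not the paper's method: the mean pairwise distance between observers' gaze points on each frame, averaged over the clip. A lower dispersion corresponds to a higher congruency. The function names and the gaze coordinates are hypothetical.

```python
import math
from itertools import combinations


def frame_dispersion(gaze_points):
    """Mean pairwise Euclidean distance between gaze points on one frame.

    gaze_points holds one (x, y) pixel coordinate per observer.
    """
    pairs = list(combinations(gaze_points, 2))
    if not pairs:
        raise ValueError("need gaze points from at least two observers")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def video_dispersion(frames):
    """Average the per-frame dispersion over all frames of a clip."""
    return sum(frame_dispersion(points) for points in frames) / len(frames)


# Hypothetical gaze data: three observers, two frames per clip.
focused_clip = [
    [(640, 360), (643, 364), (638, 358)],
    [(650, 350), (652, 349), (648, 353)],
]
scattered_clip = [
    [(100, 100), (900, 600), (640, 50)],
    [(200, 500), (1100, 90), (30, 300)],
]
print(video_dispersion(focused_clip) < video_dispersion(scattered_clip))
```

Published congruency measures are often computed on fixation maps rather than on raw points, so this stand-in should be read only as an intuition for what the predicted quantity captures.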
ISBN: (print) 9781728185514
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. Current one-stage space-time video super-resolution (STVSR) methods struggle with large motion and complex scenes, and are time-consuming and memory-intensive. We propose an efficient STVSR framework that can correctly handle complicated scenes, such as those with occlusion and large motion, and generate results with clearer texture. On the REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16 fps on an NVIDIA GTX 1080 Ti GPU.
ISBN: (print) 9781728185514
Simulating the human visual system (HVS) is crucial for fitting human perception and improving assessment performance in stereoscopic image quality assessment (SIQA). In this paper, a no-reference SIQA method that considers the feedback mechanism and orientation selectivity of the HVS is proposed. In the HVS, feedback connections are indispensable to human perception, yet they have not been studied in existing SIQA models. Therefore, we design a new feedback module (FBM) through which the high-level region of the visual cortex guides the low-level region. In addition, given the orientation selectivity of primary visual cortex cells, a deformable feature extraction block is explored to simulate it; the block can adaptively select the regions of interest. Meanwhile, retinal ganglion cells (RGCs) with different receptive fields have different sensitivities to objects of different sizes in the image, so a new scheme for extracting and fusing information from multiple receptive fields is realized in the network structure. Experimental results show that the proposed model is superior to state-of-the-art no-reference SIQA methods and has excellent generalization ability.
ISBN: (print) 9781665475921
With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolutional neural network (CNN) model combined with a novel binocular feature interaction consistent with the human visual system (HVS). To simulate the way the HVS senses multi-scale information simultaneously, a parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. A modified convolutional block attention module (MCBAM) with lower computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ a cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of the HVS. Experimental results show that our method is superior to state-of-the-art metrics and achieves excellent performance.
ISBN: (print) 9781728185514
Advances in media compression indicate significant potential to drive future media coding standards, e.g., the Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and the Joint Video Experts Team's (JVET) deep neural network (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper presents an initial study of the effectiveness of traditional watermarking on two state-of-the-art learning-based image codecs. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic traces of various DNN architectures in the learning-based codecs by proposing a residual-noise-based source identification algorithm that achieves 79% accuracy.
ISBN: (print) 9781665475921
Single image desnowing is an important and challenging task for many computer vision applications, such as visual tracking and video surveillance. Although existing deep learning-based methods have achieved promising results, most of them rely on local deep features and neglect the global relationships between local regions, which inevitably leads to over-smoothed results or loss of detail. To solve this issue, we design a UNet-based end-to-end architecture for image desnowing. Specifically, to better characterize global information and preserve image detail, we combine a Window-based Self-Attention (WSA) transformer block with Residue Spatial Attention (RSA) to build the basic unit of our network. In addition, to protect the structure of the image effectively, we introduce a Residue Channel (RC) loss to guide high-quality image restoration. Extensive experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves new state-of-the-art results.
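The abstract does not describe how the WSA block forms its windows. As a rough illustration of the general idea behind window-based self-attention, the sketch below assumes the common scheme of splitting a feature map into non-overlapping square windows, so that attention is computed inside each window rather than over the whole image. The function name and the toy grid are hypothetical, and the attention computation itself is omitted.

```python
def window_partition(feature_map, window):
    """Split an H x W grid (a list of rows) into non-overlapping tiles.

    Each tile is window x window; tiles are returned in row-major order.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tile = [
                feature_map[row][left:left + window]
                for row in range(top, top + window)
            ]
            tiles.append(tile)
    return tiles


# Toy 4 x 4 feature map whose entries are their own flat indices.
grid = [[row * 4 + col for col in range(4)] for row in range(4)]
tiles = window_partition(grid, 2)
print(len(tiles), tiles[0])
```

With N positions in total and windows of M positions each, attention restricted to windows compares on the order of N * M pairs instead of N * N, which is the usual motivation for this design.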
ISBN: (print) 9781665475921
Because of their large memory requirements and heavy computation, traditional deep learning networks cannot run on mobile or embedded devices. In this paper, we propose a new mobile architecture combining MobileNetV2 and pruning, which further decreases the FLOPs and the number of parameters. The performance of MobileNetV2 has been widely demonstrated, and the pruning operation not only allows further model compression but also helps prevent overfitting. We have carried out ablation experiments on the CIIP Tire Data for different pruning combinations. In addition, we introduce a global hyperparameter to effectively weigh the accuracy and precision. Experiments show that an accuracy of 98.3% is maintained with a model size of only 804.5 KB, showing better performance than the baseline method.
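The abstract does not state which pruning criterion is used. As a minimal sketch of one widely used option, and not necessarily the authors' choice, magnitude pruning zeroes the weights with the smallest absolute values. The function name, the `ratio` parameter and the weight values below are hypothetical.

```python
def magnitude_prune(weights, ratio):
    """Return a copy of weights with the smallest-magnitude fraction zeroed.

    ratio is the fraction of weights to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    count = int(len(weights) * ratio)
    # Indices of the `count` weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(by_magnitude[:count])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


# Hypothetical weights of one layer, pruned to 50% sparsity.
layer = [0.5, -0.1, 0.05, -0.9]
print(magnitude_prune(layer, 0.5))
```

In practice the pruned network is usually fine-tuned afterwards to recover accuracy, and the size reduction only materializes once the zeroed weights are stored sparsely or the corresponding channels are physically removed.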