ISBN: 9781728185514 (print)
Simulating the human visual system (HVS) is crucial for matching human perception and improving assessment performance in stereoscopic image quality assessment (SIQA). In this paper, we propose a no-reference SIQA method that considers the feedback mechanism and orientation selectivity of the HVS. In the HVS, feedback connections are indispensable to human perception, yet they have not been studied in existing SIQA models. We therefore design a new feedback module (FBM) through which the high-level region of the visual cortex guides the low-level region. In addition, given the orientation selectivity of primary visual cortex cells, we explore a deformable feature extraction block to simulate it; the block can adaptively select regions of interest. Meanwhile, retinal ganglion cells (RGCs) with different receptive fields have different sensitivities to objects of different sizes in the image, so the network implements a new scheme for extracting and fusing information from multiple receptive fields. Experimental results show that the proposed model outperforms state-of-the-art no-reference SIQA methods and has excellent generalization ability.
ISBN: 9781728185514 (print)
Light field displays project hundreds of micro-parallax views so that users can perceive 3D without wearing glasses. Transmitting all of these views would require enormous bandwidth, even with conventional video compression applied per view. MPEG Immersive Video (MIV) follows a smarter strategy: it transmits only key images and some metadata, from which all the missing views are synthesized. We developed (and will demonstrate) real-time Depth Image-Based Rendering (DIBR) software that follows this approach, synthesizing all light field micro-parallax views from a couple of RGBD input views.
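As a rough illustration of the depth-image-based rendering idea described in this abstract, the Python sketch below forward-warps a single image row to a horizontally shifted virtual camera. It is not the authors' software. The rectified pinhole setup, the disparity formula d = f * B / Z, and the names `warp_scanline`, `focal_px`, and `baseline` are all assumptions made for this example.

```python
def warp_scanline(colors, depths, focal_px, baseline):
    """Forward-warp one image row to a virtual camera shifted by `baseline`.

    Assumes a rectified pinhole camera pair (hypothetical setup), so each
    pixel moves horizontally by disparity = focal_px * baseline / depth.
    """
    width = len(colors)
    out = [None] * width              # None marks a disocclusion hole
    zbuf = [float("inf")] * width     # nearest depth written so far per target pixel
    for x, (color, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline / z
        xt = int(round(x - disparity))
        # Keep the sample only if it lands inside the row and is closer
        # to the camera than whatever was written there before.
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = color
            zbuf[xt] = z
    return out


row = warp_scanline(list("abcd"), [10, 10, 2, 10], focal_px=2.0, baseline=1.0)
print(row)  # ['a', 'c', None, 'd']
```

In this toy row the near pixel "c" moves one pixel to the left and occludes "b", leaving a hole at its old position. A practical renderer would fill such holes from a second input view or by inpainting.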
ISBN: 9781665450850 (print)
Haze reduces visibility and degrades the quality of captured images. The aim of single image dehazing is to recover a haze-free image from a hazy one. However, most current image dehazing methods treat feature information across channels and pixels equally, which may harm the dehazing result because haze is unevenly distributed. To address this uneven distribution, we propose an end-to-end two-subnet attention network (TSANet), which consists of an attention-recurrent (AR) sub-network and an asymmetric U-shaped dehazing refinement (AUDR) sub-network. In addition, a feature residual attention (FRA) block is designed to focus on the thick-haze regions and high-frequency regions of a hazy image during dehazing. In the model, the input image is first fed into the AR sub-network to extract feature information such as thick-haze regions and high-frequency regions. The AUDR sub-network then refines the features produced by the AR sub-network. It adopts an encoder-decoder module containing FRA and transformer blocks to further process the feature information of high-frequency regions and filter hazy feature information, and it uses skip connections to enhance the representation ability of TSANet. Extensive experiments demonstrate the effectiveness of our method, which outperforms other dehazing methods on synthetic and real-world hazy datasets.
ISBN: 9781665424257 (print)
Typefaces play an increasingly important role in dynamic digital interfaces, but there is still little direct evaluation of visual image perception in relation to typeface design, especially for use in interface typography. Based on an analysis of display screens, this research elaborates on the connection between display resolution and typeface design, and on the relationship between display polarity and the principles of visual optics. Furthermore, the essential attributes and requirements of the two genres of interface font are examined from the perspective of human visual image perception. Additionally, the visual processing of text information and the visual characteristics of the scanning state are elaborated, and the visual angle and spatial frequency of visual perception are identified as the cornerstones influencing the design of a typeface for user interfaces. The methodology of visual perception can be adapted to investigate questions relevant to typographic and typeface design.
ISBN: 9781728185514 (print)
The ever higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have been proposed recently. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information that is not available at test time; that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions show the proposed method to be a first step towards universal GAN image detection, with good robustness to common image impairments and good generalization to unseen architectures.
ISBN: 9781728185514 (print)
Observers differ in their visual attention when viewing the same scene. Inter-observer visual congruency (IOVC) describes the dispersion between different people's visual attention areas when they observe the same stimulus. Research on the IOVC of video is interesting but lacking. In this paper, we first introduce a measurement for calculating the IOVC of video, and we conduct an eye-tracking experiment in a realistic movie-watching environment to establish a movie scene dataset. We then propose a method to predict the IOVC of video, which employs a dual-channel network to extract and integrate content and optical flow features. The effectiveness of the proposed prediction model is validated on our dataset. Finally, the correlation between inter-observer congruency and video emotion is analyzed.
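The abstract does not state how the IOVC is actually measured. Purely to illustrate what a dispersion measure between observers can look like, the Python sketch below computes the mean pairwise Euclidean distance between the fixation points of several observers on one frame. The function name `mean_pairwise_dispersion` and the choice of metric are assumptions for this example, not the paper's measurement.

```python
from itertools import combinations
from math import dist


def mean_pairwise_dispersion(fixations):
    """Mean Euclidean distance over all pairs of observers' fixation points.

    `fixations` holds one (x, y) point per observer for a single frame.
    A lower value means the observers look at more similar places.
    This is an illustrative stand-in, not the measure used in the paper.
    """
    pairs = list(combinations(fixations, 2))
    if not pairs:
        raise ValueError("need fixation points from at least two observers")
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Two observers agree exactly and a third looks elsewhere.
print(mean_pairwise_dispersion([(0, 0), (3, 4), (0, 0)]))
```

A per-video score could then be obtained by averaging this value over frames, again as an assumption rather than the authors' procedure.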
ISBN: 9781728185514 (print)
Non-Lambertian objects present an appearance that depends on the viewer's position relative to the surrounding scene. Unlike those of diffuse objects, their features move non-linearly with the camera, which prevents rendering them with existing Depth Image-Based Rendering (DIBR) approaches or triangulating their surfaces with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm that describes these non-linearities by replacing the depth maps with more complete multi-channel "non-Lambertian maps", without attempting a 3D reconstruction of the scene. We study the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume in order to render non-Lambertian objects optimally. We compare our method with other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.
ISBN: 9781728185514 (print)
Advances in media compression indicate significant potential to drive future media coding standards, e.g., the Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and the Joint Video Experts Team's (JVET) deep neural network (DNN) based video coding. These codecs in fact represent a new type of media format. As a dire consequence, traditional media security and forensic techniques will no longer be of use. This paper presents an initial study of the effectiveness of traditional watermarking on two state-of-the-art learning-based image codecs. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic trails of various DNN architectures in the learning-based codecs by proposing a residual-noise-based source identification algorithm that achieves 79% accuracy.
ISBN: 9781728185514 (print)
Increasing the spatial resolution and frame rate of a video simultaneously has attracted attention in recent years. Current one-stage space-time video super-resolution (STVSR) methods struggle with large motion and complex scenes, and are time-consuming and memory-intensive. We propose an efficient STVSR framework that can correctly handle complicated scenes involving occlusion and large motion and that generates results with clearer texture. On the REDS dataset, our method outperforms all existing one-stage methods. Our method is lightweight and can generate 720p frames at 16 fps on an NVIDIA GTX 1080 Ti GPU.
ISBN: 9781728185514 (print)
Multimodal image registration has received increasing attention in computer vision and computational photography. However, nonlinear intensity variations prevent accurate feature point matching between image pairs of different modalities. We therefore propose a robust image descriptor for multi-modal image registration, named the shearlet-based modality robust descriptor (SMRD). Based on the discrete shearlet transform, the anisotropic features of edge and texture information at multiple scales are encoded to describe the region around a point of interest. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results show that SMRD outperforms the other methods in terms of precision, recall and F1-score.
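For readers unfamiliar with the three scores named at the end of this abstract, the Python sketch below computes them from match counts using the standard definitions. How the paper decides that a match is correct is not stated, so the function name `match_scores` and its three count arguments are assumptions for this example.

```python
def match_scores(num_correct, num_matches, num_ground_truth):
    """Precision, recall and F1-score for descriptor matching.

    num_correct      -- matches that agree with the ground truth
    num_matches      -- all matches returned by the descriptor
    num_ground_truth -- all true correspondences in the image pair
    (Argument names are hypothetical, chosen for this sketch.)
    """
    precision = num_correct / num_matches if num_matches else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = match_scores(30, 40, 60)
print(precision, recall, round(f1, 3))  # 0.75 0.5 0.6
```

The zero checks return 0.0 instead of raising when a descriptor produces no matches, which is a design choice of this sketch.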