ISBN (print): 9781728185514
The ever-higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have been proposed recently. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information not available at test time, that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions show the proposed method to be a first step towards universal GAN image detection, also ensuring good robustness to common image impairments and good generalization to unseen architectures.
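As a reference point for the contrastive learning paradigm the abstract mentions, below is a minimal sketch of an NT-Xent (SimCLR-style) contrastive loss in PyTorch. This is one plausible instantiation, not the authors' exact formulation; the temperature and pairing scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                # (2N, D)
    sim = z @ z.t() / temperature                 # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))         # exclude self-similarity
    # positives: the i-th view in z1 pairs with the i-th view in z2, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```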
ISBN (print): 9798331529543; 9798331529550
Most approaches in learned image compression follow the transform coding scheme. The characteristics of the latent variables transformed from images significantly influence the performance of codecs. In this paper, we present visual analyses of the latent features of learned image compression and find that the latent variables are spread over a wide range, which can make the entropy coding process complex. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss on the latent features and the entropy parameter μ. Training with the DC loss, we obtain latent features with smaller coding-symbol values and smaller scales σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% while improving compression performance.
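A hedged sketch of what such a deviation-control penalty could look like: it penalizes the spread of the coded symbols (y − μ) and of the predicted scales so the entropy coder works over a narrower range. The functional form and weights here are assumptions, not the paper's definition.

```python
import torch

def dc_loss(y, mu, sigma, w_symbol=1.0, w_scale=1.0):
    """y: latents; mu, sigma: entropy-model parameters, same shape as y."""
    symbols = y - mu                      # the values actually entropy-coded
    loss_symbol = symbols.abs().mean()    # keep coded symbols close to zero
    loss_scale = sigma.abs().mean()       # keep predicted scales small
    return w_symbol * loss_symbol + w_scale * loss_scale
```

In a plausible training setup this term would simply be added, with a small weight, to the usual rate-distortion objective.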
ISBN (print): 9781728180687
This paper proposes a new neural network for enhancing underexposed images. Instead of a decomposition method based on Retinex theory, we introduce smooth dilated convolution to estimate the global illumination of the input image and implement an end-to-end learning network model. Based on this model, we formulate a multi-term loss function that combines content, color, texture, and smoothness losses. Our extensive experiments demonstrate that this method is superior to other methods for underexposed image enhancement. It recovers more color detail and can be applied robustly to a variety of underexposed images.
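For concreteness, here is one common "smooth dilated convolution" variant (a shared separable pre-filter before the dilated convolution, in the spirit of Wang & Ji's smoothed dilated convolutions); whether this paper uses exactly this form is an assumption.

```python
import torch.nn as nn

class SmoothDilatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=2):
        super().__init__()
        # depthwise pre-convolution smooths features before dilation,
        # mitigating the gridding artifacts of plain dilated convolutions
        self.pre = nn.Conv2d(in_ch, in_ch, kernel_size=2 * dilation - 1,
                             padding=dilation - 1, groups=in_ch, bias=False)
        # dilated conv; padding=dilation keeps spatial size for kernel_size=3
        self.dilated = nn.Conv2d(in_ch, out_ch, kernel_size,
                                 padding=dilation, dilation=dilation)

    def forward(self, x):
        return self.dilated(self.pre(x))
```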
We propose a novel method of arbitrarily focused image generation using multiple differently focused images. First, we describe our previously proposed select-and-merge method for all-focused image acquisition. This method gives good results, but it is not easy to extend to generating arbitrarily focused images. Then, based on the assumption that the depth of a scene changes stepwise, we derive a formula relating the desired arbitrarily focused image to the multiple acquired images; the arbitrarily focused image can be reconstructed by iterative use of this formula. We also introduce coarse-to-fine estimation of the point spread functions (PSFs) of the acquired images. We reconstruct arbitrarily focused images for a natural scene; in other words, we simulate virtual cameras and generate images focused at arbitrary depths. (C) 1998 SPIE and IS&T. [S1017-9909(98)02201-6].
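A toy sketch of the iterative idea: if each acquired image g_k is modeled as the desired image f blurred by a PSF h_k, f can be refined by repeatedly pushing back the residuals. This Landweber-style update is an illustration under that assumption, not the paper's actual reconstruction formula.

```python
import numpy as np
from scipy.signal import fftconvolve

def reconstruct(gs, hs, n_iter=50, step=0.5):
    """gs: list of acquired images; hs: matching PSFs (2-D arrays)."""
    f = np.mean(gs, axis=0)                 # start from the input average
    for _ in range(n_iter):
        grad = np.zeros_like(f)
        for g, h in zip(gs, hs):
            residual = g - fftconvolve(f, h, mode='same')
            # correlate residual with the PSF (flipped kernel = adjoint of conv)
            grad += fftconvolve(residual, h[::-1, ::-1], mode='same')
        f += step * grad / len(gs)
    return f
```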
ISBN (print): 9781728185514
An image anomaly localization method based on the successive subspace learning (SSL) framework, called AnomalyHop, is proposed in this work. AnomalyHop consists of three modules: 1) feature extraction via successive subspace learning (SSL), 2) normality feature distribution modeling via Gaussian models, and 3) anomaly map generation and fusion. Compared with state-of-the-art image anomaly localization methods based on deep neural networks (DNNs), AnomalyHop is mathematically transparent, easy to train, and fast at inference. Its area under the ROC curve (ROC-AUC) on the MVTec AD dataset is 95.9%, which is among the best of the benchmarked methods.
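The Gaussian-modeling module (2) admits a compact sketch: fit a Gaussian to features from anomaly-free images, then score test features by Mahalanobis distance. The SSL feature extraction itself is omitted, and the interface and shapes below are assumptions.

```python
import numpy as np

def fit_gaussian(feats):
    """feats: (N, D) feature vectors from anomaly-free images."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_scores(feats, mu, cov_inv):
    """Squared Mahalanobis distance per feature vector; larger = more anomalous."""
    d = feats - mu
    return np.einsum('nd,dk,nk->n', d, cov_inv, d)
```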
ISBN (print): 9781728185514
Advances in media compression indicate significant potential to drive future media coding standards, e.g., the Joint Photographic Experts Group's learning-based image coding technologies (JPEG AI) and the Joint Video Experts Team's (JVET) deep neural network (DNN) based video coding. These codecs in fact represent a new type of media format. As a consequence, traditional media security and forensic techniques may no longer be effective. This paper presents an initial study on the effectiveness of traditional watermarking on two state-of-the-art learning-based image codecs. Results indicate that traditional watermarking methods are no longer effective. We also examine the forensic traces of various DNN architectures in the learning-based codecs by proposing a residual-noise-based source identification algorithm that achieves 79% accuracy.
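The residual-noise idea can be sketched as follows: subtract a denoised version of the image to expose codec-specific noise, then feed residual statistics to a classifier. The median-filter denoiser and the feature choice below are placeholders, not the paper's design.

```python
import numpy as np
from scipy.ndimage import median_filter

def noise_residual(img, size=3):
    """img: 2-D grayscale array; returns the high-frequency noise residual."""
    img = img.astype(np.float32)
    return img - median_filter(img, size)

def residual_features(img):
    """Simple per-image residual statistics; an off-the-shelf classifier
    (e.g., an SVM) could be trained on these to identify the source codec."""
    r = noise_residual(img)
    return np.array([r.mean(), r.std(), np.abs(r).mean()])
```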
ISBN (print): 9781665475921
With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolutional neural network (CNN) combined with a novel binocular feature interaction consistent with the human visual system (HVS). To simulate the HVS's ability to sense multi-scale information simultaneously, a parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. A modified convolutional block attention module (MCBAM) with reduced computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ a cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of the HVS. Experimental results show that our method outperforms state-of-the-art metrics and achieves excellent performance.
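A plausible skeleton of a parallel multi-scale extraction block: convolutions with different receptive fields applied in parallel and concatenated. The real PMSFM's branch count and compensation path are not specified here; this is only the general pattern.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        # four parallel branches with growing receptive fields (1x1 .. 7x7)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])

    def forward(self, x):
        # concatenate multi-scale responses along the channel axis
        return torch.cat([b(x) for b in self.branches], dim=1)
```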
ISBN (print): 9798331529543; 9798331529550
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. However, as haze affects different components of an image in different ways and to different degrees, existing methods treat the image as a single input and overlook the need to decouple these components, leading to mutual interference when enhancing each of them. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for single image dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A carefully designed fusion network then integrates the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
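The frequency decoupling step alone admits a short sketch: split the input into low- and high-frequency components with a Gaussian low-pass, each feeding its own branch. The kernel size and sigma are illustrative; the three sub-networks and the fusion network are beyond an abstract-level sketch.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=11, sigma=3.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def split_frequencies(img):
    """img: (B, C, H, W) in [0, 1]; returns (low_freq, high_freq)."""
    c = img.size(1)
    k = gaussian_kernel().to(img.device).repeat(c, 1, 1, 1)
    low = F.conv2d(img, k, padding=k.size(-1) // 2, groups=c)
    return low, img - low
```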
ISBN (print): 9781665475921
Single image desnowing is an important and challenging task for many computer vision applications, such as visual tracking and video surveillance. Although existing deep learning-based methods have achieved promising results, most of them rely on local deep features and neglect the global relationships between local regions, inevitably leading to over-smoothed results or loss of detail. To address this issue, we design a UNet-based end-to-end architecture for image desnowing. Specifically, to better characterize global information and preserve image detail, we combine a Window-based Self-Attention (WSA) transformer block with Residue Spatial Attention (RSA) to build the basic unit of our network. In addition, to effectively preserve image structure, we introduce a Residue Channel (RC) loss to guide high-quality image restoration. Extensive experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves new state-of-the-art results.
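One way to read the RC loss, sketched below: the residue channel (max minus min over RGB) is near zero for achromatic snow, so matching it between prediction and ground truth encourages structure preservation. Interpreting the residue channel this way is an assumption based on the abstract, not the paper's stated definition.

```python
import torch
import torch.nn.functional as F

def residue_channel(img):
    """img: (B, 3, H, W); returns (B, 1, H, W) max-minus-min over channels."""
    return (img.max(dim=1, keepdim=True).values
            - img.min(dim=1, keepdim=True).values)

def rc_loss(pred, target):
    # L1 distance between the residue channels of output and ground truth
    return F.l1_loss(residue_channel(pred), residue_channel(target))
```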
ISBN (print): 9798350343557
In this study, the alignment of video-text and image-text datasets is studied. First, similarities are calculated over the texts in the two datasets. A retrieval setup with visual similarities is then applied to the subset created from the calculated text similarities. A BERT-based embedding method is applied to the raw and cleaned texts. As visual features, object-based and CLIP-based methods are used to represent video frames. According to the results, alignment with CLIP features achieves the best performance on the subset created by filtering with raw text.
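The two-stage matching described above can be sketched as follows: filter candidate pairs by text-embedding similarity, then re-rank the survivors by visual (e.g., CLIP) similarity. The embeddings are assumed precomputed and the threshold is illustrative.

```python
import numpy as np

def cosine(a, b):
    """Row-wise cosine similarity between (N, D) and (M, D) matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def align(text_a, text_b, vis_a, vis_b, text_thresh=0.6):
    """text_*/vis_*: embedding matrices for the two datasets; returns index pairs."""
    t_sim = cosine(text_a, text_b)
    matches = []
    for i in range(t_sim.shape[0]):
        cand = np.where(t_sim[i] >= text_thresh)[0]   # text-filtered subset
        if cand.size:
            v_sim = cosine(vis_a[i:i + 1], vis_b[cand])[0]
            matches.append((i, int(cand[v_sim.argmax()])))  # best visual match
    return matches
```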