ISBN (print): 9798350344868; 9798350344851
DeepFakes pose a significant threat to individual reputations and to society as a whole. Existing proactive defense strategies add adversarial perturbations to images to disrupt or nullify the generation of DeepFakes, but these perturbations are easily noticed by human observers and can be removed. To address this challenge, we propose LOFT (Latent Space Optimization and Generator Fine-Tuning for Defending against DeepFakes), a three-stage framework. First, the original image is encoded into the latent space to obtain a latent code that captures facial features. Second, Adversarial Latent Optimization optimizes the latent code so that it both reconstructs the image and defends against DeepFake manipulation. Third, the generator is fine-tuned to further improve the visual quality and defense capability of the reconstructed image. We evaluate the framework on two distinct DeepFake tasks: attribute editing and face reenactment. Experimental results show that the proposed framework outperforms existing baselines in both visual quality and defense capability.
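A toy sketch of the second stage may help show what optimizing a latent code against a DeepFake model involves. Everything here is an assumption for illustration, not LOFT's implementation: the generator is a linear map G(z) = A z, the DeepFake model is a hypothetical linear map F(y) = B y, stage 1 is an exact encoder for this diagonal A, and the adversarial term pulls F(G(z)) toward a target t (a targeted variant chosen so that the toy objective stays bounded). Stage 3 (generator fine-tuning) is not covered.

```python
# Toy sketch of adversarial latent optimization; all models here are hypothetical.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def adversarial_latent_opt(A, B, x, t, z0, lam=0.5, lr=0.05, steps=500):
    """Minimize ||A z - x||^2 + lam * ||B A z - t||^2 over z by gradient descent.

    The first term keeps the reconstruction G(z) = A z close to the original x;
    the second pushes the hypothetical DeepFake output F(G(z)) = B A z toward t.
    """
    At, Bt = transpose(A), transpose(B)
    z = list(z0)
    for _ in range(steps):
        y = matvec(A, z)                                       # reconstructed image G(z)
        r_rec = [yi - xi for yi, xi in zip(y, x)]              # reconstruction residual
        r_adv = [fi - ti for fi, ti in zip(matvec(B, y), t)]   # DeepFake output minus target
        # Gradient w.r.t. y, then the chain rule through G(z) = A z.
        g_y = [2 * a + 2 * lam * b for a, b in zip(r_rec, matvec(Bt, r_adv))]
        g_z = matvec(At, g_y)
        z = [zi - lr * gi for zi, gi in zip(z, g_z)]
    return z


A = [[1.0, 0.0], [0.0, 2.0]]   # toy generator
B = [[0.0, 1.0], [1.0, 0.0]]   # toy "DeepFake" model: swaps the two components
x = [1.0, 0.0]                 # original "image"
t = [0.0, 0.0]                 # target for the disrupted DeepFake output
z0 = [x[0] / A[0][0], x[1] / A[1][1]]   # stage 1: exact encoder for this diagonal A
z = adversarial_latent_opt(A, B, x, t, z0)
y = matvec(A, z)
print("reconstruction error:", sqdist(y, x))
print("DeepFake distance to target:", sqdist(matvec(B, y), t), "vs original:", sqdist(matvec(B, x), t))
```

Raising `lam` trades reconstruction fidelity for stronger disruption. The step size `lr=0.05` is stable for these particular matrices; other A and B may need a smaller one. In a real system G and F would be neural networks and the gradients would come from automatic differentiation.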
ISBN (print): 9798350344868; 9798350344851
Underwater images suffer from complex and diverse degradations, which inevitably affect the performance of underwater vision tasks. However, most existing learning-based underwater image enhancement (UIE) methods restore these degradations mainly in the spatial domain and rarely exploit Fourier frequency information. In this paper, we develop SFGNet, a novel two-stage UIE framework based on spatial-frequency interaction and gradient maps. In the first stage, we propose a dense spatial-frequency fusion network (DSFFNet), built mainly from our dense Fourier fusion block and dense spatial fusion block, which achieves thorough spatial-frequency interaction through cross connections between these two blocks. In the second stage, we propose a gradient-aware corrector (GAC) that uses gradient maps to further enhance the perceptual details and geometric structures of images. Experimental results on two real-world underwater image datasets show that our approach successfully enhances underwater images and achieves competitive visual quality. The code is available at https://***/zhihefang/SFGNet.
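The two ingredients SFGNet combines, Fourier-domain information and gradient maps, can be illustrated on a tiny grayscale image. This is a minimal sketch under our own assumptions, not the DSFFNet or GAC architecture: a naive 2-D DFT split into amplitude and phase (the representation Fourier-based enhancement methods commonly operate on), and a gradient-magnitude map computed with simple forward differences.

```python
import cmath
import math


def dft2_amp_phase(img):
    """Naive 2-D DFT of a small grayscale image, returned as (amplitude, phase)."""
    h, w = len(img), len(img[0])
    amp = [[0.0] * w for _ in range(h)]
    pha = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = sum(img[i][j] * cmath.exp(-2j * math.pi * (u * i / h + v * j / w))
                    for i in range(h) for j in range(w))
            amp[u][v], pha[u][v] = abs(s), cmath.phase(s)
    return amp, pha


def gradient_map(img):
    """Gradient magnitude with forward differences (zero past the last row/column)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            out[i][j] = math.hypot(dx, dy)
    return out


edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]   # vertical step edge
print(gradient_map(edge)[0])                      # nonzero only where the step occurs
amp, _ = dft2_amp_phase([[1.0, 1.0], [1.0, 1.0]])
print(amp[0][0])                                  # a constant image has only a DC term
```

A real pipeline would use an FFT library on full-size images; the naive loop is only meant to make the amplitude/phase decomposition explicit.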
ISBN (print): 9781728198354
Omnidirectional image (ODI) super-resolution (SR) is an important technique in augmented reality and virtual reality applications for addressing the low resolution caused by limited capture devices or bandwidth. The projection distortion of ODIs makes it difficult to apply existing SR methods directly. In this paper, we propose an ODI SR method that exploits both the characteristics of ODIs and those of human vision. Specifically, we first design a perception-oriented adaptive loss function that jointly uses a saliency map and a latitude map. In the proposed ODI-SR network, we introduce an attention module to aggregate multi-scale information and use spherical convolution to adapt to the spherical format of ODIs. Furthermore, we design a data augmentation strategy for ODIs based on viewpoint distribution to further improve the visual quality of the SR images. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance in both qualitative and quantitative evaluations.
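One way to picture a loss that jointly uses a latitude map and a saliency map is to weight each pixel's error by both. The sketch below is an assumption, not the paper's loss: it uses the cosine-of-latitude weighting common for equirectangular (ERP) images, multiplies it by a given saliency map, and normalizes a weighted L1 error.

```python
import math


def latitude_weights(height):
    """cos(latitude) for each row of an equirectangular image; rows near the poles get small weights."""
    return [math.cos((i + 0.5 - height / 2) * math.pi / height) for i in range(height)]


def weighted_l1(sr, hr, saliency):
    """Weighted L1 error; per-pixel weight = latitude weight x saliency (assumed combination)."""
    lat = latitude_weights(len(hr))
    num = den = 0.0
    for i, (sr_row, hr_row, sal_row) in enumerate(zip(sr, hr, saliency)):
        for s, h, a in zip(sr_row, hr_row, sal_row):
            wgt = lat[i] * a
            num += wgt * abs(s - h)
            den += wgt
    return num / den if den > 0 else 0.0


hr = [[0.0] * 4 for _ in range(4)]
sal = [[1.0] * 4 for _ in range(4)]   # uniform saliency for this example
err_pole = [row[:] for row in hr]
err_pole[0][0] = 1.0                  # unit error in the top (polar) row
err_eq = [row[:] for row in hr]
err_eq[1][0] = 1.0                    # the same error closer to the equator
print(weighted_l1(err_pole, hr, sal), weighted_l1(err_eq, hr, sal))
```

With uniform saliency, the same pixel error costs less near the poles, where the equirectangular projection oversamples the sphere; a non-uniform saliency map would further emphasize regions viewers are likely to look at.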
ISBN (print): 9783031734762; 9783031734779
Zero-shot learning (ZSL) addresses the challenge of classifying test images without explicit training on those samples. By learning from visual and semantic embedding vectors (feature vectors), ZSL can identify and classify the abundant unlabeled images that are available. Information-rich visual features extracted from images play a crucial role in ZSL. This paper proposes a hybrid feature approach that integrates low-level (LL) and high-level (HL) features extracted from images. Gray Level Co-occurrence Matrix (GLCM) and Gabor features provide the LL texture features, while the HL features are derived from the ResNet-50 model, which is known for capturing complex hierarchical representations. These hybrid visual features are then mapped to semantic features by a linear mapping, where the semantic features are label embedding vectors generated by the fastText model. Experiments on the AWA2 and SUN datasets evaluate the effectiveness of the proposed approach. The hybrid features improve zero-shot image classification, effectively classifying images the model did not see during training.
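The pipeline above (texture features, concatenation with deep features, linear mapping into a label-embedding space) can be sketched in a few lines. This is an illustration under stated assumptions: only one GLCM statistic (contrast) stands in for the LL features, Gabor features are omitted, the "ResNet-50" features, the mapping `W`, and the class embeddings are hypothetical numbers rather than real model outputs, and prediction uses nearest class embedding by cosine similarity.

```python
import math


def glcm(img, levels, dy=0, dx=1):
    """Normalized gray-level co-occurrence matrix for a non-negative offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for i in range(h - dy):
        for j in range(w - dx):
            counts[img[i][j]][img[i + dy][j + dx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """GLCM contrast: sum over (a, b) of p[a][b] * (a - b)^2."""
    return sum(p[a][b] * (a - b) ** 2 for a in range(len(p)) for b in range(len(p)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_predict(visual, W, class_embeddings):
    """Project a visual feature with linear map W; return the class with the most similar embedding."""
    projected = [sum(wij * vj for wij, vj in zip(row, visual)) for row in W]
    return max(class_embeddings, key=lambda c: cosine(projected, class_embeddings[c]))


texture = [[0, 0, 1], [0, 0, 1]]          # 2-level toy image
low_level = [glcm_contrast(glcm(texture, levels=2))]
high_level = [0.2, 0.9]                   # hypothetical stand-in for CNN features
hybrid = low_level + high_level           # concatenated LL + HL feature
W = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]    # hypothetical learned linear mapping
classes = {"zebra": [0.5, 1.0], "whale": [1.0, -1.0]}   # hypothetical label embeddings
print(low_level, zero_shot_predict(hybrid, W, classes))
```

Because prediction only compares against label embeddings, a class can be recognized at test time as long as its embedding exists, even if no training images of it were seen.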
We propose a method for capturing high-quality images in low-light environments using multi-band near-infrared (NIR) images, which offer robustness to brightness variations and provide structural information not prese...
ISBN (print): 9798350343557
Unconditional medical image synthesis is the task of generating realistic and diverse medical images from random noise without any prior information or constraints. Synthesizing realistic medical images can enrich the quality and diversity of medical imaging datasets, which in turn improves the performance and generalization of deep learning models for medical imaging. Prevalent approaches to synthesizing medical images use generative adversarial networks (GANs) or denoising diffusion probabilistic models (DDPMs). However, GANs, which learn the image distribution implicitly, are prone to limited sample fidelity and diversity. Diffusion models, on the other hand, sample slowly because they take many small diffusion steps. In this paper, we propose Diff-Med-Synth, a novel diffusion-based method for unconditional medical image synthesis that generates realistic and diverse medical images from random noise. Diff-Med-Synth combines the advantages of DDPMs and GANs to achieve fast and efficient image sampling. We evaluate our method on two multi-contrast MRI datasets and show that it outperforms state-of-the-art methods in the quality, diversity, and fidelity of the synthesized images.
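The slow-sampling point can be made concrete with the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The sketch below is general background, not part of Diff-Med-Synth, and assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps popularized by the original DDPM paper. Because each β_t is small, ᾱ_t only approaches zero after hundreds of steps, and standard ancestral sampling must reverse every one of those steps with a network evaluation.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    signal, noise = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = linear_alpha_bars()
for t in (0, 99, 499, 999):
    print(t + 1, round(alpha_bars[t], 6))   # fraction of x_0's scale surviving after t + 1 steps (squared)
x_T = q_sample([0.5, -0.2, 0.8], alpha_bars[-1], random.Random(0))   # almost pure noise
```

Hybrid approaches that pair diffusion with adversarial training typically aim to replace this long chain of small Gaussian steps with far fewer, larger ones; the sketch only shows why the plain chain is long.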
The attribute-based person search task aims to find matching pedestrian images from text attributes, which is relevant when no query image is available. However, existing methods exhibit inferior performan...
ISBN (print): 9798350349405; 9798350349399
High Dynamic Range (HDR) imaging aims to replicate the high visual quality and clarity of real-world scenes. Because HDR capture is costly, the literature offers various data-driven methods for reconstructing HDR images from their Low Dynamic Range (LDR) counterparts. A common limitation of these approaches is missing detail in regions of the reconstructed HDR images that are over- or under-exposed in the input LDR images. To address this, we propose HistoHDR-Net, a simple and effective method that recovers the fine details (e.g., color, contrast, saturation, and brightness) of HDR images through a fusion-based approach that uses histogram-equalized LDR images together with self-attention guidance. Our experiments demonstrate that the proposed approach outperforms state-of-the-art methods.
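The histogram-equalized LDR inputs mentioned above can be produced with classic global histogram equalization. A minimal sketch for one 8-bit channel follows; applying it per channel, and whether HistoHDR-Net uses this exact global variant, are assumptions on our part.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization of integer intensities in [0, levels)."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)   # CDF value at the darkest occupied level
    n = len(pixels)
    if n == cdf_min:                          # single-valued channel: nothing to stretch
        return list(pixels)
    # Map each level through the normalized CDF so occupied levels spread over [0, levels - 1].
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]


channel = [50, 50, 100, 200]                  # a tiny example channel
print(equalize_histogram(channel))
```

Equalization stretches crowded intensity ranges, which is why it can expose detail in over- or under-exposed regions that a network might otherwise fail to recover.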
Generative models that use 3D information have recently been an active area of research. GIRAFFE, one of the latest 3D-aware generative models, shows better feature disentanglement than existing generative models because it generates ...
ISBN (print): 9798350352368
Accurate classification of fresh fruit bunch (FFB) ripeness is crucial for optimizing oil quality and yield in the palm oil industry. Traditional manual inspection is labor-intensive, subjective, and error-prone, which motivates the exploration of automated solutions. This paper examines the potential of vision language models (VLMs), including LLaVA 1.5, Yi-VL, and PaliGemma, to automate and improve FFB ripeness assessment. The models were evaluated on their ability to classify ripeness stages and on the accuracy of the descriptive text they generate, using metrics such as BLEU and ROUGE scores. Yi-VL achieved the highest descriptive accuracy, with a ROUGE-L score of 93.14, but it processes only 0.18 samples per second, slower than PaliGemma (0.53 samples per second). PaliGemma's throughput is 194.44% higher than Yi-VL's, making it better suited to real-time applications despite its lower accuracy (ROUGE-L: 26.15). LLaVA 1.5 offers a balance between accuracy (ROUGE-L: 82.16) and efficiency (0.22 samples per second). This research highlights the trade-offs among VLMs for FFB ripeness assessment and demonstrates their potential to transform the agriculture industry. Future work may focus on optimizing model performance and deploying these technologies in real-world scenarios.
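Two of the figures above can be checked or illustrated in a few lines. The relative efficiency follows from 0.53 / 0.18 - 1 ≈ 1.944, i.e. about 194.44% more samples per second. ROUGE-L scores a generated description by the longest common subsequence (LCS) it shares with a reference text; the sketch below computes the F1 form on whitespace tokens. The tokenization and the F1 variant are assumptions, since the evaluation details are not given here, and the abstract reports scores on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on whitespace tokens, in [0, 1]; multiply by 100 for the scale used above."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def relative_gain_pct(fast, slow):
    """Percentage by which throughput `fast` exceeds throughput `slow`."""
    return (fast / slow - 1.0) * 100.0


print(round(relative_gain_pct(0.53, 0.18), 2))   # PaliGemma vs Yi-VL, samples per second
print(rouge_l_f1("ripe bunch with red fruits", "the bunch is ripe with red fruits"))
```

Because LCS rewards in-order overlap without requiring contiguous matches, ROUGE-L is tolerant of paraphrases that keep the key terms in order, which suits short descriptive outputs like ripeness descriptions.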