ISBN (print): 9781665448994
This paper provides a review of the NTIRE 2021 challenge targeting defocus deblurring using dual-pixel (DP) data. The goal of this single-track challenge was to reduce spatially varying defocus blur present in images captured with a shallow depth of field. The images used in this challenge were obtained using a DP sensor that provided a pair of DP views per captured image. Submitted solutions were evaluated using conventional signal processing metrics, namely peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). Out of 185 registered participants, nine teams provided methods and competed in the final stage. The paper describes the methods proposed by the participating teams and their results. The winning teams represent the state-of-the-art in terms of defocus deblurring using DP images.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Deepfake techniques generate highly realistic data, making it challenging for humans to discern between actual and artificially generated images. Recent advancements in deep learning-based deepfake detection methods, particularly with diffusion models, have shown remarkable progress. However, there is a growing demand for real-world applications to detect unseen individuals, deepfake techniques, and scenarios. To address this limitation, we propose a Prototype-based Unified Framework for Deepfake Detection (PUDD). PUDD offers a detection system based on similarity, comparing input data against known prototypes for video classification and identifying potential deepfakes or previously unseen classes by analyzing drops in similarity. Our extensive experiments reveal three key findings: (1) PUDD achieves an accuracy of 95.1% on Celeb-DF, outperforming state-of-the-art deepfake detection methods; (2) PUDD leverages image classification as the upstream task during training, demonstrating promising performance in both image classification and deepfake detection tasks during inference; (3) PUDD requires only 2.7 seconds for retraining on new data and emits 10⁵ times less carbon compared to the state-of-the-art model, making it significantly more environmentally friendly.
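The similarity-with-threshold decision rule described above can be sketched minimally as follows. This is an illustration only: the cosine metric, the fixed threshold, and the prototype names are all assumptions, not PUDD's actual features or calibration.

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def classify(embedding, prototypes, threshold=0.8):
    """Return the best-matching prototype label, or flag the input as a
    potential deepfake / unseen class when similarity drops below threshold."""
    label, sim = max(((name, cosine(embedding, proto))
                      for name, proto in prototypes.items()),
                     key=lambda t: t[1])
    return label if sim >= threshold else "unseen-or-fake"

# Hypothetical 3-D prototype embeddings for two known identities:
prototypes = {"person_a": [1.0, 0.1, 0.0], "person_b": [0.0, 1.0, 0.2]}
print(classify([0.9, 0.2, 0.0], prototypes))  # close to person_a
print(classify([0.5, 0.5, 0.9], prototypes))  # low similarity everywhere
```

The key property is that no retraining is needed to reject a new identity or manipulation type: anything far from every stored prototype is flagged.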
This paper unveils the discoveries and outcomes of the inaugural iteration of the Multi-modal Aerial View Image Challenge (MAVIC) aimed at image translation. The primary objective of this competition is to stimulate research efforts towards the development of models capable of translating co-aligned images between multiple modalities. To accomplish the task of image translation, the competition utilizes images obtained from both synthetic aperture radar (SAR) and electro-optical (EO) sources. Specifically, the challenge centers on the translation from the SAR modality to the EO modality, an area of research that has garnered attention. The inaugural challenge demonstrates the feasibility of the task. The dataset utilized in this challenge is derived from the UNIfied COincident Optical and Radar for recognition (UNICORN) dataset. We introduce a new version of the UNICORN dataset that is focused on enabling the sensor translation task. Performance evaluation is conducted using a combination of measures to ensure high-fidelity and high-accuracy translations.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called VisTA-SR (Visual & Thermal Alignment and Super-Resolution Enhancement), that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter challenges in acquiring global information. To activate more input pixels globally, hybrid attention models have been proposed. Moreover, training by solely minimizing pixel-wise RGB losses, such as the ℓ1 loss, has been found inadequate for capturing essential high-frequency details. This paper presents two contributions: i) We introduce convolutional non-local sparse attention (NLSA) blocks to extend the hybrid transformer architecture in order to further enhance its receptive field. ii) We employ wavelet losses to train Transformer models to improve quantitative and subjective performance. While wavelet losses have been explored previously, showing their power in training Transformer-based SR models is novel. Our experimental results demonstrate that the proposed model provides state-of-the-art PSNR results as well as superior visual performance across various benchmark datasets.
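To see why a wavelet loss captures high-frequency detail that a plain pixel loss misses, consider a minimal one-level Haar sketch in plain Python. The choice of wavelet, decomposition level, and subband weight here are assumptions for illustration, not the paper's configuration.

```python
def haar_1d(signal):
    """One-level Haar transform of an even-length signal:
    returns (approximation, detail) coefficient lists."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_l1(pred, target, detail_weight=2.0):
    """L1 loss on Haar subbands; up-weighting the detail band emphasizes
    high-frequency errors that a plain pixel loss underweights."""
    pa, pd = haar_1d(pred)
    ta, td = haar_1d(target)
    l1 = lambda x, y: sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return l1(pa, ta) + detail_weight * l1(pd, td)

target = [0.0, 1.0, 0.0, 1.0]    # high-frequency pattern
smoothed = [0.5, 0.5, 0.5, 0.5]  # oversmoothed prediction
shifted = [0.5, 1.5, 0.5, 1.5]   # brightness-shifted prediction
# Both predictions have the same pixel-wise L1 error (0.5), but the
# wavelet loss penalizes the oversmoothed one more:
print(wavelet_l1(smoothed, target), wavelet_l1(shifted, target))
```

A pixel loss cannot distinguish these two failure modes, whereas the detail subband exposes the loss of texture, which is the intuition behind training SR models with wavelet losses.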
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Estimating the fruit yield of an orchard in advance helps farmers to better plan the resources needed for harvesting, storing, and commercialising the crop, and to make agronomic decisions (such as pruning) that may improve the quality of the yield and increase profits. Therefore, in recent years, several methods based on computer vision have been proposed to automate this task by directly counting the fruits on trees using a video camera. However, existing works and methods usually assume ideal conditions and may fail under more challenging scenarios with unconstrained camera motion and intermittent occlusions of fruits. Here we show that combining Structure-from-Motion (SfM) with bipartite graph matching has the potential to address those challenges. We found that our approach, applied to real-world datasets with unconstrained camera motion and low frame rates, outperforms existing methods by a large margin. Our results demonstrate that the proposed method is robust to multiple intermittent occlusions under challenging conditions, and thus suitable for diverse real-world scenarios in orchards, with a camera either operated by hand or mounted on an agricultural vehicle. Although not shown here, we believe the proposed method can also be applied to other object tracking problems besides counting fruits under similar settings, i.e. static objects and a freely moving camera.
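The data-association step at the heart of such counting pipelines can be sketched as follows. Note this is a greedy simplification of optimal bipartite matching (which the paper uses); the distance threshold and 2-D centroid representation are assumptions for illustration.

```python
def greedy_match(tracks, detections, max_dist=30.0):
    """Greedily associate existing fruit tracks with new detections
    (both given as 2-D centroids) in order of ascending distance;
    unmatched detections start new tracks (new fruits counted)."""
    pairs = sorted((((t[0] - d[0]) ** 2 + (t[1] - d[1]) ** 2) ** 0.5, i, j)
                   for i, t in enumerate(tracks)
                   for j, d in enumerate(detections))
    used_t, used_d, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if i not in used_t and j not in used_d:
            used_t.add(i); used_d.add(j); matches.append((i, j))
    new = [j for j in range(len(detections)) if j not in used_d]
    return matches, new

tracks = [(10.0, 10.0), (50.0, 50.0)]
detections = [(12.0, 11.0), (90.0, 90.0)]
matches, new = greedy_match(tracks, detections)
print(matches, new)  # track 0 keeps its fruit; detection 1 is a new fruit
```

An optimal assignment (e.g. the Hungarian algorithm) replaces the greedy loop in practice, and SfM provides the geometry that makes distances between re-projected fruit positions meaningful across frames.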
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Modern Neural Radiance Fields (NeRFs) learn a mapping from position to volumetric density leveraging proposal network samplers. In contrast to the coarse-to-fine sampling approach with two NeRFs, this offers significant potential for acceleration using lower network capacity. Given that NeRFs utilize most of their network capacity to estimate radiance, they could store valuable density information in their parameters or their deep features. To investigate this proposition, we take one step back and analyze large, trained ReLU-MLPs used in coarse-to-fine sampling. Building on our novel activation visualization method, we find that trained NeRFs, Mip-NeRFs and proposal network samplers map samples with high density to local minima along a ray in activation feature space. We show how these large MLPs can be accelerated by transforming intermediate activations to a weight estimate, without any modifications to the training protocol or the network architecture. With our approach, we can reduce the computational requirements of trained NeRFs by up to 50% with only a slight hit in rendering quality. Extensive experimental evaluation on a variety of datasets and architectures demonstrates the effectiveness of our approach. Consequently, our methodology provides valuable insight into the inner workings of NeRFs.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Employing specific networks to address different types of degradation has often proved complex and time-consuming in practical applications. The Bracket Image Restoration and Enhancement (BIRE) task aims to address various image restoration tasks in a unified manner by restoring clear single-frame images from multi-frame shots, covering denoising, deblurring, high-dynamic-range (HDR) enhancement, and super-resolution under various degradation conditions. In this paper, we propose LGSTANet, an efficient aggregation restoration network for BIRE. Specifically, inspired by video restoration methods, we adopt an efficient architecture comprising alignment, aggregation, and reconstruction. Additionally, we introduce a Learnable Global Spatio-Temporal Adaptive (LGSTA) aggregation module to effectively aggregate inter-frame complementary information. Furthermore, we propose an adaptive restoration modulator to address specific degradation disturbances of various types, thereby achieving high-quality restoration outcomes. Extensive experiments demonstrate the effectiveness of our method. LGSTANet outperforms other state-of-the-art methods in Bracket Image Restoration and Enhancement and achieves competitive results in the NTIRE 2024 BIRE challenge.
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
In the field of remote sensing, the scarcity of stereo-matched data, and particularly the lack of accurate ground truth, often hinders the training of deep neural networks. The use of synthetically generated images as an alternative alleviates this problem but suffers from poor domain generalization. Unifying the capabilities of image-to-image translation and stereo matching presents an effective solution to this issue. Current methods combine two networks—an unpaired image-to-image translation network and a stereo-matching network—and optimize them jointly. We propose an edge-aware GAN-based network that effectively tackles both tasks simultaneously. We obtain edge maps of the input images with the Sobel operator and use them as an additional input to the generator's encoder to enforce geometric consistency during translation. We additionally include a warping loss calculated from the translated images to maintain stereo consistency. We demonstrate that our model produces qualitatively and quantitatively superior results than existing models, and that its applicability extends to diverse domains, including autonomous driving.
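The Sobel edge maps fed to the encoder are standard image gradients; a minimal sketch (pure Python over a list-of-lists grayscale image, borders skipped, unlike an optimized library implementation) looks like this:

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image (list of lists)
    using the 3x3 Sobel kernels; border pixels are left at zero."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge produces a strong response along the step:
img = [[0, 0, 10, 10]] * 4
print(sobel_magnitude(img)[1][1], sobel_magnitude(img)[1][2])
```

Because the edge map depends only on intensity differences, it carries scene geometry that survives the modality change, which is what lets the generator use it as a consistency anchor during translation.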
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Though object detection performance on standard benchmarks has improved drastically in the last decade, current object detectors are often vulnerable to domain shift between the training data and testing images. Domain adaptation techniques have been developed to adapt an object detector trained in a source domain to a target domain. However, they assume that the target domain is known and fixed and that a target dataset is available for training, which cannot be satisfied in many real-world applications. To close this gap, this paper investigates fully test-time adaptation for object detection: updating a trained object detector on a single testing image before making a prediction, without access to the training data. Through a diagnostic study of a baseline self-training framework, we show that a great challenge of this task is the unreliability of pseudo labels caused by domain shift. We then propose a simple yet effective method, termed the IoU Filter, to address this challenge. It consists of two new IoU-based indicators, both of which are complementary to the detection confidence. Experimental results on five datasets demonstrate that our approach can effectively adapt a trained detector to various kinds of domain shifts at test time and bring substantial performance gains. Code is available at https://***/XiaoqianRuan1/IoU-filter.
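The IoU computation underlying such filters is simple to state. The sketch below shows IoU plus one illustrative stability check; it is not a reproduction of the paper's two indicators, and the threshold and box format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def stable_pseudo_labels(before, after, iou_thresh=0.5):
    """Keep a pseudo-label box only if some box predicted after the
    adaptation step still overlaps it strongly; a large IoU drop is
    treated as a sign the label was unreliable."""
    return [b for b in before if any(iou(b, a) >= iou_thresh for a in after)]

before = [(0, 0, 10, 10), (20, 20, 30, 30)]   # pseudo labels pre-update
after = [(1, 1, 11, 11)]                       # predictions post-update
print(stable_pseudo_labels(before, after))     # second box is discarded
```

The intuition matches the abstract: detection confidence alone is miscalibrated under domain shift, so geometric consistency of boxes provides a complementary reliability signal.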