ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR) with a focus on new solutions and results. The task is to super-resolve a low-resolution stereo image pair to a high-resolution one with a magnification factor of ×4 under a limited computational budget. Compared with single image SR, the main difficulty lies in how to exploit the additional information in the other viewpoint and how to maintain stereo consistency in the results. The challenge has two tracks: one on bicubic degradation and one on real degradations. In total, 108 and 70 participants registered for the two tracks, respectively. In the test phase, 14 and 13 teams submitted valid results with PSNR (RGB) scores better than the baseline. This challenge establishes a new benchmark for stereo image SR.
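The ranking criterion named above, PSNR on RGB values, can be sketched as follows. This is a minimal illustration and not the challenge's official scoring script: the 8-bit peak value of 255, the nested-list image layout, and the averaging over the left and right views are all assumptions made here.

```python
import math


def psnr_rgb(reference, estimate, max_value=255.0):
    """PSNR in dB over all RGB values of two same-sized images.

    Images are nested lists: rows -> pixels -> (R, G, B).
    max_value=255.0 assumes 8-bit images (an assumption, not from the paper).
    """
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            for r, e in zip(ref_px, est_px):
                squared_error += (r - e) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def stereo_psnr(left_ref, left_est, right_ref, right_est):
    """Mean PSNR of the two views (averaging the views is an assumption)."""
    return 0.5 * (psnr_rgb(left_ref, left_est) + psnr_rgb(right_ref, right_est))
```

A higher value means the super-resolved pair is closer to the ground truth; PSNR alone does not measure stereo consistency between the two views.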
ISBN: (print) 9781665448994
Neuromorphic vision sensors present unique advantages over their frame-based counterparts. However, unsupervised learning of efficient visual representations from their asynchronous output is still a challenge, requiring a rethinking of traditional image and video processing methods. Here we present a network of leaky integrate-and-fire neurons that learns representations similar to those of simple and complex cells in the primary visual cortex of mammals from the input of two event-based vision sensors. Through the combination of spike-timing-dependent plasticity and homeostatic mechanisms, the network learns visual feature detectors for orientation, disparity, and motion in a fully unsupervised fashion. We validate our approach on a mobile robotic platform.
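The two building blocks named above can be sketched in a few lines. This is a generic textbook illustration, not the authors' network: the event-driven update, the pair-based form of the plasticity rule, and every constant (`tau`, `threshold`, `a_plus`, `a_minus`, `w_max`) are assumptions, and the homeostatic mechanisms are left out.

```python
import math


class LIFNeuron:
    """Leaky integrate-and-fire neuron updated only when an input event arrives."""

    def __init__(self, tau=20.0, threshold=1.0):
        self.tau = tau              # membrane time constant in ms (assumed value)
        self.threshold = threshold  # firing threshold (assumed value)
        self.potential = 0.0
        self.last_update = 0.0

    def receive(self, t, weight):
        """Decay the potential since the last event, add the synaptic weight,
        and return True if the neuron spikes (the potential is then reset)."""
        self.potential *= math.exp(-(t - self.last_update) / self.tau)
        self.last_update = t
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0.0
            return True
        return False


def stdp_update(weight, t_pre, t_post, a_plus=0.05, a_minus=0.06,
                tau=20.0, w_max=1.0):
    """Pair-based STDP: strengthen the synapse when the presynaptic spike
    precedes the postsynaptic one, weaken it otherwise; clip to [0, w_max]."""
    dt = t_post - t_pre
    if dt >= 0:
        weight += a_plus * math.exp(-dt / tau)
    else:
        weight -= a_minus * math.exp(dt / tau)
    return min(max(weight, 0.0), w_max)
```

Because the neuron state is only touched when an event arrives, the update cost scales with the number of events rather than with a fixed frame rate, which suits asynchronous sensor output.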
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
In the rapidly evolving field of Generative AI, this work takes initial steps towards establishing a systematic approach for comparing image editing methods. Currently, there is a lack of quantitative metrics for evaluating image editing tasks, with new methods being evaluated mostly qualitatively. Our methodology involves three key components: 1) The creation of a large synthetic dataset using GAN-Control, which enables the generation of ground-truth images for consistent edits across different facial identities; 2) A matching procedure that pairs the edited images with their corresponding ground-truth; and 3) Application of the Perceptual Distance metric to matched pairs. We assessed the effectiveness of our proposed framework through a user study and a set of simulation experiments. Our results indicate that our approach can rank image-editing methods in a way that aligns with human judgment. This research seeks to lay the foundation for a comprehensive evaluation framework for image editing techniques in subsequent studies, initiating a dialogue on this topic.
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
Video anomaly detection research is generally evaluated on short, isolated benchmark videos only a few minutes long. However, in real-world environments, security cameras observe the same scene for months or years at a time, and the notion of anomalous behavior critically depends on context, such as the time of day, day of week, or schedule of events. Here, we propose a context-aware video anomaly detection algorithm, Trinity, specifically targeted to these scenarios. Trinity is especially well-suited to crowded scenes in which individuals cannot be easily tracked, and anomalies are due to speed, direction, or absence of group motion. Trinity is a contrastive learning framework that aims to learn alignments between context, appearance, and motion, and uses alignment quality to classify videos as normal or anomalous. We evaluate our algorithm on both conventional benchmarks and a public webcam-based dataset we collected that spans more than three months of activity.
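The idea of using alignment quality as the decision signal can be illustrated with a small sketch. It is not Trinity's actual scoring rule: the use of cosine similarity, the averaging over the three modality pairs, the `threshold` value, and the assumption that embeddings are already computed are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_score(context, appearance, motion):
    """Mean pairwise similarity of the three embeddings (hypothetical rule)."""
    pairs = [(context, appearance), (context, motion), (appearance, motion)]
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def is_anomalous(context, appearance, motion, threshold=0.5):
    """Flag a clip when its embeddings align poorly; threshold is assumed."""
    return alignment_score(context, appearance, motion) < threshold
```

Under this reading, a clip whose motion is typical for its appearance but unusual for its context (for example, a crowd moving at a time when the scene is normally empty) would score low and be flagged.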
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
Images taken by panoramic cameras in an upright posture give viewers a better visual experience and make downstream panoramic image-based computer vision tasks easier. To estimate the inclination angles of a panoramic camera, we propose a simple but elegant panoramic image-based network that combines the advantages of geometry-based and deep-learning-based methods. First, a backbone network with five down-sampling layers is designed to focus on local distortion features. Then, since non-upright panoramic images have highly uniform geometric distortion for the same camera inclination angles, a multi-scale attention module is proposed for the first time, which weights each pixel on the feature maps of the backbone network and allows the network to focus on global and shallow geometric features. Moreover, apart from the angle loss, a pixel-level image loss is introduced for the inclination angle estimation task to allow the network to compensate for pixel deviations during training. The experiments show that our method outperforms other state-of-the-art methods in this field.
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
Anomaly Detection is a relevant problem in numerous real-world applications, especially when dealing with images. However, little attention has been paid to the issue of changes over time in the input data distribution, which may cause a significant decrease in performance. In this study, we investigate the problem of Pixel-Level Anomaly Detection in the Continual Learning setting, where new data arrives over time and the goal is to perform well on new and old data. We implement several state-of-the-art techniques to solve the Anomaly Detection problem in the classic setting and adapt them to work in the Continual Learning setting. To validate the approaches, we use a real-world dataset of images with pixel-based anomalies to provide a reliable benchmark and serve as a foundation for further advancements in the field. We provide a comprehensive analysis, discussing which Anomaly Detection methods and which families of approaches seem more suitable for the Continual Learning setting.
The recent state of the art on monocular 3D face reconstruction from image data has made some impressive advancements, thanks to the advent of Deep Learning. However, it has mostly focused on input coming from a singl...
ISBN: (digital) 9798350365474
ISBN: (print) 9798350365481
This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world settings. Our benchmarks test VLMs’ consistency in understanding concepts across semantic granularity levels and their response to varying text specificity. Findings show that VLMs favor moderately fine-grained concepts and struggle with specificity, often misjudging texts that differ from their training data. Extensive evaluations reveal limitations in current VLMs, particularly in distinguishing between correct and subtly incorrect descriptions. While fine-tuning offers some improvements, it doesn’t fully address these issues, highlighting the need for VLMs with enhanced generalization capabilities for real-world applications. This study provides insights into VLM limitations and suggests directions for developing more robust models.
ISBN: (print) 9781665448994
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes. The imbalance in the ratio of positive and negative samples for each class skews network output probabilities further from ground-truth distributions. We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training. By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes. The ratio is estimated adaptively based on the network's performance by minimizing the KL divergence between predicted and ground-truth distributions. Whereas most existing approaches addressing data imbalance are mainly focused on single-label classification and do not generalize well to the multi-label case, this work proposes a general approach to solve the long-tail data imbalance issue for multi-label classification. PLM is versatile: it can be applied to most objective functions and it can be used alongside other strategies for class imbalance. Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
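The masking step described above can be sketched for a single class with a binary cross-entropy loss. This is a simplified illustration rather than the published PLM procedure: the fixed `target_ratio` stands in for the ratio that the paper estimates adaptively by minimizing KL divergence, and the function names and the clipping constant are assumptions.

```python
import math
import random


def keep_probabilities(num_pos, num_neg, target_ratio=1.0):
    """Return (p_keep_pos, p_keep_neg) so that the expected ratio of kept
    positives to kept negatives equals target_ratio (fixed here by assumption)."""
    if num_pos == 0 or num_neg == 0:
        return 1.0, 1.0
    ratio = num_pos / num_neg
    if ratio > target_ratio:
        return target_ratio / ratio, 1.0  # too many positives: mask some
    return 1.0, ratio / target_ratio      # too many negatives: mask some


def masked_bce(probs, labels, keep_pos, keep_neg, rng):
    """Binary cross-entropy for one class, averaged over unmasked labels only."""
    total, kept = 0.0, 0
    for p, y in zip(probs, labels):
        keep = keep_pos if y == 1 else keep_neg
        if rng.random() >= keep:
            continue  # label is masked and contributes nothing to the loss
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
        kept += 1
    return total / kept if kept else 0.0
```

For a class with 10 positives and 1000 negatives, `keep_probabilities(10, 1000)` keeps every positive and about 1% of the negatives, so the loss for that class sees a roughly balanced set of labels at each step.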
ISBN: (print) 9781665448994
Neural networks for Image Aesthetic Assessment are usually initialized with weights of pretrained ImageNet models and then trained using a labeled image aesthetics dataset. We argue that the ImageNet classification task is not well-suited for pretraining, since content-based classification is designed to make the model invariant to features that strongly influence an image's aesthetics, e.g. style-based features such as brightness or contrast. We propose to use self-supervised aesthetic-aware pretext tasks that let the network learn aesthetically relevant features, based on the observation that distorting aesthetic images with image filters usually reduces their appeal. To ensure that images are not accidentally improved when filters are applied, we introduce a large dataset composed of highly aesthetic images as the starting point for the distortions. The network is then trained to rank less distorted images higher than their more distorted counterparts. To exploit the effects of multiple different objectives, we also embed this task into a multi-task setting by adding either a self-supervised classification or regression task. In our experiments, we show that our pretraining improves performance over the ImageNet initialization and reduces the number of epochs until convergence by up to 47%. Additionally, we can match the performance of an ImageNet-initialized model while reducing the labeled training data by 20%. We make our code, data, and pretrained models available.
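The ranking objective described above can be illustrated with a pairwise hinge loss. The abstract does not state which loss the authors use, so the hinge form, the `margin` value, and the function names below are assumptions made for this sketch.

```python
def pairwise_ranking_loss(score_less_distorted, score_more_distorted, margin=1.0):
    """Zero when the less distorted image outscores the more distorted one
    by at least `margin`; grows linearly otherwise."""
    return max(0.0, margin - (score_less_distorted - score_more_distorted))


def batch_ranking_loss(scores, margin=1.0):
    """Average pairwise loss over all ordered pairs.

    `scores` holds network outputs for versions of one image, ordered from
    least to most distorted (this ordering convention is an assumption).
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            total += pairwise_ranking_loss(scores[i], scores[j], margin)
            pairs += 1
    return total / pairs if pairs else 0.0
```

No aesthetic labels are needed: the ordering comes for free from the distortion strength, which is what makes the task self-supervised.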