ISBN: 9798350365474 (print)
The goal of stereo image super-resolution is to enhance the quality of low-resolution stereo image pairs by utilizing complementary information across views. Although transformer-based methods have shown high efficiency in single-image super-resolution, they have not been fully exploited for stereo super-resolution. It is therefore crucial to incorporate the complementary information of stereo images into transformer-based methods to improve image details. To address this challenge, we propose a lightweight Hybrid Cross-view Attention Stereo Super-Resolution network (HCASSR), which uses a transformer-based network for intra-view feature extraction and a cross-view attention module to aggregate stereo image information. We also employ a multi-stage training strategy and test-time data ensembling to improve image quality. Our method has been extensively tested on the KITTI 2012, KITTI 2015, Middlebury, and Flickr1024 datasets, and the experimental results demonstrate that it outperforms existing works with a smaller model size. Additionally, we won 3rd and 2nd place in Track 1 and Track 2, respectively, of the NTIRE 2024 Stereo Image Super-Resolution Challenge. Code and models will be released at https://***/YuqiangY/HCASSR.
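As a concrete illustration of the cross-view aggregation idea, the following is a minimal PyTorch sketch of row-wise attention between left and right feature maps, where each pixel of one view attends along the horizontal epipolar line of the other view; the module name, channel width, and residual fusion are illustrative assumptions rather than the exact HCASSR design.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Attend from one view to the other along the horizontal epipolar line."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, feat_l, feat_r):
        # (B, C, H, W) -> (B, H, W, C): every image row becomes a token sequence
        q = self.q(feat_l).permute(0, 2, 3, 1)
        k = self.k(feat_r).permute(0, 2, 3, 1)
        v = self.v(feat_r).permute(0, 2, 3, 1)
        # attention over the width axis of the other view, computed per row
        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)  # (B, H, W, W)
        out = (attn @ v).permute(0, 3, 1, 2)                                # back to (B, C, H, W)
        return feat_l + out  # residual fusion of cross-view information

# usage
left = torch.randn(1, 64, 32, 96)
right = torch.randn(1, 64, 32, 96)
fused_left = CrossViewAttention(64)(left, right)
```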
ISBN: 9798350365474 (print)
Disconnectivity and distortion are the two problems that must be coped with when processing 360-degree equirectangular images. In this paper, we propose a method for estimating the depth of a monocular panoramic image with a teacher-student model that fuses equirectangular and spherical representations. In contrast with existing methods that fuse an equirectangular representation with a cube-map or tangent-image representation, a spherical representation is a better choice because sampling on a sphere is more uniform and copes with distortion more effectively. In this process, a novel spherical convolution kernel computed with sampling points on a sphere is developed to extract features from the spherical representation, and a Segmentation Feature Fusion (SFF) method is then used to combine these features with the ones extracted from the equirectangular representation. In contrast with existing methods that use a teacher-student model to obtain a lighter depth estimation model, we use a teacher-student model to learn the latent features of depth images. This results in a trained model that estimates the depth map of an equirectangular image using not only the feature maps extracted from the input equirectangular image but also the knowledge distilled from the ground-truth depth maps of the training set. In experiments, the proposed method is tested on several well-known 360-degree monocular depth estimation benchmark datasets and outperforms existing methods on most evaluation metrics.
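To make the spherical-representation ingredient concrete, here is a small PyTorch sketch (an assumption-based illustration, not the paper's kernel) that generates near-uniform sample directions with a Fibonacci lattice and bilinearly gathers equirectangular features at those directions:

```python
import math
import torch
import torch.nn.functional as F

def fibonacci_sphere(n):
    """Near-uniform directions on the unit sphere (Fibonacci lattice)."""
    i = torch.arange(n, dtype=torch.float32)
    phi = math.pi * (3.0 - math.sqrt(5.0)) * i          # golden-angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n                       # uniform in z
    r = torch.sqrt(1.0 - z * z)
    return torch.stack([r * torch.cos(phi), r * torch.sin(phi), z], dim=-1)  # (n, 3)

def sample_equirect(feat, dirs):
    """Bilinearly sample an equirectangular feature map at unit directions.

    feat: (B, C, H, W) equirectangular features; dirs: (N, 3) unit vectors.
    """
    lon = torch.atan2(dirs[:, 1], dirs[:, 0])           # [-pi, pi]
    lat = torch.asin(dirs[:, 2].clamp(-1, 1))           # [-pi/2, pi/2]
    u = lon / math.pi                                   # normalized x in [-1, 1]
    v = -lat / (math.pi / 2)                            # normalized y in [-1, 1], top = +lat
    grid = torch.stack([u, v], dim=-1).view(1, 1, -1, 2).expand(feat.size(0), -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=True).squeeze(2)  # (B, C, N)

# usage: gather spherical samples from an equirectangular feature map
feat = torch.randn(1, 16, 256, 512)
points = fibonacci_sphere(1024)
spherical_feat = sample_equirect(feat, points)          # (1, 16, 1024)
```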
ISBN: 9798350353013; 9798350353006 (print)
In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful use in natural language processing. Using a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with the original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without requiring additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundation models in computer vision. Our project page is at https://***/.
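ASAM itself generates natural adversarial examples with a stable diffusion model; purely as a simplified stand-in, the sketch below shows a generic projected-gradient adversarial perturbation of an image against a toy segmentation network while the original mask annotation is kept fixed. The model, loss, and step sizes are assumptions for illustration only, not the paper's pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in segmentation network (assumption; not SAM).
seg_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))

def adversarial_example(model, image, mask, steps=10, step_size=2/255, eps=8/255):
    """Gradient-ascent perturbation that raises the segmentation loss
    while keeping the original mask annotation fixed."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(model(adv), mask)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()              # ascend the segmentation loss
            adv = image + (adv - image).clamp(-eps, eps)     # stay close to the original image
            adv = adv.clamp(0, 1)
    return adv.detach()

image = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
adv_image = adversarial_example(seg_model, image, mask)
```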
ISBN: 9798350365474 (print)
Blurry images usually exhibit similar blur at various locations across the image domain, a property barely exploited by today's blind deblurring neural networks. We show that when patches with similar underlying blur can be extracted, jointly processing the stack of patches yields higher accuracy than handling them separately. Our collaborative scheme is implemented in a neural architecture with a pooling layer along the stack dimension. We present three practical patch extraction strategies for image sharpening, camera shake removal, and optical aberration correction, and validate the proposed approach on both synthetic and real-world benchmarks. For each blur instance, the proposed collaborative strategy yields significant quantitative and qualitative improvements.
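A minimal sketch of the collaborative idea, assuming a toy encoder and max-pooling over the stack dimension so that every patch in a stack can use the evidence of the others (the channel widths and residual design are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class StackDeblurBlock(nn.Module):
    """Process a stack of patches sharing similar blur; pooling over the
    stack dimension lets every patch see the others' evidence."""
    def __init__(self, channels=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, patches):                     # (B, N, 3, H, W), N patches per stack
        b, n, c, h, w = patches.shape
        feat = self.encode(patches.reshape(b * n, c, h, w)).reshape(b, n, -1, h, w)
        pooled = feat.max(dim=1, keepdim=True).values.expand_as(feat)  # share evidence across the stack
        feat = self.fuse(torch.cat([feat, pooled], dim=2).reshape(b * n, -1, h, w))
        return patches + self.decode(feat).reshape(b, n, c, h, w)      # residual sharpening

# usage: 5 similar-blur patches per image
out = StackDeblurBlock()(torch.randn(2, 5, 3, 64, 64))
```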
ISBN: 9798350365474 (print)
Stereo image super-resolution utilizes the cross-view complementary information brought by the disparity between left and right views to reconstruct higher-quality images. Numerous methods focus on cascading feature extraction modules and cross-view feature interaction modules to exploit the information in stereo images; however, this adds a great deal of network parameters and structural redundancy. To facilitate the application of stereo image super-resolution in downstream tasks, we propose an efficient Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution (MFFSSR). Specifically, MFFSSR utilizes the Hybrid Attention Feature Extraction Block (HAFEB) to extract multi-level intra-view features. Using a channel separation strategy, HAFEB can efficiently interact with the embedded cross-view interaction module. This structural configuration efficiently mines features within each view while improving the efficiency of cross-view information sharing, and hence reconstructs image details and textures more accurately. Extensive experiments demonstrate the effectiveness of MFFSSR: we achieve superior performance with fewer parameters. The source code is available at https://github.com/KarosLYX/MFFSSR.
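The channel separation strategy can be sketched as follows: only a slice of the channels is routed through a (stand-in) cross-view interaction, while the rest stays in the cheap intra-view path. Module names, the split ratio, and the 1x1-convolution interaction are assumptions for illustration, not the HAFEB design.

```python
import torch
import torch.nn as nn

class ChannelSplitInteraction(nn.Module):
    """Send only a fraction of the channels through the cross-view branch;
    the remaining channels stay in the cheap intra-view path."""
    def __init__(self, channels=64, share_ratio=0.25):
        super().__init__()
        self.n_share = int(channels * share_ratio)
        self.intra = nn.Conv2d(channels - self.n_share, channels - self.n_share, 3, padding=1)
        self.cross = nn.Conv2d(2 * self.n_share, self.n_share, 1)   # stand-in interaction
        self.merge = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_l, feat_r):
        keep_l, share_l = feat_l.split([feat_l.size(1) - self.n_share, self.n_share], dim=1)
        keep_r, share_r = feat_r.split([feat_r.size(1) - self.n_share, self.n_share], dim=1)
        cross_l = self.cross(torch.cat([share_l, share_r], dim=1))  # left looks at right
        cross_r = self.cross(torch.cat([share_r, share_l], dim=1))  # right looks at left
        out_l = self.merge(torch.cat([self.intra(keep_l), cross_l], dim=1)) + feat_l
        out_r = self.merge(torch.cat([self.intra(keep_r), cross_r], dim=1)) + feat_r
        return out_l, out_r

l, r = torch.randn(1, 64, 30, 90), torch.randn(1, 64, 30, 90)
out_l, out_r = ChannelSplitInteraction()(l, r)
```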
ISBN: 9798350365474 (print)
Face Image Quality Assessment (FIQA) estimates the utility of face images for automated face recognition (FR) systems. In this work, we propose a novel approach to assess the quality of face images based on inspecting the changes required in the pre-trained FR model weights to minimize the differences between testing samples and the distribution of the FR training dataset. To achieve that, we quantify the discrepancy in Batch Normalization statistics (BNS), including mean and variance, between those recorded during FR training and those obtained by processing testing samples through the pretrained FR model. We then generate gradient magnitudes of the pretrained FR weights by backpropagating the BNS discrepancy through the pretrained model. The cumulative absolute sum of these gradient magnitudes serves as the FIQ score in our approach. Through comprehensive experimentation, we demonstrate the effectiveness of our training-free and quality-labeling-free approach, achieving performance competitive with recent state-of-the-art FIQA approaches without relying on quality labels, trained regression networks, specialized architectures, or the design and optimization of specific loss functions.
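A minimal sketch of this scoring procedure, assuming a toy BatchNorm-based backbone in place of a real FR model: the mismatch between the sample's activation statistics and the stored running statistics is backpropagated, and the summed absolute gradients form the quality score.

```python
import torch
import torch.nn as nn

def bns_quality_score(model, image):
    """Quality as the total gradient magnitude produced by the mismatch between
    the sample's activation statistics and the BatchNorm running statistics."""
    stats_loss = []

    def hook(module, inp, out):
        x = inp[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        stats_loss.append(((mean - module.running_mean) ** 2).sum()
                          + ((var - module.running_var) ** 2).sum())

    handles = [m.register_forward_hook(hook) for m in model.modules()
               if isinstance(m, nn.BatchNorm2d)]
    model.eval()
    model.zero_grad()
    model(image)
    torch.stack(stats_loss).sum().backward()      # backpropagate the BNS discrepancy
    for h in handles:
        h.remove()
    # cumulative absolute gradient magnitude over all pretrained weights
    return sum(p.grad.abs().sum().item() for p in model.parameters() if p.grad is not None)

# usage with a toy stand-in backbone (any BatchNorm-based FR network would fit)
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
score = bns_quality_score(backbone, torch.rand(1, 3, 112, 112))
```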
ISBN: 9798350365474 (print)
Although stereo image super-resolution has been extensively studied, many existing works rely only on attention along a single epipolar direction to reconstruct stereo images. For asymmetric parallax images, these methods often struggle to capture reliable stereo correspondence, so the reconstructed images suffer from blurring and artifacts. In this paper, we propose a novel method called the Cross-View Aggregation Network for Stereo Image Super-Resolution (CANSSR) and explore the relationship between multi-directional epipolar lines to construct reliable stereo correspondence. Specifically, we propose a multi-directional cross-view aggregation module (MCAM) that effectively captures multi-directional stereo correspondence and obtains cross-view complementary information. Furthermore, we design a channel-spatial aggregation module (CSAM) that aggregates multi-order global-local information within each view to reconstruct clearer texture features. In addition, we equip the feed-forward network with a large-kernel convolution to acquire richer detailed texture information. Extensive experiments conclusively demonstrate that CANSSR outperforms state-of-the-art methods both qualitatively and quantitatively for stereo image super-resolution on the Flickr1024 and Middlebury datasets.
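A rough sketch of multi-directional cross-view aggregation, applying the same row-wise attention along both the horizontal and the vertical axis and averaging the two correspondence hypotheses; this parameter-free form is an illustrative assumption, not the actual MCAM.

```python
import torch

def epipolar_attention(q, k, v):
    """Row-wise attention: each pixel attends along the width axis of the other view."""
    b, c, h, w = q.shape
    q, k, v = (t.permute(0, 2, 3, 1) for t in (q, k, v))        # (B, H, W, C)
    attn = torch.softmax(q @ k.transpose(-1, -2) / c ** 0.5, dim=-1)
    return (attn @ v).permute(0, 3, 1, 2)                        # (B, C, H, W)

def multi_directional_aggregation(feat_l, feat_r):
    """Aggregate the right view into the left along horizontal AND vertical
    directions, then average the two correspondence hypotheses."""
    horizontal = epipolar_attention(feat_l, feat_r, feat_r)
    vertical = epipolar_attention(feat_l.transpose(2, 3), feat_r.transpose(2, 3),
                                  feat_r.transpose(2, 3)).transpose(2, 3)
    return feat_l + 0.5 * (horizontal + vertical)

left, right = torch.randn(1, 32, 24, 72), torch.randn(1, 32, 24, 72)
aggregated = multi_directional_aggregation(left, right)
```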
ISBN: 9798350365474 (print)
In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices that utilizes text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method performs diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in the portrait video relighting task. Unlike previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.
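For context on how an HDRI map drives relighting, the sketch below computes standard Lambertian (diffuse) irradiance from an equirectangular HDR map for a set of surface normals; it is a textbook formulation under assumed conventions, not the on-device pipeline described here.

```python
import numpy as np

def hdri_directions_and_solid_angles(h, w):
    """Unit light directions and per-pixel solid angles for an equirectangular HDRI."""
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi            # +pi/2 near the top row
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2 * np.pi
    lon, lat = np.meshgrid(lon, lat)
    dirs = np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)                    # (H, W, 3)
    omega = (2 * np.pi / w) * (np.pi / h) * np.cos(lat)        # (H, W) solid angles
    return dirs, omega

def diffuse_irradiance(hdri, normals):
    """Lambertian irradiance per surface normal from an equirectangular HDR map.

    hdri: (H, W, 3) linear radiance; normals: (N, 3) unit normals."""
    dirs, omega = hdri_directions_and_solid_angles(*hdri.shape[:2])
    cosine = np.clip(normals @ dirs.reshape(-1, 3).T, 0.0, None)      # (N, H*W)
    return cosine @ (hdri.reshape(-1, 3) * omega.reshape(-1, 1))      # (N, 3) RGB irradiance

hdri = np.random.rand(64, 128, 3).astype(np.float32)          # toy HDRI map
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
irradiance = diffuse_irradiance(hdri, normals)
```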
ISBN: 9798350365474 (print)
In the realm of intelligent traffic systems, fisheye cameras have emerged as a pivotal tool, distinguished by their expansive field of view, which significantly enhances the surveillance of complex street networks and intersections. However, the inherent distortion of fisheye lenses, varying illumination, tiny objects, and confusion between vehicle classes pose significant challenges to conventional image processing and object detection techniques. To address these challenges, we propose an advanced object detection framework named FE-Det, specifically designed for fisheye cameras in traffic monitoring systems. The framework integrates detection models optimized for day and night scene variability. Additionally, it incorporates innovative post-processing operations that enhance detection, including a Vehicles Classifier Module for precise vehicle identification, a Static Objects Processing Module for more accurate detection of stationary objects, and a Confidence Score Refinement Module that adjusts confidence scores to improve the detection of peripheral objects. Experiments substantiate that our framework achieves a 1.4% improvement by distinguishing between day and night scenes compared to traditional models. Moreover, the proposed post-processing method yields an additional enhancement of 4.1%.
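As an illustration of what a confidence refinement for peripheral objects might look like, the sketch below boosts detection scores with the normalized radial distance of each box from the image center, where fisheye distortion shrinks objects; the boost formula and its magnitude are assumptions, not the module proposed in the paper.

```python
import numpy as np

def refine_confidence(boxes, scores, image_size, max_boost=0.1):
    """Boost detection confidence toward the fisheye periphery, where objects
    are small and tend to be under-scored.

    boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,); image_size: (width, height)."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    center = np.array(image_size) / 2
    # normalized radial distance of each box center from the image center
    radius = np.hypot(cx - center[0], cy - center[1]) / np.linalg.norm(center)
    return np.clip(scores + max_boost * np.clip(radius, 0, 1), 0, 1)

boxes = np.array([[10, 10, 60, 60], [600, 400, 700, 480]], dtype=np.float32)
scores = np.array([0.42, 0.55], dtype=np.float32)
refined = refine_confidence(boxes, scores, image_size=(1280, 960))
```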
ISBN: 9798350365474 (print)
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW super-resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x under unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state of the art in RAW image super-resolution.
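For reference, a naive baseline for this challenge setting (assuming an RGGB pattern) packs the Bayer mosaic into four half-resolution channels, upscales each bicubically by 2x, and re-mosaics; learned challenge solutions replace the interpolation step with trained networks.

```python
import torch
import torch.nn.functional as F

def bayer_sr_baseline(raw, scale=2):
    """Naive RAW super-resolution baseline: pack the RGGB mosaic into four
    half-resolution channels, upscale each bicubically, and re-mosaic.

    raw: (H, W) Bayer tensor with an RGGB pattern (assumed)."""
    h, w = raw.shape
    packed = torch.stack([raw[0::2, 0::2], raw[0::2, 1::2],    # R,  G1
                          raw[1::2, 0::2], raw[1::2, 1::2]])   # G2, B   -> (4, H/2, W/2)
    up = F.interpolate(packed.unsqueeze(0), scale_factor=scale,
                       mode="bicubic", align_corners=False).squeeze(0)
    out = torch.zeros(scale * h, scale * w, dtype=raw.dtype)
    out[0::2, 0::2], out[0::2, 1::2] = up[0], up[1]
    out[1::2, 0::2], out[1::2, 1::2] = up[2], up[3]
    return out

sr_raw = bayer_sr_baseline(torch.rand(64, 64))   # (128, 128) upscaled Bayer mosaic
```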