Performing super-resolution of a depth image using guidance from an RGB image is a problem that concerns several fields, such as robotics, medical imaging, and remote sensing. While deep learning methods have achieved good results on this problem, recent work has highlighted the value of combining modern methods with more formal frameworks. In this work, we propose a novel approach that combines guided anisotropic diffusion with a deep convolutional network and advances the state of the art for guided depth super-resolution. The edge-transferring and edge-enhancing properties of the diffusion are boosted by the contextual reasoning capabilities of modern networks, and a strict adjustment step guarantees perfect adherence to the source image. We achieve unprecedented results on three commonly used benchmarks for guided depth super-resolution. The performance gain compared to other methods is largest at larger scales, such as ×32 scaling. Code for the proposed method is available at https://***/prs-eth/Diffusion-Super-Resolution to promote reproducibility of our results.
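To make the core idea concrete, here is a minimal sketch of one guided anisotropic diffusion step in the Perona-Malik style, where the conductivities are computed from the guide image's gradients so that depth smoothing is suppressed across guide edges. All names and parameter values are illustrative; the paper's actual scheme (and its network coupling and adjustment step) differs in detail.

```python
import numpy as np

def guided_diffusion_step(depth, guide_gray, lam=0.1, kappa=0.03):
    """One illustrative guided anisotropic diffusion step.

    `depth` is diffused, but the edge-stopping conductivities come from
    the (grayscale) guide image, so depth edges align with guide edges.
    """
    def shifts(img):
        # Differences to the four cardinal neighbours (periodic borders
        # for brevity; a real implementation would use replicate padding).
        n = np.roll(img, -1, axis=0) - img
        s = np.roll(img, 1, axis=0) - img
        e = np.roll(img, -1, axis=1) - img
        w = np.roll(img, 1, axis=1) - img
        return n, s, e, w

    gn, gs, ge, gw = shifts(guide_gray)
    # Edge-stopping function: conductivity decays across strong guide edges.
    c = lambda g: np.exp(-(g / kappa) ** 2)
    dn, ds, de, dw = shifts(depth)
    # Explicit update: diffuse depth, weighted by guide-based conductivities.
    return depth + lam * (c(gn) * dn + c(gs) * ds + c(ge) * de + c(gw) * dw)
```

Iterating this step upsamples a blurry depth map toward guide-aligned edges; on a constant depth map the update is zero, as expected for a pure diffusion operator.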
ISBN (Print): 9789819633487
The proceedings contain 28 papers. The special focus in this conference is on Pattern Analysis and Machine Intelligence. The topics include: Development of a Low Cost 3D LiDAR Using 2D LiDAR and Servo Motor; The Design of Machine Vision-Based Waste Sorting System; ECLNet: Efficient Convolution with Lite Transformer for 3D Medical Image Segmentation; Exploring High-Performance 3D Object Detection with Partial Depth Completion; Full-Scale Network for Remote Sensing Object Detection; Detection of Pedestrian Movement Poses in High-Speed Autonomous Driving Environments Using DVS; City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model; An Efficient Transformer-Based Network for Remote Sensing Image Change Detection; The Method for Three-Dimensional Visual Measurement of Circular Markers Based on Active Fusion Technology; Intelligent Image Recognition and Classification Technology in Digital Media; Indoor Visible Light Positioning System Based on the Image Sensor and CNN-GRU Fusion Neural Network; Stock Investor Sentiment Analysis Based on NLP; Novel Audiobook System Based on BERT; Student Enrollment Consultation Q&A Robot Based on Large Language Model; Family Doctor Model Training Based on Large Language Model Tuning; Composite Awareness-Based Knowledge Distillation for Medical Anomaly Detection; Improved CNN-GRU RF Fingerprint Feature Recognition Method Based on Comb Filter; Emotional State Recognition of English Learners Based on Deep Learning; Application of Classification Framework Based on CDR and CNN in Ophthalmic Prediagnosis; Visual Recognition and Recommendation System for Cultural Tourism Attractions Based on Deep Learning; Quadruped Robot System Based on Proprioceptive Vision and Complex Ground Mobility Capabilities; A Simulated Dataset to Evaluate the Visual-Inertial Odometry Algorithms.
ISBN (Digital): 9798350365474
ISBN (Print): 9798350365481
Artificial intelligence (AI) and autonomous edge computing in space are emerging areas of interest to augment the capabilities of nanosatellites, where modern sensors generate orders of magnitude more data than can typically be transmitted to mission control. Here, we present the hardware and software design of an onboard AI subsystem hosted on SpIRIT. The system is optimised for on-board computer vision experiments based on visible light and long wave infrared cameras. This paper highlights the key design choices made to maximise the robustness of the system in harsh space conditions, and their motivation relative to key mission requirements, such as limited compute resources, resilience to cosmic radiation, extreme temperature variations, distribution shifts, and very low transmission bandwidths. The payload, called Loris, consists of six visible light cameras, three infrared cameras, a camera control board and a Graphics Processing Unit (GPU) system-on-module. Loris enables the execution of AI models with on-orbit fine-tuning as well as a next-generation image compression algorithm, including progressive coding. This innovative approach not only enhances the data processing capabilities of nanosatellites but also lays the groundwork for broader applications to remote sensing from space.
Human visual attention is the basis of target recognition, change detection and classification in remote sensing images. However, the human visual attention of remote sensing images during target detection remains uni...
ISBN (Digital): 9798350353006
ISBN (Print): 9798350353013
Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating regional information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolution (CANConv), a novel method tailored for remote sensing image pansharpening. Specifically, CANConv employs adaptive convolution, ensuring spatial adaptability, and incorporates non-local self-similarity through the similarity relationship partition (SRP) and the partition-wise adaptive convolution (PWAC) sub-modules. Furthermore, we also propose a corresponding network architecture, called CANNet, which mainly utilizes the multi-scale self-similarity. Extensive experiments demonstrate the superior performance of CANConv compared with recent promising fusion methods. Besides, we substantiate the method's effectiveness through visualization, ablation experiments, and comparison with existing methods on multiple test sets. The source code is publicly available at https://***/duany11/CANConv.
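The partition-wise idea can be illustrated with a toy version of content-adaptive convolution: pixels are grouped by content, and each group is filtered with its own kernel. Here a simple intensity-quantile rule stands in for the paper's learned similarity relationship partition, and the kernels are supplied by hand rather than learned; the function name and partition rule are assumptions for illustration only.

```python
import numpy as np

def partitionwise_conv(img, kernels, n_parts=2):
    """Toy partition-wise adaptive convolution on a single-channel image.

    `kernels` has shape (n_parts, 3, 3); each pixel is filtered with the
    kernel of the partition its intensity falls into.
    """
    h, w = img.shape
    # Partition pixels by intensity quantile (stand-in for learned SRP).
    edges = np.quantile(img, np.linspace(0, 1, n_parts + 1)[1:-1])
    part = np.digitize(img, edges)            # (h, w), values in [0, n_parts)
    pad = np.pad(img, 1, mode="edge")         # replicate border for 3x3 window
    out = np.zeros_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            patch = pad[y:y + 3, x:x + 3]
            # Apply the kernel assigned to this pixel's content partition.
            out[y, x] = np.sum(patch * kernels[part[y, x]])
    return out
```

With all-identity kernels (a single 1 at the centre) the operator reduces to a no-op, which is a handy sanity check; differing kernels per partition then show how spatially adaptive filtering departs from an ordinary shared-weight convolution.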
Synthetic Aperture Radar (SAR) Calibration & Validation are critical processes to ensure the accuracy, precision and reliability of SAR imaging systems. These activities are particularly essential in dynamic envir...
ISBN (Digital): 9798350353006
ISBN (Print): 9798350353013
Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in remote sensing (RS) scenarios, leading to inaccurate or fabricated information when presented with RS domain-specific queries. Such behavior emerges due to the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction-following data as well as strong backbone models for RS makes it hard for models to align their behavior with user queries. To address these limitations, we propose GeoChat - the first versatile remote sensing VLM that offers multitask conversational capabilities with high-resolution RS images. Specifically, GeoChat can not only answer image-level queries but also accepts region inputs to hold region-specific dialogue. Furthermore, it can visually ground objects in its responses by referring to their spatial coordinates. To address the lack of domain-specific datasets, we generate a novel RS multimodal instruction-following dataset by extending image-text pairs from existing diverse RS datasets. We establish a comprehensive benchmark for RS multitask conversations and compare with a number of baseline methods. GeoChat demonstrates robust zero-shot performance on various RS tasks, e.g., image and region captioning, visual question answering, scene classification, visually grounded conversations and referring detection. Our code is available here.
In recent years, applying big data-driven deep learning methods to synthetic aperture radar (SAR) target recognition with limited data has become a research hotspot. However, the problem caused by the long-tailed characteristics of SAR data has long been ignored. Specifically, a majority of data samples are concentrated in a few categories, leading to a skewed distribution of data. This skewed distribution can cause learning bias toward the majority class, which can subsequently degrade the recognition performance of the minority class. This issue is further exacerbated under limited-sample conditions for SAR target recognition. After investigating target recognition for long-tailed natural images, this study found that the existing methods in that field cannot be easily applied to SAR target recognition. The primary reason is that SAR image data exhibit simultaneous and complex interclass and intraclass long-tailed distributions. In response to this issue, we propose the use of a multibranch expert network and dual-environment sampling to address the long-tail problems in both interclass and intraclass scenarios. The proposed method outperforms popular long-tailed target recognition methods on the long-tailed versions of the MSTAR and FUSAR datasets.
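The inter-class side of the long-tail problem is commonly mitigated by class-balanced resampling, which the abstract's dual-environment sampling builds on. A minimal sketch of inverse-frequency sampling weights (a standard remedy, not the paper's exact scheme; the function name is illustrative):

```python
import numpy as np

def class_balanced_weights(labels):
    """Per-sample sampling weights proportional to 1 / count(class).

    With these weights, each class contributes equal probability mass to
    a weighted sampler, counteracting the skew toward majority classes.
    """
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True,
                                   return_counts=True)
    w = 1.0 / counts[inverse]       # rarer classes get larger weights
    return w / w.sum()              # normalise to a probability distribution
```

For a label vector like [0, 0, 0, 1], class 0 and class 1 each receive half of the total sampling probability, so a weighted sampler draws minority-class examples as often as majority-class ones.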
ISBN (Print): 9781665448994
Detecting change through multi-image, multi-date remote sensing is essential to developing an understanding of global conditions. Despite recent advancements in remote sensing realized through deep learning, novel methods for accurate multi-image change detection remain unrealized. Recently, several promising methods have been proposed to address this topic, but a paucity of publicly available data limits the methods that can be assessed. In particular, there exists limited work on categorizing the nature and status of change across an observation period. This paper introduces the first labeled dataset available for such a task. We present an open-source change detection dataset, termed QFabric, with 450,000 change polygons annotated across 504 locations in 100 different cities covering a wide range of geographies and urban fabrics. QFabric is a temporal multi-task dataset with 6 change types and 9 change status classes. The geography and environment metadata around each polygon provides context that can be leveraged to build robust deep neural networks. We apply multiple benchmarks on our dataset for change detection, change type and status classification tasks. Project page: https://***/qfabric