The development of effective vision-based algorithms has been a significant challenge in achieving autonomous drones, which promise to offer immense potential for many real-world applications. This paper investigates learning deep sensorimotor policies for vision-based drone racing, which is a particularly demanding setting for testing the limits of an algorithm. Our method combines feature representation learning to extract task-relevant feature representations from high-dimensional image inputs with a learning-by-cheating framework to train a deep sensorimotor policy for vision-based drone racing. This approach eliminates the need for globally-consistent state estimation, trajectory planning, and handcrafted control design, allowing the policy to directly infer control commands from raw images, similar to human pilots. We conduct experiments using a realistic simulator and show that our vision-based policy can achieve state-of-the-art racing performance while being robust against unseen visual disturbances. Our study suggests that consistent feature embeddings are essential for achieving robust control performance in the presence of visual disturbances. The key to acquiring consistent feature embeddings is utilizing contrastive learning along with data augmentation. Video: https://***/AX_fcnW9yqE
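The abstract credits contrastive learning with data augmentation for the consistent feature embeddings. It does not say which loss is used, so the following is only a minimal sketch assuming an InfoNCE-style objective; the function name `info_nce_loss`, the temperature value, and the use of NumPy are illustrative assumptions, not details from the paper.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss between two batches of embeddings, shape (N, D).

    Assumption for this sketch: row i of z_a and row i of z_b embed two
    augmented views of the same image (a positive pair); every other row
    in the batch serves as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarity logits, shape (N, N).
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)

    # Row-wise log-softmax; the diagonal holds the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

The loss is small when the two views of each image map to nearby embeddings and large when they do not, which is the sense in which such an objective encourages embeddings that stay consistent under augmentation.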
unstructured environments, enabling various real-world applications. However, the lack of effective vision-based algorithms has been a stumbling block to achieving this goal. Existing systems often require hand-engine...
ISBN (digital): 9798350377705
ISBN (print): 9798350377712
Point Cloud Registration (PCR) is a critical and challenging task in computer vision and robotics. One of the primary difficulties in PCR is identifying salient and meaningful points that exhibit consistent semantic and geometric properties across different scans. Previous methods have encountered challenges with ambiguous matching due to the similarity among patch blocks throughout the entire point cloud and the lack of consideration for efficient global geometric consistency. To address these issues, we propose a new framework that includes several novel techniques. Firstly, we introduce a semantic-aware geometric encoder that combines object-level and patch-level semantic information. This encoder significantly improves registration recall by reducing ambiguity in patch-level superpoint matching. Additionally, we incorporate a prior knowledge approach that utilizes an intrinsic shape signature to identify salient points. This enables us to extract the most salient superpoints and meaningful dense points in the scene. Secondly, we introduce an innovative transformer that encodes High-Order (HO) geometric features. These features are crucial for identifying salient points within initial overlap regions while considering global high-order geometric consistency. We introduce an anchor node selection strategy to optimize this high-order transformer further. By encoding inter-frame triangle or polyhedron consistency features based on these anchor nodes, we can effectively learn high-order geometric features of salient superpoints. These high-order features are then propagated to dense points and utilized by a Sinkhorn matching module to identify critical correspondences for successful registration. The experiments conducted on the 3DMatch/3DLoMatch and KITTI datasets demonstrate the effectiveness of our method.
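The abstract mentions a Sinkhorn matching module without describing it. As a rough illustration of the general technique only, the sketch below applies plain Sinkhorn normalisation to a square score matrix; the function names, iteration count, and square-matrix setting are assumptions for this example, and published registration pipelines often add a slack row and column for unmatched points, which is omitted here.

```python
import numpy as np

def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dims."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=100):
    """Turn a square score matrix into an approximately doubly stochastic
    soft-assignment matrix by alternating row and column normalisation
    in log space."""
    log_p = np.asarray(scores, dtype=float)
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)
```

Each entry of the result can be read as a soft matching weight between a source point and a target point; hard correspondences are typically picked afterwards, for example by thresholding or mutual top-k selection.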
Establishing reliable correspondences is crucial for all registration tasks, including 2D image registration, 3D point cloud registration, and 2D-3D image-to-point cloud registration. However, these tasks are often complicated by challenges such as scale inconsistencies, symmetry, and large deformations, which can lead to ambiguous matches. Previous feature-based and correspondence-based methods typically rely on geometric or semantic features to generate or polish initial potential correspondences. Some methods leverage specific geometric priors, such as topological preservation, to devise diverse and innovative strategies tailored to a given enhancement goal, which cannot be exhaustively enumerated. Additionally, many previous approaches rely on a single-step prediction head, which can struggle with local minima in complex matching scenarios. To address these challenges, we introduce an innovative paradigm that leverages a diffusion model in matrix space for robust matching matrix estimation. Our model treats correspondence estimation as a denoising diffusion process in the matching matrix space, gradually refining the intermediate matching matrix to the optimal one. Specifically, we apply the diffusion model in the doubly stochastic matrix space for 3D-3D and 2D-3D registration tasks. In the 2D image registration task, we deploy the diffusion model in a matrix sub-space, where dual-softmax projection regularization is applied. For all three registration tasks, we provide adaptive matching matrix embedding implementations tailored to the specific characteristics of each task while maintaining a consistent "match-to-warp" encoding pattern. Furthermore, we adopt a lightweight design for the denoising module. In inference, once points or image features are extracted and fixed, this module performs multi-step denoising predictions through reverse sampling. Evaluations on both 2D and 3D registration tasks demonstrate the effectiveness of our approach.
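For the 2D case the abstract refers to a dual-softmax projection. How the paper uses it inside the diffusion process is not stated here, so the sketch below shows only the commonly used dual-softmax operation on a score matrix; the function names and temperature are illustrative assumptions.

```python
import numpy as np

def _softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_softmax(scores, temperature=0.1):
    """Element-wise product of the row-wise and column-wise softmax.

    An entry is close to 1 only when it dominates both its row and its
    column, i.e. when the two features are mutual best matches.
    """
    s = np.asarray(scores, dtype=float) / temperature
    return _softmax(s, axis=1) * _softmax(s, axis=0)
```

Unlike a Sinkhorn-normalised matrix, the result is not doubly stochastic: every row and column sums to at most 1, which is consistent with the abstract describing this case as a matrix sub-space rather than the doubly stochastic space.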
Establishing reliable correspondences is essential for 3D and 2D-3D registration tasks. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these fea...
This work reviews the results of the NTIRE 2023 Challenge on Image Shadow Removal. The described set of solutions were proposed for a novel dataset, which captures a wide range of object-light interactions. It consist...
With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets ...
This article's main contributions are twofold: 1) to demonstrate how to apply the general European Union's High-Level Expert group's (EU HLEG) guidelines for trustworthy AI in practice for the domain of he...
International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multicenter study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and post-processing (66%). The “typical” lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.