This letter presents an efficient visual anomaly detection framework designed for safe autonomous navigation in dynamic indoor environments, such as university hallways. The approach employs an unsupervised deep autoencoder to model regular environmental patterns and detect anomalies as deviations in the embedding space. To enhance reliability and safety, the system integrates a statistical framework, conformal prediction, which provides uncertainty quantification with probabilistic guarantees. The proposed solution has been deployed on a real-time robotic platform, demonstrating efficient performance under resource-constrained conditions. Extensive hyperparameter optimization keeps the model adaptable to environmental changes, while rigorous evaluations confirm its effectiveness in anomaly detection. By addressing challenges related to real-time processing and hardware limitations, this work advances the state of the art in autonomous anomaly detection. The probabilistic insights offered by this framework strengthen operational safety and pave the way for future developments, such as richer sensor fusion and advanced learning paradigms. This research highlights the potential of uncertainty-aware deep learning to enhance safety monitoring frameworks, thereby enabling more reliable and intelligent autonomous systems for real-world applications.
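The conformal-prediction step described above can be illustrated with a minimal split-conformal sketch: calibrate a threshold on autoencoder reconstruction errors from normal data, then flag test frames whose error exceeds it. This is a generic illustration of the statistical idea, not the paper's implementation; the error values and function names below are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.05):
    """Split-conformal threshold over calibration anomaly scores.

    Scores above the threshold are flagged as anomalous with a marginal
    false-alarm rate of at most alpha (assuming exchangeable data).
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level.
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

# Hypothetical autoencoder reconstruction errors on normal calibration frames.
rng = np.random.default_rng(0)
cal = rng.normal(0.10, 0.02, size=500)
tau = conformal_threshold(cal, alpha=0.05)

test_errors = np.array([0.09, 0.12, 0.35])  # last frame deviates strongly
flags = test_errors > tau
```

Under the guarantee, at most about 5% of genuinely normal frames would be flagged in the long run, which is the kind of probabilistic safety statement the abstract refers to.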
Interacting with real-world cluttered scenes poses several challenges to robotic agents that need to understand complex spatial dependencies among the observed objects to determine optimal pick sequences or efficient object retrieval strategies. Existing solutions typically manage simplified scenarios and focus on predicting pairwise object relationships following an initial object detection phase, but often overlook the global context or struggle with handling redundant and missing object relations. In this work, we present a modern take on visual relational reasoning for grasp planning. We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories. Additionally, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing their spatial relationships. Recognizing the limitations of standard metrics, we employ the Average Precision of Relationships for the first time to evaluate model performance, conducting an extensive experimental benchmark. The obtained results establish our approach as the new state-of-the-art for this task, laying the foundation for future research in robotic manipulation.
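Once a model like the one described above predicts an adjacency matrix of spatial dependencies, a feasible pick sequence can be read off by topological sorting. The sketch below is a generic illustration of that downstream step, with a hypothetical 3-object scene; it is not the D3G model itself.

```python
from collections import deque

def pick_order(adj):
    """Topologically sort a dependency adjacency matrix.

    adj[i][j] = 1 means object i must be removed before object j
    (e.g., i rests on top of j). Returns a feasible pick sequence,
    or raises if the predicted graph contains a cycle.
    """
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = deque(j for j in range(n) if indeg[j] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for j in range(n):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    if len(order) != n:
        raise ValueError("cyclic (inconsistent) dependency predictions")
    return order

# Hypothetical scene: object 0 lies on object 1, which lies on object 2.
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(pick_order(adj))  # → [0, 1, 2]
```

A cycle in the predicted matrix would correspond to physically inconsistent relations, which is one reason redundant or contradictory relation predictions matter for this task.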
Recovering 3D geometry and textures of individual objects is crucial for many robotics applications, such as manipulation, pose estimation, and autonomous driving. However, decomposing a target object from a complex background is challenging. Most existing approaches rely on costly manual labels to acquire object instance perception. Recent advancements in 2D self-supervised learning offer new prospects for identifying objects of interest, yet leveraging such noisy 2D features for clean decomposition remains difficult. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to use 2D self-supervised features to create two levels of masks for supervision: a binary mask for foreground regions and a K-cluster mask for semantically similar regions. These complementary masks result in robust decomposition. Experimental results on different datasets show DORec's superiority in segmenting and reconstructing diverse foreground objects from varied backgrounds, enabling downstream tasks such as pose estimation.
Semantic segmentation is crucial for autonomous navigation in off-road environments, enabling precise classification of surroundings to identify traversable regions. However, distinctive factors inherent to off-road conditions, such as source-target domain discrepancies and sensor corruption from rough terrain, can result in distribution shifts that alter the data differently from the trained conditions. This often leads to inaccurate semantic label predictions and subsequent failures in navigation tasks. To address this, we propose ST-Seg, a novel framework that expands the source distribution through style expansion (SE) and texture regularization (TR). Unlike prior methods that implicitly apply generalization within a fixed source distribution, ST-Seg offers an intuitive approach for distribution shift. Specifically, SE broadens domain coverage by generating diverse realistic styles, augmenting the limited style information of the source domain. TR stabilizes local texture representation affected by style-augmented learning through a deep texture manifold. Experiments across various distribution-shifted target domains demonstrate the effectiveness of ST-Seg, with substantial improvements over existing methods. These results highlight the robustness of ST-Seg, enhancing the real-world applicability of semantic segmentation for off-road navigation.
Autonomous Vehicles (AVs) are redefining the transportation sector through their ability to navigate, make decisions, and complete autonomous tasks. For accurate perception and comprehension of the surroundings, AVs heavily rely on segmenting high-resolution 3D point cloud data provided by Light Detection and Ranging (LiDAR) sensors for discerning objects and other environmental features. However, current vehicular segmentation approaches suffer from data insufficiency, limited computational performance, and precision concerns. Hence, to counteract these limitations, the paper proposes a Semantic Segmentation approach using Ball-Pivoting Algorithm and U-Net (SSBU) that combines the Ball-Pivoting surface reconstruction algorithm and 3D U-Net to enhance image characteristics, leading to highly accurate outcomes with optimal cost efficiency. This SSBU integration involves augmenting and pre-processing the raw LiDAR data to transform it into voxels through voxelization. The voxels are further refined through a surface reconstruction technique that utilizes the Ball Pivoting Algorithm (BPA). The resulting 3D model is analyzed using a 3D U-Net deep learning architecture for robust, real-time interpretation. The implementation has produced a mean Intersection over Union (mIoU) of 83.3 on the NuScenes dataset and 69.7 on the KITTI dataset, outperforming the state of the art.
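The mIoU figures quoted above average per-class intersection-over-union scores. A minimal sketch of one common convention (averaging over classes present in the union of prediction and ground truth) is shown below on a toy label grid; papers and benchmarks differ in exactly which classes they average over, so treat this as illustrative.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union, averaged over classes that
    appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label grid with 2 classes (stand-in for per-point labels).
gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, gt, num_classes=2)   # class 0: 1/2, class 1: 2/3
```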
We extend our previous work, PoCo (Liang et al. 2024), and present a new algorithm, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition that integrates global retrieval and reranking into an end-to-end model and retains the use of Context-of-Clusters (CoCs) (Ma et al. 2023) for feature processing. Unlike prior approaches that primarily focus on the RGB domain for place recognition reranking, CSCPR is designed to handle RGB-D data. We apply the CoCs to handle cross-sourced and cross-scaled RGB-D point clouds and introduce two novel modules for reranking: the Self-Context Cluster (SCC), which enhances feature representation, and the Cross Source Context Cluster (CSCC), which matches query-database pairs based on local features. We also release two new datasets, ScanNetIPR and ARKitIPR. Our experiments demonstrate that CSCPR significantly outperforms state-of-the-art models, by at least 29.27% in Recall@1 on the ScanNet-PR dataset and 43.24% on the new datasets.
Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic state estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.
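To make the parametric route concrete: a parametric density head typically predicts the parameters of a mixture and is trained by minimizing negative log-likelihood (NLL), which lets it capture multimodal noise that a single Gaussian cannot. The sketch below evaluates that NLL on synthetic bimodal targets; it is a generic illustration under assumed parameters, not the paper's models.

```python
import numpy as np

def mixture_nll(y, weights, means, stds):
    """Mean negative log-likelihood of scalar targets y under a Gaussian
    mixture -- the loss a parametric density head would minimize."""
    y = np.asarray(y)[:, None]                      # (N, 1) vs (K,) components
    comp = weights * np.exp(-0.5 * ((y - means) / stds) ** 2) \
           / (stds * np.sqrt(2 * np.pi))
    return float(-np.log(comp.sum(axis=1)).mean())

# Bimodal noise: a single Gaussian fits it worse than a two-component mixture.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

nll_single  = mixture_nll(y, np.array([1.0]), np.array([0.0]), np.array([2.1]))
nll_mixture = mixture_nll(y, np.array([0.5, 0.5]),
                          np.array([-2.0, 2.0]), np.array([0.5, 0.5]))
```

In a trained network, `weights`, `means`, and `stds` would be outputs of the model conditioned on the input; here they are fixed by hand to show why the mixture achieves lower NLL on bimodal data.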
In the context of autonomous driving, monocular 3D detection is regarded as a fundamental and essential task due to its convenience, speed, and low cost. However, the lack of depth information in monocular images presents significant challenges for predicting object 3D information. Although existing methods address this issue using LiDAR guidance or pre-trained depth estimators, their substantial computational resource requirements limit scalability in real-world applications. In this letter, we propose a novel monocular 3D object detector with stereo guidance, called MonoSG. It simulates human visual perception by using stereo images during training to guide learning and retrieves the right-view and depth information from monocular images during inference for accurate 3D detection. The Stereo Guidance Cross Attention Module (SG-CAM) is designed to fuse binocular image information. Intra-view features are extracted from binocular images, and cross-attention is computed from the left to the right view. Then, the cross-attention features are fused with the intra-view features of the left view, enabling stereo guidance for MonoSG. To better adapt to different data distributions and improve the generalization ability of MonoSG, Stereo Guidance Auxiliary Labels (SG-AL) are introduced for each object in the stereo images, with DIoU3D proposed as the label score. Furthermore, an SG-AL score loss is proposed to guide MonoSG; it reduces gradient variance, facilitates network convergence, and mitigates the issue of insufficient depth information. Comprehensive experiments on the KITTI dataset validate the effectiveness of our method, demonstrating superior performance, particularly on low-resolution images.
In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost. While this approach is rapidly adaptable to changing user preferences, only preferences over the types of terrain already known by the semantic classifier can be expressed. In this letter, we hypothesize that a machine-learning-based alternative to the semantics-based paradigm above will allow for rapid cost assignment adaptation to preferences expressed over new terrains at deployment time without the need for additional training. To investigate this hypothesis, we introduce and study pacer, a novel approach to costmap generation that accepts as input a single bird's-eye view (BEV) image of the surrounding area along with a user-specified preference context and generates a corresponding BEV costmap that aligns with the preference context. Using a staged training procedure leveraging real and synthetic data, we find that pacer is able to adapt to new user preferences at deployment time while also exhibiting better generalization to novel terrains compared to both semantics-based and representation-learning approaches.
Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering performance when generalizing to real-world seafloor data with 3D structures. In this letter, we present a novel method, Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3D Gaussian Splatting (3DGS), to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustics with low-pass filtering in each iteration. In the experiments, we analyze and compare against different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our proposed RecGS paradigm can effectively separate the caustics from the seafloor, improving visual appearance, and can potentially be applied to other problems involving inconsistent illumination.
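The frequency intuition behind the low-pass decomposition step can be shown on a synthetic scanline: seafloor albedo varies slowly across the image, while caustic illumination bands are high-frequency, so a simple low-pass filter splits the two. This toy 2D example uses a box filter as a stand-in for the smoothing; RecGS itself performs the filtering within each recurrent 3DGS reconstruction iteration, which this sketch does not reproduce.

```python
import numpy as np

def lowpass(img, k=15):
    """Box-filter low-pass along image rows (illustrative smoothing step)."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)

# Synthetic scanlines: smooth seafloor albedo + rapid caustic bands.
x = np.linspace(0, 1, 256)
seafloor = 0.5 + 0.2 * x                       # slowly varying component
caustic  = 0.15 * np.sin(2 * np.pi * 40 * x)   # high-frequency illumination
observed = np.tile(seafloor + caustic, (8, 1))

estimate = lowpass(observed)                   # ≈ seafloor component
residual = observed - estimate                 # ≈ caustic component
```

Away from the image borders (where the box filter lacks support), the low-pass estimate tracks the smooth seafloor term while most of the caustic energy lands in the residual.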