This letter presents an efficient visual anomaly detection framework designed for safe autonomous navigation in dynamic indoor environments, such as university hallways. The approach employs an unsupervised deep autoencoder to model regular environmental patterns and detect anomalies as deviations in the embedding space. To enhance reliability and safety, the system integrates a statistical framework, conformal prediction, which provides uncertainty quantification with probabilistic guarantees. The proposed solution has been deployed on a real-time robotic platform, demonstrating efficient performance under resource-constrained conditions. Extensive hyperparameter optimization keeps the model adaptable to environmental changes, while rigorous evaluations confirm its effectiveness in anomaly detection. By addressing challenges related to real-time processing and hardware limitations, this work advances the state of the art in autonomous anomaly detection. The probabilistic insights offered by this framework strengthen operational safety and pave the way for future developments, such as richer sensor fusion and advanced learning paradigms. This research highlights the potential of uncertainty-aware deep learning to enhance safety monitoring frameworks, thereby enabling more reliable and intelligent autonomous systems for real-world applications.
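The conformal-prediction step described above can be illustrated with a minimal split-conformal sketch: calibrate a threshold on autoencoder reconstruction errors from normal data, then flag test frames whose error exceeds it. This is a generic illustration of the statistical idea, not the paper's implementation; the error values and function names below are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.05):
    """Split-conformal threshold over calibration anomaly scores.

    Scores above the threshold are flagged as anomalous with a marginal
    false-alarm rate of at most alpha (assuming exchangeable data).
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level.
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, q, method="higher"))

# Hypothetical autoencoder reconstruction errors on normal calibration frames.
rng = np.random.default_rng(0)
cal = rng.normal(0.10, 0.02, size=500)
tau = conformal_threshold(cal, alpha=0.05)

test_errors = np.array([0.09, 0.12, 0.35])  # last frame deviates strongly
flags = test_errors > tau
```

Under the guarantee, at most about 5% of genuinely normal frames would be flagged in the long run, which is the kind of probabilistic safety statement the abstract refers to.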
Interacting with real-world cluttered scenes poses several challenges to robotic agents that need to understand complex spatial dependencies among the observed objects to determine optimal pick sequences or efficient object retrieval strategies. Existing solutions typically manage simplified scenarios and focus on predicting pairwise object relationships following an initial object detection phase, but often overlook the global context or struggle with handling redundant and missing object relations. In this work, we present a modern take on visual relational reasoning for grasp planning. We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories. Additionally, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing their spatial relationships. Recognizing the limitations of standard metrics, we employ the Average Precision of Relationships for the first time to evaluate model performance, conducting an extensive experimental benchmark. The obtained results establish our approach as the new state-of-the-art for this task, laying the foundation for future research in robotic manipulation.
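Once a model like the one described above predicts an adjacency matrix of spatial dependencies, a feasible pick sequence can be read off by topological sorting. The sketch below is a generic illustration of that downstream step, with a hypothetical 3-object scene; it is not the D3G model itself.

```python
from collections import deque

def pick_order(adj):
    """Topologically sort a dependency adjacency matrix.

    adj[i][j] = 1 means object i must be removed before object j
    (e.g., i rests on top of j). Returns a feasible pick sequence,
    or raises if the predicted graph contains a cycle.
    """
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = deque(j for j in range(n) if indeg[j] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for j in range(n):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    if len(order) != n:
        raise ValueError("cyclic (inconsistent) dependency predictions")
    return order

# Hypothetical scene: object 0 lies on object 1, which lies on object 2.
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(pick_order(adj))  # → [0, 1, 2]
```

A cycle in the predicted matrix would correspond to physically inconsistent relations, which is one reason redundant or contradictory relation predictions matter for this task.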
Recovering 3D geometry and textures of individual objects is crucial for many robotics applications, such as manipulation, pose estimation, and autonomous driving. However, decomposing a target object from a complex background is challenging. Most existing approaches rely on costly manual labels to acquire object instance perception. Recent advancements in 2D self-supervised learning offer new prospects for identifying objects of interest, yet leveraging such noisy 2D features for clean decomposition remains difficult. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to use 2D self-supervised features to create two levels of masks for supervision: a binary mask for foreground regions and a K-cluster mask for semantically similar regions. These complementary masks result in robust decomposition. Experimental results on different datasets show DORec's superiority in segmenting and reconstructing diverse foreground objects from varied backgrounds, enabling downstream tasks such as pose estimation.
Semantic segmentation is crucial for autonomous navigation in off-road environments, enabling precise classification of surroundings to identify traversable regions. However, distinctive factors inherent to off-road conditions, such as source-target domain discrepancies and sensor corruption from rough terrain, can result in distribution shifts that alter the data differently from the trained conditions. This often leads to inaccurate semantic label predictions and subsequent failures in navigation tasks. To address this, we propose ST-Seg, a novel framework that expands the source distribution through style expansion (SE) and texture regularization (TR). Unlike prior methods that implicitly apply generalization within a fixed source distribution, ST-Seg offers an intuitive approach for distribution shift. Specifically, SE broadens domain coverage by generating diverse realistic styles, augmenting the limited style information of the source domain. TR stabilizes local texture representation affected by style-augmented learning through a deep texture manifold. Experiments across various distribution-shifted target domains demonstrate the effectiveness of ST-Seg, with substantial improvements over existing methods. These results highlight the robustness of ST-Seg, enhancing the real-world applicability of semantic segmentation for off-road navigation.
Autonomous Vehicles (AVs) are redefining the transportation sector through their ability to navigate, make decisions, and complete autonomous tasks. For accurate perception and comprehension of the surroundings, AVs heavily rely on segmenting high-resolution 3D point cloud data provided by Light Detection and Ranging (LiDAR) sensors for discerning objects and other environmental features. However, current vehicular segmentation approaches suffer from data insufficiency, limited computational performance, and precision concerns. Hence, to counteract these limitations, the paper proposes a Semantic Segmentation approach using Ball-Pivoting Algorithm and U-Net (SSBU) that combines the Ball-Pivoting surface reconstruction algorithm and 3D U-Net to enhance image characteristics, leading to highly accurate outcomes with optimal cost efficiency. This SSBU integration involves augmenting and pre-processing the raw LiDAR data to transform it into voxels through voxelization. The voxels are further refined through a surface reconstruction technique that utilizes the Ball Pivoting Algorithm (BPA). The resulting 3D model is analyzed using a 3D U-Net deep learning architecture for robust, real-time interpretation. The implementation has produced a mean Intersection over Union (mIoU) of 83.3 on the NuScenes dataset and 69.7 on the KITTI dataset, outperforming the state of the art.
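The mIoU figures quoted above average per-class intersection-over-union scores. A minimal sketch of one common convention (averaging over classes present in the union of prediction and ground truth) is shown below on a toy label grid; papers and benchmarks differ in exactly which classes they average over, so treat this as illustrative.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union, averaged over classes that
    appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label grid with 2 classes (stand-in for per-point labels).
gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, gt, num_classes=2)   # class 0: 1/2, class 1: 2/3
```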
We extend our previous work, PoCo (Liang et al. 2024), and present a new algorithm, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition that integrates global retrieval and reranking into an end-to-end model and retains the use of Context-of-Clusters (CoCs) (Ma et al. 2023) for feature processing. Unlike prior approaches that primarily focus on the RGB domain for place recognition reranking, CSCPR is designed to handle RGB-D data. We apply the CoCs to handle cross-sourced and cross-scaled RGB-D point clouds and introduce two novel modules for reranking: the Self-Context Cluster (SCC), which enhances feature representation, and the Cross Source Context Cluster (CSCC), which matches query-database pairs based on local features. We also release two new datasets, ScanNetIPR and ARKitIPR. Our experiments demonstrate that CSCPR significantly outperforms state-of-the-art models, by at least 29.27% in Recall@1 on the ScanNet-PR dataset and 43.24% on the new datasets.
Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic state estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.
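To make the parametric route concrete: a parametric density head typically predicts the parameters of a mixture and is trained by minimizing negative log-likelihood (NLL), which lets it capture multimodal noise that a single Gaussian cannot. The sketch below evaluates that NLL on synthetic bimodal targets; it is a generic illustration under assumed parameters, not the paper's models.

```python
import numpy as np

def mixture_nll(y, weights, means, stds):
    """Mean negative log-likelihood of scalar targets y under a Gaussian
    mixture -- the loss a parametric density head would minimize."""
    y = np.asarray(y)[:, None]                      # (N, 1) vs (K,) components
    comp = weights * np.exp(-0.5 * ((y - means) / stds) ** 2) \
           / (stds * np.sqrt(2 * np.pi))
    return float(-np.log(comp.sum(axis=1)).mean())

# Bimodal noise: a single Gaussian fits it worse than a two-component mixture.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

nll_single  = mixture_nll(y, np.array([1.0]), np.array([0.0]), np.array([2.1]))
nll_mixture = mixture_nll(y, np.array([0.5, 0.5]),
                          np.array([-2.0, 2.0]), np.array([0.5, 0.5]))
```

In a trained network, `weights`, `means`, and `stds` would be outputs of the model conditioned on the input; here they are fixed by hand to show why the mixture achieves lower NLL on bimodal data.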
In the context of autonomous driving, monocular 3D detection is regarded as a fundamental and essential task due to its convenience, speed, and low cost. However, the lack of depth information in monocular images presents significant challenges for predicting object 3D information. Although existing methods address this issue using LiDAR guidance or pre-trained depth estimators, their substantial computational resource requirements limit scalability in real-world applications. In this letter, we propose a novel monocular 3D object detector with stereo guidance, called MonoSG. It simulates human visual perception by using stereo images during training to guide learning and retrieves the right-view and depth information from monocular images during inference for accurate 3D detection. The Stereo Guidance Cross Attention Module (SG-CAM) is designed to fuse binocular image information. Intra-view features are extracted from binocular images, and cross-attention is computed from the left to the right view. Then, the cross-attention features are fused with the intra-view features of the left view, enabling stereo guidance for MonoSG. To better adapt to different data distributions and improve the generalization ability of MonoSG, Stereo Guidance Auxiliary Labels (SG-AL) are introduced for each object in the stereo images, with DIoU3D proposed as the label score. Furthermore, an SG-AL score loss is proposed to guide MonoSG; it reduces gradient variance, facilitates network convergence, and mitigates the issue of insufficient depth information. Comprehensive experiments on the KITTI dataset validate the effectiveness of our method, demonstrating superior performance, particularly on low-resolution images.
In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost. While this approach is rapidly adaptable to changing user preferences, only preferences over the types of terrain already known by the semantic classifier can be expressed. In this letter, we hypothesize that a machine-learning-based alternative to the semantics-based paradigm above will allow for rapid cost assignment adaptation to preferences expressed over new terrains at deployment time without the need for additional training. To investigate this hypothesis, we introduce and study pacer, a novel approach to costmap generation that accepts as input a single bird's-eye view (BEV) image of the surrounding area along with a user-specified preference context and generates a corresponding BEV costmap that aligns with the preference context. Using a staged training procedure leveraging real and synthetic data, we find that pacer is able to adapt to new user preferences at deployment time while also exhibiting better generalization to novel terrains compared to both semantics-based and representation-learning approaches.
Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering performance when generalizing to real-world seafloor data with 3D structures. In this letter, we present a novel method, Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3D Gaussian Splatting (3DGS), to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustics with low-pass filtering in each iteration. In the experiments, we analyze and compare against different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our proposed RecGS paradigm can effectively separate the caustics from the seafloor, improving visual appearance, and can potentially be applied to other problems involving inconsistent illumination.
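The frequency intuition behind the low-pass decomposition step can be shown on a synthetic scanline: seafloor albedo varies slowly across the image, while caustic illumination bands are high-frequency, so a simple low-pass filter splits the two. This toy 2D example uses a box filter as a stand-in for the smoothing; RecGS itself performs the filtering within each recurrent 3DGS reconstruction iteration, which this sketch does not reproduce.

```python
import numpy as np

def lowpass(img, k=15):
    """Box-filter low-pass along image rows (illustrative smoothing step)."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)

# Synthetic scanlines: smooth seafloor albedo + rapid caustic bands.
x = np.linspace(0, 1, 256)
seafloor = 0.5 + 0.2 * x                       # slowly varying component
caustic  = 0.15 * np.sin(2 * np.pi * 40 * x)   # high-frequency illumination
observed = np.tile(seafloor + caustic, (8, 1))

estimate = lowpass(observed)                   # ≈ seafloor component
residual = observed - estimate                 # ≈ caustic component
```

Away from the image borders (where the box filter lacks support), the low-pass estimate tracks the smooth seafloor term while most of the caustic energy lands in the residual.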