Search results - Inner Mongolia University Library

2025 IEEE/SICE International Symposium on System Integration, SII 2025

Authors: Seliunina, Svetlana; Otelepko, Artem; Memmesheimer, Raphael; Behnke, Sven. Affiliations: University of Bonn, Autonomous Intelligent Systems Group, Computer Science Institute VI - Intelligent Systems and Robotics, Lamarr Institute for Machine Learning and Artificial Intelligence, Center for Robotics, Germany

ISBN: (print) 9798331531614

Robots need to perceive persons in their surroundings for safety and to interact with them. In this paper, we present a person segmentation and action classification approach that operates on 3D scans of hemisphere field of view LiDAR sensors. We recorded a data set with an Ouster OSDome-64 sensor consisting of scenes where persons perform three different actions and annotated it. We propose a method based on a MaskDINO model to detect and segment persons and to recognize their actions from combined spherical projected multi-channel representations of the LiDAR data with an additional positional encoding. Our approach demonstrates good performance for the person segmentation task and further performs well for the estimation of the person action states walking, waving, and sitting. An ablation study provides insights about the individual channel contributions for the person segmentation task. The trained models, code and dataset are made publicly available. © 2025 IEEE.
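The spherical projection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spherical_project`, the tiny image size, the vertical field-of-view limits (0 to 90 degrees, assumed here for a hemispherical sensor), and the single range channel are all illustrative assumptions. The paper combines several channels and adds a positional encoding, which this sketch omits.

```python
import math


def spherical_project(points, width=8, height=4,
                      elev_min=0.0, elev_max=math.pi / 2):
    """Project (x, y, z) points into a height x width range image.

    Hypothetical helper: each cell keeps the range of the nearest point
    that falls into it; cells without any point stay None.
    """
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # skip a point at the sensor origin
        azimuth = math.atan2(y, x)  # horizontal angle in [-pi, pi]
        # Clamp the ratio to guard against floating-point overshoot.
        elevation = math.asin(max(-1.0, min(1.0, z / rng)))
        if not (elev_min <= elevation <= elev_max):
            continue  # outside the assumed vertical field of view
        # Azimuth maps to columns, elevation to rows (top row = highest).
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = int((elev_max - elevation) / (elev_max - elev_min) * height)
        row = min(row, height - 1)
        if image[row][col] is None or rng < image[row][col]:
            image[row][col] = rng
    return image
```

A point on the horizon straight ahead, such as `(1.0, 0.0, 0.0)`, lands in the bottom row, while points below the horizon are discarded under the assumed field of view.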

Keywords: Signal encoding


A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics


23rd IEEE-RAS International Conference on Humanoid Robots, Humanoids 2024

Authors: Bode, Jonas; Pätzold, Bastian; Memmesheimer, Raphael; Behnke, Sven. Affiliations: Computer Science Institute VI - Intelligent Systems and Robotics, Lamarr Institute for Machine Learning and Artificial Intelligence, Center for Robotics, University of Bonn, Autonomous Intelligent Systems Group, Germany

ISBN: (print) 9798350373578

Recent advances in Large Language Models (LLMs) have been instrumental in autonomous robot control and human-robot interaction by leveraging their vast general knowledge and capabilities to understand and reason across a wide range of tasks and scenarios. Previous works have investigated various prompt engineering techniques for improving the performance of LLMs to accomplish tasks, while others have proposed methods that utilize LLMs to plan and execute tasks based on the available functionalities of a given robot platform. In this work, we consider both lines of research by comparing prompt engineering techniques and combinations thereof within the application of high-level task planning and execution in service robotics. We define a diverse set of tasks and a simple set of functionalities in simulation, and measure task completion accuracy and execution time for several state-of-the-art models. We make our code, including all prompts, available at https://***/AIS-Bonn/Prompt-Engineering. © 2024 IEEE.

Keywords: Human robot interaction


Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation


2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025

Authors: Witte, Christian; Behley, Jens; Stachniss, Cyrill; Raaijmakers, Marvin. Affiliations: Cariad SE, Germany; University of Bonn, Center for Robotics, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Germany

ISBN: (print) 9798331510831

Spatial understanding of the semantics of the surroundings is a key capability needed by autonomous cars to enable safe driving decisions. Recently, purely vision-based solutions have gained increasing research interest. In particular, approaches extracting a bird's eye view (BEV) from multiple cameras have demonstrated great performance for spatial understanding. This paper addresses the dependency on learned positional encodings to correlate image and BEV feature map elements for transformer-based methods. We propose leveraging epipolar geometric constraints to model the relationship between cameras and the BEV by Epipolar Attention Fields. They are incorporated into the attention mechanism as a novel attribution term, serving as an alternative to learned positional encodings. Experiments show that our method EAFormer outperforms previous BEV approaches by 2% mIoU for map semantic segmentation and exhibits superior generalization capabilities compared to implicitly learning the camera configuration. © 2025 IEEE.
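The mIoU figure quoted in the abstract is the mean intersection-over-union across classes. As a rough sketch of how that metric is commonly computed (the function name `mean_iou` and the flat label lists are illustrative assumptions, and the paper's exact evaluation protocol may differ):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    `pred` and `target` are equal-length sequences of integer class labels,
    for example the flattened cells of a BEV semantic map.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # ignore classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.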

Keywords: Semantic Segmentation


RoboCup@Home 2024 OPL Winner NimbRo: Anthropomorphic Service Robots Using Foundation Models for Perception and Planning


27th RoboCup International Symposium, 2024

Authors: Memmesheimer, Raphael; Nogga, Jan; Pätzold, Bastian; Kruzhkov, Evgenii; Bultmann, Simon; Schreiber, Michael; Bode, Jonas; Karacora, Bertan; Park, Juhui; Savinykh, Alena; Behnke, Sven. Affiliations: Autonomous Intelligent Systems, Computer Science Institute VI, Lamarr Institute for Machine Learning and Artificial Intelligence and Center for Robotics, University of Bonn, Bonn, Germany

ISBN: (print) 9783031858581

We present the approaches and contributions of the winning team NimbRo@Home at the RoboCup@Home 2024 competition in the Open Platform League held in Eindhoven, NL. Further, we describe our hardware setup and give an overview of the results for the task stages and the final demonstration. For this year’s competition, we put a special emphasis on open-vocabulary object segmentation and grasping approaches that overcome the labeling overhead of supervised vision approaches, commonly used in RoboCup@Home. We successfully demonstrated that we can segment and grasp non-labeled objects by text descriptions. Further, we extensively employed Large Language Models (LLMs) for natural language understanding and task planning. Throughout the competition, our approaches showed robustness and generalization capabilities. A video of our performance can be found online (https://***/videos/RoboCup_2024). © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Robot programming


Sampling-based multi-dimensional recalibration


Proceedings of the 41st International Conference on Machine Learning

Authors: Youngseog Chung; Ian Char; Jeff Schneider. Affiliations: Machine Learning Department, Machine Learning Department and Robotics Institute, Carnegie Mellon University, Pittsburgh, PA

Calibration of probabilistic forecasts in the regression setting has been widely studied in the single dimensional case, where the output variables are assumed to be univariate. In many problem settings, however, the output variables are multi-dimensional, and in the presence of dependence across the output dimensions, measuring calibration and performing recalibration for each dimension separately can be both misleading and detrimental. In this work, we focus on representing predictive uncertainties via samples, and propose a recalibration method which accounts for the joint distribution across output dimensions to produce calibrated samples. Based on the concept of highest density regions (HDR), we define the notion of HDR calibration, and show that our recalibration method produces samples which are HDR calibrated. We demonstrate the performance of our method and the quality of the recalibrated samples on a suite of benchmark datasets in multi-dimensional regression, a real-world dataset in modeling plasma dynamics during nuclear fusion reactions, and on a decision-making application in forecasting demand.


HyenaPixel: Global Image Context with Convolutions


27th European Conference on Artificial Intelligence, ECAI 2024

Authors: Spravil, Julian; Houben, Sebastian; Behnke, Sven. Affiliations: Fraunhofer IAIS, Germany; University of Applied Sciences Bonn-Rhein-Sieg, Germany; University of Bonn, Computer Science Institute VI, Center for Robotics, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Germany

ISBN: (print) 9781643685489

In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to bidirectional data and two-dimensional image space. We scale Hyena's convolution kernels beyond the feature map size, up to 191×191, to maximize ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 84.9% and 85.2%, respectively, with no additional training data, while outperforming other convolutional and large-kernel networks. Combining HyenaPixel with attention further improves accuracy. We attribute the success of bidirectional Hyena to learning the data-dependent geometric arrangement of pixels without a fixed neighborhood definition. Experimental results on downstream tasks suggest that HyenaPixel with large filters and a fixed neighborhood leads to better localization performance. © 2024 The Authors.

Keywords: Complex networks


Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks


arXiv, 2025

Authors: Hu, Hanjiang; Robey, Alexander; Liu, Changliu. Affiliations: Robotics Institute, Machine Learning Department, Carnegie Mellon University, United States; Machine Learning Department, Carnegie Mellon University, United States; Robotics Institute, Carnegie Mellon University, United States

Large language models (LLMs) are highly vulnerable to jailbreaking attacks, wherein adversarial prompts are designed to elicit harmful responses. While existing defenses effectively mitigate single-turn attacks by detecting and filtering unsafe inputs, they fail against multi-turn jailbreaks that exploit contextual drift over multiple interactions, gradually leading LLMs away from safe behavior. To address this challenge, we propose a safety steering framework grounded in safe control theory, ensuring invariant safety in multi-turn dialogues. Our approach models the dialogue with LLMs using state-space representations and introduces a novel neural barrier function (NBF) to detect and filter harmful queries emerging from evolving contexts proactively. Our method achieves invariant safety at each turn of dialogue by learning a safety predictor that accounts for adversarial queries, preventing potential context drift toward jailbreaks. Extensive experiments under multiple LLMs show that our NBF-based safety steering outperforms safety alignment baselines, offering stronger defenses against multi-turn jailbreaks while maintaining a better trade-off between safety and helpfulness under different multi-turn jailbreak methods. © 2025, CC BY-NC-SA.

Keywords: Invariance


WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models


2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Marathe, Aboli; Ramanan, Deva; Walambe, Rahee; Kotecha, Ketan. Affiliations: Carnegie Mellon University, Machine Learning Department, PA, United States; Carnegie Mellon University, Robotics Institute, PA, United States; India; India

ISBN: (print) 9798350302493

The open road poses many challenges to autonomous perception, including poor visibility from extreme weather conditions. Models trained on good-weather datasets frequently fail at detection in these out-of-distribution settings. To aid adversarial robustness in perception, we introduce WEDGE (WEather images by DALL-E GEneration): a synthetic dataset generated with a vision-language generative model via prompting. WEDGE consists of 3360 images in 16 extreme weather conditions manually annotated with 16513 bounding boxes, supporting research in the tasks of weather classification and 2D object detection. We have analyzed WEDGE from research standpoints, verifying its effectiveness for extreme-weather autonomous perception. We establish baseline performance for classification and detection with 53.87% test accuracy and 45.41 mAP. Most importantly, WEDGE can be used to fine-tune state-of-the-art detectors, improving SOTA performance on real-world weather benchmarks (such as DAWN) by 4.48 AP for well-generated classes like trucks. WEDGE has been collected under OpenAI's terms of use and is released for public use under the CC BY-NC-SA 4.0 license. The repository for this work and dataset is available at https://***/WEDGE. © 2023 IEEE.

Keywords: Benchmarking


PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning


arXiv, 2025

Authors: Villar-Corrales, Angel; Behnke, Sven. Affiliations: Computer Science Institute VI – Intelligent Systems and Robotics, Center for Robotics, The Lamarr Institute for Machine Learning and Artificial Intelligence, Germany

Predicting future scene representations is a crucial task for enabling robots to understand and interact with the environment. However, most existing methods rely on video sequences and simulations with precise action annotations, limiting their ability to leverage the large amount of available unlabeled video data. To address this challenge, we propose PlaySlot, an object-centric video prediction model that infers object representations and latent actions from unlabeled video sequences. It then uses these representations to forecast future object states and video frames. PlaySlot can generate multiple possible futures conditioned on latent actions, which can be inferred from video dynamics, provided by a user, or generated by a learned action policy, thus enabling versatile and interpretable world modeling. Our results show that PlaySlot outperforms both stochastic and object-centric baselines for video prediction across different environments. Furthermore, we show that our inferred latent actions can be used to learn robot behaviors sample-efficiently from unlabeled video demonstrations. Videos and code are available at https://***/PlaySlot/. © 2025, CC BY.

Keywords: Stochastic systems


3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation


Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xingguang Zhong; Yue Pan; Cyrill Stachniss; Jens Behley. Affiliations: Center for Robotics, University of Bonn, Lamarr Institute for Machine Learning and Artificial Intelligence

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

Building accurate maps is a key building block to enable reliable localization, planning, and navigation of autonomous vehicles. We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans. To this end, we propose encoding the 4D scene into a novel spatio-temporal implicit neural map representation by fitting a time-dependent truncated signed distance function to each point. Using our representation, we extract the static map by filtering the dynamic parts. Our neural representation is based on sparse feature grids, a globally shared decoder, and time-dependent basis functions, which we jointly optimize in an unsupervised fashion. To learn this representation from a sequence of LiDAR scans, we design a simple yet efficient loss function to supervise the map optimization in a piecewise way. We evaluate our approach (code: https://***/PRBonn/4dNDF) on various scenes containing moving objects in terms of the reconstruction quality of static maps and the segmentation of dynamic point clouds. The experimental results demonstrate that our method is capable of removing the dynamic part of the input point clouds while reconstructing accurate and complete 3D maps, outperforming several state-of-the-art methods.

Keywords: Point cloud compression; Three-dimensional displays; Laser radar; Accuracy; Buildings; Planning; Pattern recognition

