Search results - Inner Mongolia University Library

A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics

arXiv, 2024

Authors: Bode, Jonas; Pätzold, Bastian; Memmesheimer, Raphael; Behnke, Sven. The Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, Lamarr Institute for Machine Learning and Artificial Intelligence, Center for Robotics, University of Bonn, Germany

Recent advances in Large Language Models (LLMs) have been instrumental in autonomous robot control and human-robot interaction by leveraging their vast general knowledge and capabilities to understand and reason across a wide range of tasks and scenarios. Previous works have investigated various prompt engineering techniques for improving the performance of LLMs to accomplish tasks, while others have proposed methods that utilize LLMs to plan and execute tasks based on the available functionalities of a given robot platform. In this work, we consider both lines of research by comparing prompt engineering techniques and combinations thereof within the application of high-level task planning and execution in service robotics. We define a diverse set of tasks and a simple set of functionalities in simulation, and measure task completion accuracy and execution time for several state-of-the-art models. We make our code, including all prompts, available at https://***/AIS-Bonn/Prompt_Engineering. Copyright © 2024, The Authors. All rights reserved.

Keywords: Human robot interaction

MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion

arXiv, 2024

Authors: Periyasamy, Arul Selvam; Behnke, Sven. The Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, the Center for Robotics, the Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

Cluttered bin-picking environments are challenging for pose estimation models. Despite the impressive progress enabled by deep learning, single-view RGB pose estimation models perform poorly in cluttered dynamic environments. Imbuing the rich temporal information contained in the video of scenes has the potential to enhance models’ ability to deal with the adverse effects of occlusion and the dynamic nature of the environments. Moreover, joint object detection and pose estimation models are better suited to leverage the co-dependent nature of the tasks for improving the accuracy of both tasks. To this end, we propose attention-based temporal fusion for multi-object 6D pose estimation that accumulates information across multiple frames of a video sequence. Our MOTPose method takes a sequence of images as input and performs joint object detection and pose estimation for all objects in one forward pass. It learns to aggregate both object embeddings and object parameters over multiple time steps using cross-attention-based fusion modules. We evaluate our method on the physically-realistic cluttered bin-picking dataset SynPick and the YCB-Video dataset and demonstrate improved pose estimation accuracy as well as better object detection accuracy. © 2024, CC BY.

Keywords: Object detection

Self-Centering 3-DoF Feet Controller for Hands-Free Locomotion Control in Telepresence and Virtual Reality

arXiv, 2024

Authors: Memmesheimer, Raphael; Lenz, Christian; Schwarz, Max; Schreiber, Michael; Behnke, Sven. The Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, Lamarr Institute for Machine Learning and Artificial Intelligence, Center for Robotics, University of Bonn, Germany

We present a novel seated feet controller for handling 3 Degrees of Freedom (DoF), aimed at controlling locomotion for telepresence robotics and virtual reality environments. Tilting the feet on two axes yields forward, backward, and sideways motion. In addition, a separate rotary joint allows for rotation around the vertical axis. Attached springs on all joints self-center the controller. The HTC Vive tracker is used to translate the trackers' orientation into locomotion commands. The proposed self-centering feet controller was used successfully for the ANA Avatar XPRIZE competition, where a naive operator traversed the robot through a longer distance, surpassing obstacles while solving various interaction and manipulation tasks in between. We publicly provide the models of the mostly 3D-printed feet controller for reproduction. Copyright © 2024, The Authors. All rights reserved.

Keywords: Biped locomotion

A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics

IEEE-RAS International Conference on Humanoid Robots

Authors: Jonas Bode; Bastian Pätzold; Raphael Memmesheimer; Sven Behnke. Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, Lamarr Institute for Machine Learning and Artificial Intelligence, and Center for Robotics, University of Bonn, Germany

ISBN (digital): 9798350373578

ISBN (print): 9798350373585

Keywords: Knowledge engineering; Service robots; Large language models; Instruments; Humanoid robots; Reliability engineering; Time measurement; Planning; Prompt engineering; Tuning

Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Malte Mosbach; Sven Behnke. Autonomous Intelligent Systems Group, Computer Science Institute VI – Intelligent Systems and Robotics – and the Center for Robotics and the Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

ISBN (digital): 9798350384574

ISBN (print): 9798350384581

Interactive grasping from clutter, akin to human dexterity, is one of the longest-standing problems in robot learning. Challenges stem from the intricacies of visual perception, the demand for precise motor skills, and the complex interplay between the two. In this work, we present Teacher-Augmented Policy Gradient (TAPG), a novel two-stage learning framework that synergizes reinforcement learning and policy distillation. After training a teacher policy to master the motor control based on object pose information, TAPG facilitates guided, yet adaptive, learning of a sensorimotor policy, based on object segmentation. We zero-shot transfer from simulation to a real robot by using Segment Anything Model for promptable object segmentation. Our trained policies adeptly grasp a wide variety of objects from cluttered scenarios in simulation and the real world based on human-understandable prompts. Furthermore, we show robust zero-shot transfer to novel objects. Videos of our experiments are available at https://***/grasp_anything.

Keywords: Training; Instance segmentation; Motor drives; Object segmentation; Reinforcement learning; Robot sensing systems; Motors

SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net

arXiv, 2024

Authors: Cao, Helin; Behnke, Sven. Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, Center for Robotics and the Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

We introduce SLCF-Net, a novel approach for the Semantic Scene Completion (SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements. The images are semantically segmented by a pre-trained 2D U-Net and a dense depth prior is estimated from a depth-conditioned pipeline fueled by Depth Anything. To associate the 2D image features with the 3D scene volume, we introduce Gaussian-decay Depth-prior Projection (GDP). This module projects the 2D features into the 3D volume along the line of sight with a Gaussian-decay function, centered around the depth prior. Volumetric semantics is computed by a 3D U-Net. We propagate the hidden 3D U-Net state using the sensor motion and design a novel loss to ensure temporal consistency. We evaluate our approach on the SemanticKITTI dataset and compare it with leading SSC approaches. The SLCF-Net excels in all SSC metrics and shows great temporal consistency. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantics

Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects

arXiv, 2024

Authors: Mosbach, Malte; Behnke, Sven. The Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, The Center for Robotics, The Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

Keywords: Distillation

DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models

arXiv, 2024

Authors: Cao, Helin; Behnke, Sven. The Autonomous Intelligent Systems group, Computer Science Institute VI – Intelligent Systems and Robotics, The Center for Robotics, The Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

Perception systems play a crucial role in autonomous driving, incorporating multiple sensors and corresponding computer vision algorithms. 3D LiDAR sensors are widely used to capture sparse point clouds of the vehicle’s surroundings. However, such systems struggle to perceive occluded areas and gaps in the scene due to the sparsity of these point clouds and their lack of semantics. To address these challenges, Semantic Scene Completion (SSC) jointly predicts unobserved geometry and semantics in the scene given raw LiDAR measurements, aiming for a more complete scene representation. Building on promising results of diffusion models in image generation and super-resolution tasks, we propose their extension to SSC by implementing the noising and denoising diffusion processes in the point and semantic spaces individually. To control the generation, we employ semantic LiDAR point clouds as conditional input and design local and global regularization losses to stabilize the denoising process. We evaluate our approach on autonomous driving datasets and our approach outperforms the state-of-the-art for SSC. © 2024, CC BY.

Keywords: Semantics

SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Helin Cao; Sven Behnke. Autonomous Intelligent Systems Group, Computer Science Institute VI – Intelligent Systems and Robotics – and the Center for Robotics and the Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Germany

ISBN (digital): 9798350384574

ISBN (print): 9798350384581

Keywords: Measurement; Image segmentation; Three-dimensional displays; Laser radar; Semantics; Pipelines; Robot sensing systems