Object rearrangement, a fundamental challenge in robotics, demands versatile strategies to handle diverse objects, configurations, and functional needs. To achieve this, the AI robot needs to learn functional rearrangement priors to specify precise goals that meet the functional requirements. Previous methods typically learn such priors from either laborious human annotations or manually designed heuristics, which limits scalability and generalization. In this letter, we propose a novel approach that leverages large models to distill functional rearrangement priors. Specifically, our approach collects diverse arrangement examples using both LLMs and VLMs and then distills the examples into a diffusion model. During test time, the learned diffusion model is conditioned on the initial configuration and guides the positioning of objects to meet functional requirements. In this way, we balance zero-shot generalization with time efficiency. Extensive experiments in multiple domains, including real-world scenarios, demonstrate the effectiveness of our approach in generating compatible goals for object rearrangement tasks, significantly outperforming baseline methods.
Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigated which design choices in this prompt are the most important. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly.
We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
Root-cause Analysis (RCA) of alarms is a well-established research area in automated Production Systems (aPS). Many RCA algorithms have been proposed and successfully evaluated, and new ones are being developed. Recently, researchers have focused on incorporating formalized information about the technical process into the analysis to gather further evidence for common root causes. In industrial applications, alarm data are usually preprocessed to accommodate use case-specific properties and prepare subsequent analysis steps. Consequently, this letter proposes a generalized RCA framework, for which an arbitrary number of preprocessing, data-driven RCA, and postprocessing algorithms can be selected, to support varying use cases. The framework was successfully evaluated in an industrial case study, using 1.8 million alarms recorded over 450 days from an industrial nonwoven production plant and analyzed using formalized information from process documentation and expert interviews. Seven preprocessing algorithms, one data-driven RCA algorithm, and nine postprocessing algorithms typical for continuous and hybrid technical processes were realized in an otherwise entirely use case-agnostic implementation.
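The composable structure described in this abstract (selectable preprocessing, data-driven analysis, and postprocessing stages) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the letter: the `RCAPipeline` class, the three example steps (`drop_chattering`, `count_successions`, `min_support`), and the alarm representation as `(timestamp, tag)` pairs are hypothetical, and the sketch uses a single analysis step for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical alarm record: (timestamp in seconds, alarm tag).
Alarm = Tuple[float, str]
Preprocessor = Callable[[List[Alarm]], List[Alarm]]
Analyzer = Callable[[List[Alarm]], Counter]
Postprocessor = Callable[[Counter], Counter]


@dataclass
class RCAPipeline:
    """Chains any number of pre/postprocessing steps around one analyzer."""
    analyzer: Analyzer
    preprocessors: List[Preprocessor] = field(default_factory=list)
    postprocessors: List[Postprocessor] = field(default_factory=list)

    def run(self, alarms: List[Alarm]) -> Counter:
        for step in self.preprocessors:
            alarms = step(alarms)
        result = self.analyzer(alarms)
        for step in self.postprocessors:
            result = step(result)
        return result


def drop_chattering(alarms: List[Alarm], min_gap: float = 5.0) -> List[Alarm]:
    """Illustrative preprocessing: drop repeats of a tag within min_gap seconds."""
    last_seen = {}
    kept = []
    for ts, tag in sorted(alarms):
        if tag not in last_seen or ts - last_seen[tag] >= min_gap:
            kept.append((ts, tag))
        last_seen[tag] = ts
    return kept


def count_successions(alarms: List[Alarm], window: float = 10.0) -> Counter:
    """Illustrative analysis: count pairs (a, b) where b follows a within window."""
    ordered = sorted(alarms)
    pairs = Counter()
    for i, (ts_a, tag_a) in enumerate(ordered):
        for ts_b, tag_b in ordered[i + 1:]:
            if ts_b - ts_a > window:
                break
            if tag_a != tag_b:
                pairs[(tag_a, tag_b)] += 1
    return pairs


def min_support(threshold: int) -> Postprocessor:
    """Illustrative postprocessing: keep pairs seen at least threshold times."""
    def step(pairs: Counter) -> Counter:
        return Counter({p: n for p, n in pairs.items() if n >= threshold})
    return step


pipeline = RCAPipeline(
    analyzer=count_successions,
    preprocessors=[drop_chattering],
    postprocessors=[min_support(2)],
)
alarms = [(0, "A"), (1, "A"), (2, "B"), (100, "A"), (103, "B"), (200, "C"), (204, "B")]
candidate_causes = pipeline.run(alarms)
```

Because each stage is just a callable, swapping in a different preprocessing or postprocessing step only changes the lists passed to the constructor, which is one plausible way to keep the core implementation use case-agnostic.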
ISBN: (print) 9781728196817
In this paper, we propose a novel variable rate deep compression architecture that operates on raw 3D point cloud data. The majority of learning-based point cloud compression methods work on a downsampled representation of the data. Moreover, many existing techniques require training multiple networks for different compression rates to generate consolidated point clouds of varying quality. In contrast, our network is capable of explicitly processing point clouds and generating a compressed description at a comprehensive range of bitrates. Furthermore, our approach ensures that there is no loss of information as a result of the voxelization process and the density of the point cloud does not affect the encoder/decoder performance. An extensive experimental evaluation shows that our model obtains state-of-the-art results, is computationally efficient, and can work directly with point cloud data, thus avoiding an expensive voxelized representation.
Consistently testing autonomous mobile robots in real-world scenarios is a necessary aspect of developing autonomous navigation systems. Each time the human safety monitor disengages the robot's autonomy system due to the robot performing an undesirable maneuver, the autonomy developers gain insight into how to improve the autonomy system. However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate. We present a reinforcement learning approach for learning to navigate from disengagements, or LaND. LaND learns a neural network model that predicts which actions lead to disengagements given the current sensory observation, and then at test time plans and executes actions that avoid disengagements. Our results demonstrate that LaND can successfully learn to navigate in diverse, real-world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches. Videos, code, and other material are available on our website https://***/view/sidewalk-learning.
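The test-time idea in this abstract (score candidate action sequences with a disengagement predictor, then execute the one with the lowest predicted risk) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' planner: `toy_predictor` is a hand-written stand-in for the learned neural network, the observation is reduced to a single hypothetical lateral offset, and the candidates are a small fixed set of constant-steering sequences.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor signature: (observation, actions) -> per-step
# disengagement probabilities. In LaND this role is played by a learned
# neural network; here it is a hand-written stand-in.
Predictor = Callable[[float, Sequence[float]], List[float]]


def toy_predictor(offset: float, actions: Sequence[float]) -> List[float]:
    """Toy model: risk grows with lateral distance from the sidewalk centre."""
    probs = []
    for steer in actions:
        offset += 0.5 * steer  # assumed effect of steering on lateral offset
        probs.append(min(1.0, abs(offset)))
    return probs


def plan(
    observation: float,
    candidates: Sequence[Sequence[float]],
    predict: Predictor,
) -> Tuple[List[float], float]:
    """Return the candidate with the lowest summed predicted disengagement risk."""
    if not candidates:
        raise ValueError("at least one candidate action sequence is required")
    best_actions: List[float] = []
    best_cost = float("inf")
    for actions in candidates:
        cost = sum(predict(observation, actions))
        if cost < best_cost:
            best_actions, best_cost = list(actions), cost
    return best_actions, best_cost


horizon = 5
candidates = [[steer] * horizon for steer in (-1.0, -0.5, 0.0, 0.5, 1.0)]
actions, cost = plan(0.8, candidates, toy_predictor)
```

Starting 0.8 units right of centre, the planner in this toy setup selects the gentle left-steering sequence, since harder steering overshoots and the predicted risk rises again.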
Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal. However, a purely geometric view of the world can be insufficient for many navigation problems. For example, a robot navigating based on geometry may avoid a field of tall grass because it believes it is untraversable, and will therefore fail to reach its desired goal. In this work, we investigate how to move beyond these purely geometric-based approaches using a method that learns about physical navigational affordances from experience. Our reinforcement learning approach, which we call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with autonomously-labeled off-policy data gathered in real-world environments, without any simulation or human supervision. BADGR can navigate in real-world urban and off-road environments with geometrically distracting obstacles. It can also incorporate terrain preferences, generalize to novel environments, and continue to improve autonomously by gathering more data. Videos, code, and other supplemental material are available on our website https://***/view/badgr
ISBN: (digital) 9789887581581
ISBN: (print) 9798350366907
During the execution of a robotic grasping task, the task may fail due to the close proximity of multiple objects if grasping is the only motion ***-prehensile manipulations, such as pushing, can be used to rearrange objects and benefit *** pushing actions with different speeds, distances, and routines may result in better *** this study, we propose a vision perception-based Adaptive Pushing Assisted Grasping Network (APAGN) system for generating a sequence of actions that includes grasping and adaptive *** can perceive the scene and then predict the locations of objects after an adaptive push, which adjusts the force and direction of pushing based on expected *** achieve a more efficient calculation, an Action Selector of APAGN is designed to choose the object with the highest expected outcome before making a *** value of pushing actions is estimated based on how they benefit grasping, which breaks the limitation of manually designed *** show that APAGN might achieve higher action efficiency than baseline methods, especially in cluttered environments.
Spatiotemporal data are very common in many applications, such as manufacturing systems and transportation systems. Given the intrinsic complex spatial and temporal correlations of such data, short-term and long-term prediction for spatiotemporal data is often very challenging. Most of the traditional statistical models fail to preserve innate features in data alongside their complex correlations. In this paper, we focus on a tensor-based prediction method and propose several practical techniques to improve both long-term and short-term prediction accuracy. For long-term prediction, we propose the "tensor decomposition + 2-Dimensional Auto-Regressive Moving Average (2D-ARMA)" model, and an effective way to update predictions in real time. For short-term prediction, we propose to conduct tensor completion based on tensor clustering to avoid oversimplification and ensure accuracy. A case study based on metro passenger flow data is conducted to demonstrate the improved performance.
Humans are capable of learning a new behavior by observing others perform the skill. Similarly, robots can do this through imitation learning. Furthermore, with external guidance, humans can master the new behavior more efficiently. So, how can robots achieve this? To address the issue, we present a novel framework named FIL. It provides a heterogeneous knowledge fusion mechanism for cloud robotic systems. Then, a knowledge fusion algorithm in FIL is proposed. It enables the cloud to fuse heterogeneous knowledge from local robots and generate guide models for robots with service requests. After that, we introduce a knowledge transfer scheme to help local robots acquire knowledge from the cloud. With FIL, a robot is capable of utilizing knowledge from other robots to increase the accuracy and efficiency of its imitation learning. Compared with transfer learning and meta-learning, FIL is more suitable for deployment in cloud robotic systems. Finally, we conduct experiments on a self-driving task for robots (cars). The experimental results demonstrate that the shared model generated by FIL increases the imitation learning efficiency of local robots in cloud robotic systems.