Search results - Inner Mongolia University Library

DIMINISHING RETURN OF VALUE EXPANSION METHODS IN MODEL-BASED REINFORCEMENT LEARNING

11th International Conference on Learning Representations, ICLR 2023

Authors: Palenicek, Daniel; Lutter, Michael; Carvalho, João; Peters, Jan. Affiliations: Intelligent Autonomous Systems, Technical University of Darmstadt, Germany; Hessian.AI, Hochschulstr. 10, Darmstadt 64293, Germany; Research Department: Systems AI for Robot Learning, Germany; Centre for Cognitive Science, Hochschulstr. 10, Darmstadt 64293, Germany

Model-based reinforcement learning is one approach to increase sample efficiency. However, the accuracy of the dynamics model and the resulting compounding error over modelled trajectories are commonly regarded as key limitations. A natural question to ask is: How much more sample efficiency can be gained by improving the learned dynamics models? Our paper empirically answers this question for the class of model-based value expansion methods in continuous control problems. Value expansion methods should benefit from increased model accuracy by enabling longer rollout horizons and better value function approximations. Our empirical study, which leverages oracle dynamics models to avoid compounding model errors, shows that (1) longer horizons increase sample efficiency, but the gain in improvement decreases with each additional expansion step, and (2) the increased model accuracy only marginally increases the sample efficiency compared to learned models with identical horizons. Therefore, longer horizons and increased model accuracy yield diminishing returns in terms of sample efficiency. These improvements in sample efficiency are particularly disappointing when compared to model-free value expansion methods. Even though they introduce no computational overhead, we find their performance to be on par with model-based value expansion methods. Therefore, we conclude that the limitation of model-based value expansion methods is not the model accuracy of the learned models. While higher model accuracy is beneficial, our experiments show that even a perfect model will not provide an unrivalled sample efficiency but that the bottleneck lies elsewhere. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.

Keywords: Efficiency
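
As a rough illustration of what an H-step value expansion target looks like, the sketch below rolls a dynamics model forward for a fixed horizon, sums the discounted rewards, and bootstraps with a value function at the final state. It is not taken from the paper: the function name `value_expansion_target`, its arguments, and the toy deterministic model in the usage lines are all hypothetical, chosen only to make the idea in the abstract above concrete.

```python
def value_expansion_target(state, policy, model, reward_fn, value_fn,
                           horizon, gamma=0.99):
    """Return sum_{t<H} gamma^t * r_t + gamma^H * V(s_H) along a model rollout.

    All callables are assumptions made for this sketch:
      policy(state) -> action
      model(state, action) -> next state (learned or oracle dynamics)
      reward_fn(state, action) -> float
      value_fn(state) -> float (bootstrap value at the end of the rollout)
    """
    target = 0.0
    discount = 1.0
    for _ in range(horizon):
        action = policy(state)
        target += discount * reward_fn(state, action)
        state = model(state, action)   # step the dynamics model
        discount *= gamma
    # Bootstrap with the value estimate of the last imagined state.
    return target + discount * value_fn(state)


# Toy usage: scalar state, constant action, deterministic dynamics.
toy_target = value_expansion_target(
    state=0.0,
    policy=lambda s: 1.0,
    model=lambda s, a: s + a,
    reward_fn=lambda s, a: s,
    value_fn=lambda s: 10.0 * s,
    horizon=3,
    gamma=0.5,
)
```

With `horizon=0` the function reduces to the plain bootstrap value, so the horizon parameter is the knob whose diminishing benefit the abstract describes.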

Safe and Efficient Path Planning under Uncertainty via Deep Collision Probability Fields

arXiv, 2024

Authors: Herrmann, Felix; Zach, Sebastian; Banfi, Jacopo; Peters, Jan; Chalvatzaki, Georgia; Tateo, Davide. Affiliations: Computer Science Department, TU Darmstadt, Germany; Hessian.AI, Germany; Research Department: Systems AI for Robot Learning, Germany

Estimating collision probabilities between robots and environmental obstacles or other moving agents is crucial to ensure safety during path planning. This is an important building block of modern planning algorithms in many application scenarios such as autonomous driving, where noisy sensors perceive obstacles. While many approaches exist, they either provide too conservative estimates of the collision probabilities or are computationally intensive due to their sampling-based nature. To deal with these issues, we introduce Deep Collision Probability Fields, a neural-based approach for computing collision probabilities of arbitrary objects with arbitrary unimodal uncertainty distributions. Our approach relegates the computationally intensive estimation of collision probabilities via sampling to the training step, allowing for fast neural network inference of the constraints during planning. In extensive experiments, we show that Deep Collision Probability Fields can produce reasonably accurate collision probabilities (up to $10^{-3}$) for planning and that our approach can be easily plugged into standard path planning approaches to plan safe paths on 2-D maps containing uncertain static and dynamic obstacles. Additional material, code, and videos are available at https://***/view/ral-dcpf. Copyright © 2024, The Authors. All rights reserved.

Keywords: Risk assessment
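
The abstract above contrasts a learned predictor with sampling-based estimation of collision probabilities. The sketch below shows only the sampling side, in a deliberately simplified setting: a disc-shaped robot at a known position and a disc-shaped obstacle whose centre follows an isotropic 2-D Gaussian. The shapes, the Gaussian assumption, and the name `collision_probability_mc` are illustrative assumptions, not the paper's setup.

```python
import math
import random


def collision_probability_mc(robot_xy, robot_radius,
                             obstacle_mean_xy, obstacle_std, obstacle_radius,
                             num_samples=10_000, seed=0):
    """Monte Carlo estimate of P(collision) between two discs.

    Assumption for this sketch: the obstacle centre is distributed as an
    isotropic Gaussian around obstacle_mean_xy with standard deviation
    obstacle_std; the robot position is known exactly.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    hits = 0
    for _ in range(num_samples):
        ox = rng.gauss(obstacle_mean_xy[0], obstacle_std)
        oy = rng.gauss(obstacle_mean_xy[1], obstacle_std)
        distance = math.hypot(ox - robot_xy[0], oy - robot_xy[1])
        if distance <= robot_radius + obstacle_radius:
            hits += 1                  # the two discs overlap in this sample
    return hits / num_samples


# Usage: one obstacle far away, one centred on the robot.
p_far = collision_probability_mc((0.0, 0.0), 0.5, (100.0, 0.0), 0.1, 0.5)
p_overlap = collision_probability_mc((0.0, 0.0), 0.5, (0.0, 0.0), 0.01, 0.5)
```

The per-query cost grows with `num_samples`, which is the expense the abstract proposes to move into training.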

ROBUST ADVERSARIAL REINFORCEMENT LEARNING VIA BOUNDED RATIONALITY CURRICULA

12th International Conference on Learning Representations, ICLR 2024

Authors: Reddi, Aryaman; Tölle, Maximilian; Peters, Jan; Chalvatzaki, Georgia; D'Eramo, Carlo. Affiliations: Department of Computer Science, TU Darmstadt, Germany; Systems AI for Robot Learning, Germany; Center for Cognitive Science, TU Darmstadt, Germany; Center for Artificial Intelligence and Data Science, University of Würzburg, Germany

Robustness against adversarial attacks and distribution shifts is a long-standing goal of Reinforcement Learning (RL). To this end, Robust Adversarial Reinforcement Learning (RARL) trains a protagonist against destabilizing forces exercised by an adversary in a competitive zero-sum Markov game, whose optimal solution, i.e., rational strategy, corresponds to a Nash equilibrium. However, finding Nash equilibria requires facing complex saddle point optimization problems, which can be prohibitive to solve, especially for high-dimensional control. In this paper, we propose a novel approach for adversarial RL based on entropy regularization to ease the complexity of the saddle point optimization problem. We show that the solution of this entropy-regularized problem corresponds to a Quantal Response Equilibrium (QRE), a generalization of Nash equilibria that accounts for bounded rationality, i.e., agents sometimes play random actions instead of optimal ones. Crucially, the connection between the entropy-regularized objective and QRE enables free modulation of the rationality of the agents by simply tuning the temperature coefficient. We leverage this insight to propose our novel algorithm, Quantal Adversarial RL (QARL), which gradually increases the rationality of the adversary in a curriculum fashion until it is fully rational, easing the complexity of the optimization problem while retaining robustness. We provide extensive evidence of QARL outperforming RARL and recent baselines across several MuJoCo locomotion and navigation problems in overall performance and robustness. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Reinforcement learning
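
To make the role of the temperature coefficient in the abstract above more tangible, the sketch below uses the common logit (softmax) form of a quantal response over a small discrete action set: a high temperature gives nearly uniform, "less rational" play, and a low temperature concentrates on the best action. This discrete form and the name `quantal_response` are assumptions for illustration only; the paper itself addresses continuous control.

```python
import math


def quantal_response(action_values, temperature):
    """Softmax over action values with a temperature (logit quantal response).

    Assumed convention for this sketch: larger temperature means more random
    behaviour, and temperature -> 0 approaches the greedy (rational) choice.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in action_values]
    shift = max(scaled)                       # subtract max for stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: the same action values under a very high and a very low temperature.
nearly_uniform = quantal_response([1.0, 2.0, 3.0], temperature=1e6)
nearly_greedy = quantal_response([1.0, 2.0, 3.0], temperature=0.01)
```

A curriculum of the kind the abstract describes would, under this convention, start the adversary at a high temperature and lower it over training.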

MULTI-TASK REINFORCEMENT LEARNING WITH MIXTURE OF ORTHOGONAL EXPERTS

12th International Conference on Learning Representations, ICLR 2024

Authors: Hendawy, Ahmed; Peters, Jan; D'Eramo, Carlo. Affiliations: Department of Computer Science, TU Darmstadt, Germany; Center for Cognitive Science, TU Darmstadt, Germany; Systems AI for Robot Learning, Germany; Center for Artificial Intelligence and Data Science, University of Würzburg, Germany

Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties, and leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Reinforcement learning
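
The abstract above mentions a Gram-Schmidt process for making expert representations orthogonal. The sketch below is a generic (modified) Gram-Schmidt orthonormalisation over plain Python lists, included only to show the operation itself; how MOORE applies it to expert outputs inside a network is not described in this record, and the function name `gram_schmidt` and the `eps` threshold are hypothetical choices.

```python
import math


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalise a list of equal-length vectors (modified Gram-Schmidt).

    Vectors that are (numerically) linear combinations of earlier ones are
    dropped, so the result may contain fewer vectors than the input.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along the already accepted direction b.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


# Usage: three "expert" vectors that are correlated but linearly independent.
experts = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
orthonormal_experts = gram_schmidt(experts)
```

After the call, every pair of returned vectors has a dot product of approximately zero, which is the diversity property the abstract refers to.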

Gait in Eight: Efficient On-robot Learning for Omnidirectional Quadruped Locomotion

arXiv, 2025

Authors: Bohlinger, Nico; Kinzel, Jonathan; Palenicek, Daniel; Antczak, Lukasz; Peters, Jan. Affiliations: Department of Computer Science, Technical University of Darmstadt, Germany; hessian.AI; MAB Robotics, Poznan, Poland; Research Department: Systems AI for Robot Learning, Germany

On-robot Reinforcement Learning is a promising approach to train embodiment-aware policies for legged robots. However, the computational constraints of real-time learning on robots pose a significant challenge. We present a framework for efficiently learning quadruped locomotion in just 8 minutes of raw real-time training utilizing the sample efficiency and minimal computational overhead of the new off-policy algorithm CrossQ. We investigate two control architectures: predicting joint target positions for agile, high-speed locomotion and Central Pattern Generators for stable, natural gaits. While prior work focused on learning simple forward gaits, our framework extends on-robot learning to omnidirectional locomotion. We demonstrate the robustness of our approach in different indoor and outdoor environments and provide the videos and code for our experiments at: https://***/gait_in_eight_website. Copyright © 2025, The Authors. All rights reserved.

Keywords: Biped locomotion

Safe Reinforcement Learning of Dynamic High-Dimensional Robotic Tasks: Navigation, Manipulation, Interaction

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Puze Liu; Kuo Zhang; Davide Tateo; Snehal Jauhri; Zhiyuan Hu; Jan Peters; Georgia Chalvatzaki. Affiliations: Computer Science Department, Technical University Darmstadt; Research Department: Systems AI for Robot Learning, German Research Center for AI (DFKI); Hessian.AI; Centre for Cognitive Science

Safety is a fundamental property for the real-world deployment of robotic platforms. Any control policy should avoid dangerous actions that could harm the environment, humans, or the robot itself. In reinforcement learning (RL), safety is crucial when exploring a new environment to learn a new skill. This paper introduces a new formulation of safe exploration for robotic RL in the tangent space of the constraint manifold that effectively transforms the action space of the RL agent for always respecting safety constraints locally. We show how to apply this approach to a wide range of robotic platforms and how to define safety constraints that represent dynamic articulated objects like humans in the context of robotic RL. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We show safe real-world deployment of our learned controller on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.

Keywords:

Diminishing Return of Value Expansion Methods

arXiv, 2024

Authors: Palenicek, Daniel; Lutter, Michael; Carvalho, João; Dennert, Daniel; Ahmad, Faran; Peters, Jan. Affiliations: Technical University of Darmstadt, Germany; FG Intelligent Autonomous Systems; Hessian.AI, Germany; Research Department: Systems AI for Robot Learning; The Centre for Cognitive Science, Technical University of Darmstadt, Germany

Model-based reinforcement learning aims to increase sample efficiency, but the accuracy of dynamics models and the resulting compounding errors are often seen as key limitations. This paper empirically investigates potential sample efficiency gains from improved dynamics models in model-based value expansion methods. Our study reveals two key findings when using oracle dynamics models to eliminate compounding errors. First, longer rollout horizons enhance sample efficiency, but the improvements quickly diminish with each additional expansion step. Second, increased model accuracy only marginally improves sample efficiency compared to learned models with identical horizons. These diminishing returns in sample efficiency are particularly noteworthy when compared to model-free value expansion methods. These model-free algorithms achieve comparable performance without the computational overhead. Our results suggest that the limitation of model-based value expansion methods cannot be attributed to model accuracy. Although higher accuracy is beneficial, even perfect models do not provide unrivaled sample efficiency. Therefore, the bottleneck exists elsewhere. These results challenge the common assumption that model accuracy is the primary constraint in model-based reinforcement learning. © 2024, CC BY.

Keywords:

Model-Based Uncertainty in Value Functions

arXiv, 2023

Authors: Luis, Carlos E.; Bottero, Alessandro G.; Vinogradska, Julia; Berkenkamp, Felix; Peters, Jan. Affiliations: Bosch Center for Artificial Intelligence, India; Institute for Intelligent Autonomous Systems, TU Darmstadt, Germany; Research Department: Systems AI for Robot Learning, Germany; Hessian.AI

We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper-bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample efficiency. Copyright © 2023, The Authors. All rights reserved.

Keywords: Reinforcement learning

Coherent soft imitation learning

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Joe Watson; Sandy H. Huang; Nicolas Heess. Affiliations: TU Darmstadt, Darmstadt, Germany, and Systems AI for Robot Learning, German Research Center for AI; Google DeepMind, London, United Kingdom

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing between BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are rare, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behavioral-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherency facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. Our approach conveniently achieves imitation learning through initial behavioral cloning and subsequent refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches. For the open-source implementation and simulation results, see ***/csil.

Keywords: