Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Kroemer, Oliver; Peters, Jan. Affiliation: Max Planck Institute, 38 Spemannstr., Tuebingen 72012, Germany

ISBN: (print) 9781424498888

As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications. © 2011 IEEE.

Keywords: reinforcement learning
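
The upper-confidence-bound selection over a Gaussian process model described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the squared-exponential kernel, its length scale, the value of `beta`, the unit-cube parameter range, and the names `fit_gp`, `predict`, and `ucb_optimize` are all assumptions made for illustration. In particular, the sketch maximizes the bound over random candidate points each episode, which is a simplification; the listing does not say how the authors maximize it.

```python
import math
import random

def rbf(x, y, length=0.3):
    """Squared-exponential kernel (assumed; the listing names no kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Solve L^T a = z by back substitution."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def fit_gp(X, y, noise=1e-3):
    """Factor the kernel matrix once per episode; returns (L, alpha)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, y))

def predict(X, L, alpha, x_new):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_star = [rbf(a, x_new) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)

def ucb_optimize(reward, dim, episodes=25, beta=2.0, n_candidates=200, seed=0):
    """Each episode, try the candidate with the largest mean + beta * std."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)]]
    y = [reward(X[0])]
    for _ in range(episodes - 1):
        L, alpha = fit_gp(X, y)
        candidates = [[rng.random() for _ in range(dim)]
                      for _ in range(n_candidates)]

        def ucb(c):
            m, s = predict(X, L, alpha, c)
            return m + beta * s

        best = max(candidates, key=ucb)
        X.append(best)
        y.append(reward(best))  # one (expensive) episode on the real system
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i]
```

A call such as `ucb_optimize(lambda p: -(p[0] - 0.7) ** 2, dim=1)` returns the best parameter vector tried and its reward; the quadratic reward here is a toy stand-in for a measured episode return.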

Reinforcement learning in multidimensional continuous action spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Pazis, Jason; Lagoudakis, Michail G. Affiliations: Department of Computer Science, Duke University, Durham, NC 27708-0129, United States; Department of Electronic and Computer Engineering, Technical University of Crete, Chania, Crete 73100, Greece

ISBN: (print) 9781424498888

The majority of learning algorithms available today focus on approximating the state (V) or state-action (Q) value function and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present an effective approach to learning and acting in domains with multidimensional and/or continuous control variables where efficient action selection is embedded in the learning process. Instead of learning and representing the state or state-action value function of the MDP, we learn a value function over an implied augmented MDP, where states represent collections of actions in the original MDP and transitions represent choices eliminating parts of the action space at each step. Action selection in the original MDP is reduced to a binary search by the agent in the transformed MDP, with computational complexity logarithmic in the number of actions, or equivalently linear in the number of action dimensions. Our method can be combined with any discrete-action reinforcement learning algorithm for learning multidimensional continuous-action policies using a state value approximator in the transformed MDP. Our preliminary results with two well-known reinforcement learning algorithms (Least-Squares Policy Iteration and Fitted Q-Iteration) on two continuous action domains (1-dimensional inverted pendulum regulator, 2-dimensional bicycle balancing) demonstrate the viability and the potential of the proposed approach. © 2011 IEEE.

Keywords: reinforcement learning
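
The binary-search action selection described in the abstract above can be sketched as follows. This is only an illustration of the idea, not the authors' algorithm: `range_value` is a hypothetical stand-in for the value function learned over the transformed MDP, and `toy_range_value` is a made-up scorer that exists only so the sketch runs. The exact way the paper splits the action space is not given in the listing; halving one dimension at a time is an assumption.

```python
def select_action(state, range_value, bounds, depth=10):
    """Pick an action by repeatedly discarding half of each action range.

    range_value(state, dim, lo, hi) is a hypothetical scorer for the
    sub-range [lo, hi] of action dimension `dim` in `state`.
    Cost: 2 * depth * len(bounds) evaluations, i.e. linear in the number
    of action dimensions and logarithmic in the per-dimension resolution.
    """
    action = []
    for dim, (lo, hi) in enumerate(bounds):
        for _ in range(depth):
            mid = 0.5 * (lo + hi)
            # Keep whichever half the value function scores higher.
            if range_value(state, dim, lo, mid) >= range_value(state, dim, mid, hi):
                hi = mid
            else:
                lo = mid
        action.append(0.5 * (lo + hi))
    return action


def toy_range_value(state, dim, lo, hi):
    """Toy scorer (assumption): the best action equals state[dim]."""
    target = state[dim]
    nearest = min(max(target, lo), hi)  # closest point of [lo, hi] to target
    return -abs(target - nearest)
```

With `depth=10`, each dimension is resolved to one of 1024 sub-ranges using 20 evaluations, instead of scoring all 1024 candidates.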

High-order local dynamic programming

Authors: Tassa, Yuval; Todorov, Emanuel. Affiliations: Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel; Applied Mathematics and Computer Science and Engineering, University of Washington, Seattle, United States

ISBN: (print) 9781424498888

We describe a new local dynamic programming algorithm for solving stochastic continuous optimal control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic policy controller on a 10-dimensional simulated swimming robot, and find that the higher order approximation yields a more general policy with a larger basin of attraction. © 2011 IEEE.

Keywords: dynamic programming
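
To make "cubature integration" in the abstract above concrete, here is a minimal sketch of one common rule: the third-degree spherical-radial cubature rule for a Gaussian with diagonal covariance. Which rule the authors use is not stated in the listing, so this choice, the diagonal-covariance restriction, and the function names are assumptions. The rule evaluates a function at 2n symmetric points and is exact for polynomials up to degree three.

```python
import math

def cubature_points(mean, std):
    """2n points mean +/- sqrt(n) * std[i] * e_i, each with weight 1 / (2n)."""
    n = len(mean)
    radius = math.sqrt(n)
    points = []
    for i in range(n):
        for sign in (1.0, -1.0):
            p = list(mean)
            p[i] += sign * radius * std[i]
            points.append(p)
    return points, 1.0 / (2 * n)

def expect(f, mean, std):
    """Approximate E[f(x)] for x ~ N(mean, diag(std^2))."""
    points, weight = cubature_points(mean, std)
    return weight * sum(f(p) for p in points)
```

In a Bellman backup, `f` would be the cost-to-go evaluated at the next state, so that `expect` approximates the expected future cost under process noise.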

DHP adaptive critic motion control of autonomous wheeled mobile robot

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Lin, Wei-Song; Yang, Ping-Chieh. Affiliation: Natl Taiwan Univ, Dept Elect Engn, Inst Elect Engn, 1 Sec 4 Roosevelt Rd, Taipei 106, Taiwan

ISBN: (print) 9781424407064

Autonomous driving of a wheeled mobile robot (WMR) requires velocity and path tracking control subject to complex dynamical constraints. Conventionally, this control design is obtained by analysis and synthesis of the WMR system. This paper presents a dual heuristic programming (DHP) adaptive critic design of the motion control system that enables the WMR to achieve the control objective simply by learning through trial. The design consists of an adaptive critic velocity neuro-control loop and a posture neuro-control loop. The neural weights in the velocity neuro-controller (VNC) are corrected with the DHP adaptive critic method. The designer simply expresses the control objective with a utility function. The VNC learns by sequential optimization to satisfy the control objective. The posture neuro-controller (PNC) approximates the inverse velocity model of the WMR so as to map planned positions to desired velocities. Supervised driving of the WMR at varying velocities supplies training samples for the PNC and VNC to set up the neural weights. During autonomous driving, the learning mechanism keeps improving the PNC and VNC. The design is evaluated on an experimental WMR. The results indicate that the DHP adaptive critic motion control design enables the WMR to develop its control ability autonomously.

Keywords: adaptive critic design; autonomous robot; neuro-control; dual heuristic programming; reinforcement learning

Continuous-time ADP for linear systems with partially unknown dynamics

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Vrabie, Draguna; Abu-Khalaf, Murad; Lewis, Frank L.; Wang, Youyi. Affiliations: Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118, USA; Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

ISBN: (print) 9781424407064

Approximate dynamic programming has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on approximate dynamic programming (ADP), to solve the infinite horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from the fields of computational intelligence and control theory. Only partial knowledge about the system model is used, as knowledge about the plant internal dynamics is not needed. The method is thus useful to determine the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a quasi-Newton method to solve the underlying algebraic Riccati equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control theory terms, this paper develops a direct adaptive control algorithm for obtaining the optimal control solution without knowing the system A matrix.

Keywords: approximate dynamic programming; adaptive critics; policy iterations; V-learning
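
For orientation, the continuous-time linear-quadratic problem and the algebraic Riccati equation mentioned in the abstract above are usually written as below. Only the A matrix is named in the abstract; the symbols B, Q, R, and P follow standard textbook notation and are assumed here rather than taken from the listing.

```latex
\begin{aligned}
\dot{x} &= A x + B u, &
J &= \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt, \\
0 &= A^\top P + P A - P B R^{-1} B^\top P + Q, &
u^* &= -R^{-1} B^\top P x .
\end{aligned}
```

Because A enters the Riccati equation directly, a scheme that obtains P without knowing A presumably has to draw the missing information from measured system trajectories, which is consistent with the online character described in the abstract.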

Agent self-assessment: Determining policy quality without execution

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Hans, Alexander; Duell, Siegmund; Udluft, Steffen. Affiliations: Neuroinformatics and Cognitive Robotics Lab, Ilmenau University of Technology, Ilmenau, Germany; Machine Learning Group, Berlin Institute of Technology, Berlin, Germany; Intelligent Systems and Control, Siemens AG Corporate Technology, Munich, Munich, Germany

ISBN: (print) 9781424498888

With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications. © 2011 IEEE.

Keywords: reinforcement learning

Online adaptive learning of optimal control solutions using integral reinforcement learning

Authors: Vamvoudakis, Kyriakos G.; Vrabie, Draguna; Lewis, Frank L. Affiliation: Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States

ISBN: (print) 9781424498888

In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton-Jacobi-Bellman equation and it does not require explicit knowledge of the system's drift dynamics. The adaptive algorithm is based on policy iteration, and it is implemented on an actor/critic structure. Both actor and critic neural networks are adapted simultaneously, and a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. © 2011 IEEE.

Keywords: adaptive algorithms

Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Al-Tamimi, Asma; Lewis, Frank. Affiliations: Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118, USA; Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118, USA

ISBN: (print) 9781424407064

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The importance of this paper is that the proof of convergence of the HDP iteration scheme is provided using rigorous methods for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided in this paper. The first example is a linear system, where ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second example considers a nonlinear control system.

Keywords: adaptive critics; approximate dynamic programming; HJB; policy iterations
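
The linear example mentioned in the abstract above, where the greedy HDP iteration converges to the solution of the algebraic Riccati equation, can be reproduced in the simplest possible setting. The sketch below assumes a scalar system x' = a x + b u with stage cost q x^2 + r u^2 and a quadratic value function V_i(x) = p_i x^2 in place of the paper's neural networks; the symbols a, b, q, r, p and the function names are illustrative, not taken from the listing.

```python
def hdp_value_iteration(a, b, q, r, iterations=200):
    """Iterate V_{i+1}(x) = min_u [q x^2 + r u^2 + V_i(a x + b u)], V_0 = 0."""
    p = 0.0
    for _ in range(iterations):
        k = a * b * p / (r + b * b * p)        # greedy control u = -k x
        p = q + r * k * k + p * (a - b * k) ** 2  # value of that control
    k = a * b * p / (r + b * b * p)
    return p, k

def dare_residual(p, a, b, q, r):
    """Residual of the scalar discrete-time algebraic Riccati equation."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p
```

For example, with `a=1.2, b=1.0, q=1.0, r=1.0` the open-loop system is unstable, yet starting from `V_0 = 0` the iteration settles on a `p` whose Riccati residual is numerically zero and a gain `k` that makes `a - b * k` stable.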

Adaptive, Optimal, Virtual Synchronous Generator Control of Three-Phase Grid-Connected Inverters Under Different Grid Conditions: An Adaptive Dynamic Programming Approach

IEEE Transactions on Industrial Informatics, 2022, vol. 18, no. 11, pp. 7388-7399

Authors: Wang, Zhongyang; Yu, Yunjun; Gao, Weinan; Davari, Masoud; Deng, Chao. Affiliations: Fuzhou Inst Technol, Sch Appl Sci & Engn, Fuzhou 350506, Peoples R China; Nanchang Univ, Dept Automat Informat Engn, Nanchang 330031, Jiangxi, Peoples R China; Florida Inst Technol, Florida Tech, Coll Engn & Sci, Dept Mech & Civil Engn, Melbourne, FL 32901, USA; Georgia Southern Univ, Dept Elect & Comp Engn, Statesboro Campus, Statesboro, GA 30460, USA; Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing 210023, Peoples R China

This article proposes an adaptive, optimal, data-driven control approach based on reinforcement learning and adaptive dynamic programming to the three-phase grid-connected inverter employed in virtual synchronous generators (VSGs). This article takes into account unknown system dynamics and different grid conditions, including balanced/unbalanced grids, voltage drop/sag, and weak grids. The proposed method is based on value iteration, which does not rely on an initial admissible control policy for learning. Considering the premise that the VSG control should stabilize the closed-loop dynamics, the VSG outputs are optimally regulated through the adaptive, optimal control strategy proposed in this article. Comparative simulations and experimental results validate the proposed method's effectiveness and reveal its practicality and implementation.

Keywords: voltage control; power system stability; synchronous generators; inverters; damping; reinforcement learning; optimal control; adaptive dynamic programming (ADP); adaptive optimal control; value iteration; virtual synchronous generator (VSG)

Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

IEEE Transactions on Neural Networks and Learning Systems, 2014, vol. 25, no. 3, pp. 621-634

Authors: Liu, Derong; Wei, Qinglai. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It shows that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, for facilitating the implementation of the iterative ADP algorithm, where the convergence of the weight matrices is analyzed. Finally, the numerical results and analysis are presented to illustrate the performance of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time; policy iteration; neural networks; neurodynamic programming; nonlinear systems; optimal control; reinforcement learning
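
The two properties claimed in the abstract above, a nonincreasing performance index and a stabilizing control law at every iteration, can be observed in a much simpler setting than the paper's. The sketch below assumes a scalar linear system x' = a x + b u with stage cost q x^2 + r u^2 and a linear control law u = -k x, so that policy evaluation has a closed form; the paper itself treats nonlinear systems with neural network approximators, and all names here are illustrative.

```python
def policy_iteration(a, b, q, r, k0, iterations=10):
    """Alternate policy evaluation and improvement for a scalar LQ problem.

    Returns a list of (p, k) pairs, where p x^2 is the cost of applying
    u = -k x forever. Requires a stabilizing initial gain k0.
    """
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain must be stabilizing")
    k, history = k0, []
    for _ in range(iterations):
        closed = a - b * k
        # Policy evaluation: solve p = q + r k^2 + p * closed^2 for p.
        p = (q + r * k * k) / (1.0 - closed * closed)
        history.append((p, k))
        # Policy improvement: minimize q x^2 + r u^2 + p (a x + b u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return history
```

Running `policy_iteration(1.2, 1.0, 1.0, 1.0, k0=1.0)` gives a sequence of `p` values that decreases toward the Riccati solution while every gain in the sequence keeps `|a - b * k| < 1`.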
