Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Buşoniu, Lucian; Ernst, Damien; De Schutter, Bart; Babuška, Robert. Affiliations: Delft Center for Systems and Control, Delft Univ. of Technology, Netherlands; Research Associate of the FRS-FNRS, Systems and Modeling Unit, University of Liège, Liège, Belgium

ISBN: (print) 9781424498888

Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting, among others, offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms. © 2011 IEEE.

Keywords: reinforcement learning

Using reward-weighted imitations for robot reinforcement learning

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009

Authors: Peters, Jan; Kober, Jens. Affiliation: Department of Empirical Inference and Machine Learning, Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany

ISBN: (print) 9781424427611

Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. In order to improve in terms of efficiency, the problem is reduced to reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm) and learning task-space control. © 2009 IEEE.

Keywords: reinforcement learning

Feedback controller parameterizations for reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Roberts, John W.; Manchester, Ian R.; Tedrake, Russ. Affiliation: CSAIL, MIT, Cambridge, MA 02139, United States

ISBN: (print) 9781424498888

Reinforcement learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla Parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages. © 2011 IEEE.

Keywords: Parameterization

Learning-Based Neural Dynamic Surface Predictive Control for MMC

IEEE Transactions on Power Electronics, 2023, vol. 38, no. 1, pp. 53-59

Authors: Liu, Xing; Qiu, Lin; Rodriguez, Jose; Wang, Kui; Li, Yongdong; Fang, Youtong. Affiliations: Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Peoples R China; Zhejiang Univ, Univ Illinois Urbana Champaign Inst, Hangzhou 310027, Peoples R China; Tsinghua Univ, Dept Elect Engn, State Key Lab Power Syst, Beijing 100084, Peoples R China; Univ San Sebastian Santiago, Fac Engn, Santiago 8420524, Chile

Reinforcement learning has recently attracted interest as a technique for designing adaptive optimal controllers. It provides a feasible way to circumvent both the "curse of dimensionality" and the need for a system model, which are inherent in the classical dynamic programming algorithm. Building on this property, we introduce the technique into a predictor-based online adaptive neural dynamic surface predictive control architecture and obtain a novel robust predictive control framework subject to system uncertainties. Specifically, within this framework, an adaptive dynamic programming control strategy based on a critic neural network is developed to learn the optimal control policy. The modification alleviates the performance deterioration caused by system uncertainties and enables smooth and fast learning, while keeping the merits of finite control-set model predictive control. Finally, the interest and applicability of the proposed control methodology are verified by performance evaluation.

Keywords: dynamic surface control; finite control-set model predictive control; neural network; reinforcement learning

An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning

IEEE Transactions on Neural Networks and Learning Systems, 2020, vol. 31, no. 4, pp. 1155-1169

Authors: Al-Dabooni, Seaar; Wunsch, Donald C., II. Affiliations: Missouri Univ Sci & Technol, ACIL, Rolla, MO 65401, USA; Basra Oil Co, Basra 61030, Iraq; Missouri Univ Sci & Technol, Dept Elect & Comp Engn, ACIL, Rolla, MO 65401, USA

In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally to perform well. This was recently extended by an approach called value gradient learning (VGL). VGL was inspired by a version of temporal difference (TD) learning that uses eligibility traces. The eligibility traces create an exponential decay of older observations with a decay parameter (lambda). This approach is known as TD(lambda), and its DHP extension is known as VGL(lambda), where VGL(0) is identical to DHP. VGL has demonstrated convergence and other desirable properties, but it is primarily useful for batch learning. Online learning requires an eligibility-trace-workspace matrix, which is not required for the batch learning version of VGL. Since online learning is desirable for many applications, it is important to remove this computational and memory impediment. This paper introduces a dual-critic version of VGL, called N-step VGL (NSVGL), that does not need the eligibility-trace-workspace matrix, thereby allowing online learning. Furthermore, this combination of critic networks allows an NSVGL algorithm to learn faster. The first critic is similar to DHP, which is adapted based on TD(0) learning, while the second critic is adapted based on a gradient of n-step TD(lambda) learning. Both networks are combined to train an actor network. The combination of feedback signals from both critic networks provides an optimal decision faster than traditional adaptive dynamic programming (ADP) via mixing current information and event history. Convergence proofs are provided. Gradients of one- and n-step value functions are monotonically nondecreasing and converge to the optimum. Two simulation case studies are presented for NSVGL to show its superior performance.

Keywords: adaptive dynamic programming (ADP); convergence analysis; eligibility traces; online learning; reinforcement learning; temporal difference (TD); value gradient learning (VGL)
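
The exponential decay that eligibility traces apply to older observations, as described in the abstract above, can be illustrated with a minimal tabular TD(lambda) sketch. This is plain TD(lambda) value estimation, not VGL(lambda) or NSVGL; the function name `td_lambda`, the transition format, and the two-step episode are assumptions made for illustration only.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) value estimation with accumulating eligibility traces.

    Each episode is a list of (state, reward, next_state, done) transitions.
    """
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states  # traces are reset at the start of each episode
        for state, reward, next_state, done in episode:
            # One-step TD error; the bootstrap term is dropped at termination.
            target = reward if done else reward + gamma * values[next_state]
            delta = target - values[state]
            # Older visits decay by gamma * lam; the current state gets full credit.
            traces = [gamma * lam * e for e in traces]
            traces[state] += 1.0
            # Every state is updated in proportion to its trace.
            values = [v + alpha * delta * e for v, e in zip(values, traces)]
    return values


# Hypothetical two-step episode: state 0 -> state 1 -> terminal, reward 1 at the end.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
print(td_lambda([episode], n_states=3, alpha=0.5, gamma=1.0, lam=0.5))  # [0.25, 0.5, 0.0]
print(td_lambda([episode], n_states=3, alpha=0.5, gamma=1.0, lam=0.0))  # [0.0, 0.5, 0.0]
```

With lam=0.5 the final reward reaches state 0 with half weight through its decayed trace, while with lam=0 only the most recently visited state is updated.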

Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE Transactions on Systems Man Cybernetics-Systems, 2015, vol. 45, no. 12, pp. 1577-1591

Authors: Liu, Derong; Wei, Qinglai; Yan, Pengfei. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum. Initialized by an admissible control law, it is shown that the iterative value functions are monotonically nonincreasing and converge to the optimal solution of the Hamilton-Jacobi-Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm, by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the present generalized policy iteration algorithm.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning
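
The monotonically nonincreasing behaviour that the abstract above reports for value functions started from an admissible control law can be seen in a much simpler setting. The sketch below is classical policy iteration on a scalar linear-quadratic problem, not the paper's generalized algorithm for nonlinear systems with neural networks; the system parameters, the initial gain, and the function names are all illustrative assumptions.

```python
def policy_iteration(k0, a, b, q, r, iterations=20):
    """Policy iteration for x[t+1] = a*x[t] + b*u[t] with stage cost q*x^2 + r*u^2.

    The policy is u = -k*x and its value function is V(x) = p*x^2.
    Returns the list of p values, one per evaluated policy.
    """
    k = k0
    history = []
    for _ in range(iterations):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("gain is not admissible: closed loop is unstable")
        # Policy evaluation: solve p = q + r*k^2 + closed_loop^2 * p for p.
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        history.append(p)
        # Policy improvement: minimize r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return history


def riccati_residual(p, a, b, q, r):
    """Residual of the scalar discrete-time Riccati (Bellman optimality) equation."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


# Assumed example: open-loop unstable plant (a = 1.2) with a stabilizing initial gain.
values = policy_iteration(k0=1.0, a=1.2, b=1.0, q=1.0, r=1.0)
print(values[:3], riccati_residual(values[-1], 1.2, 1.0, 1.0, 1.0))
```

Each evaluated policy costs no more than the previous one, and the final value satisfies the Riccati equation up to rounding error.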

Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems

IEEE Transactions on Cybernetics, 2016, vol. 46, no. 3, pp. 840-853

Authors: Wei, Qinglai; Liu, Derong; Lin, Hanquan. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; neural networks; neuro-dynamic programming; optimal control; reinforcement learning; value iteration
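
The dependence on the initial function described in the abstract above can be sketched on a scalar linear-quadratic problem, where the undiscounted Bellman backup has a closed form. This is a simplified stand-in under assumed parameters, not the paper's neural-network algorithm for nonlinear systems, and it shows only the two monotone cases (the nonmonotonic case does not occur in this scalar setting).

```python
def value_iteration(p0, a, b, q, r, iterations=60):
    """Value iteration for x[t+1] = a*x[t] + b*u[t] with stage cost q*x^2 + r*u^2.

    The iterate is V_i(x) = p_i * x^2; p0 >= 0 is an arbitrary initial guess.
    Returns [p_0, p_1, ..., p_iterations].
    """
    history = [p0]
    p = p0
    for _ in range(iterations):
        # Minimizing control for the current value function: u = -k*x.
        k = a * b * p / (r + b * b * p)
        # Bellman backup: stage cost plus the current value of the next state.
        p = q + r * k * k + p * (a - b * k) ** 2
        history.append(p)
    return history


# Assumed example parameters; the two runs differ only in the initial function.
from_below = value_iteration(0.0, a=1.2, b=1.0, q=1.0, r=1.0)   # starts at zero
from_above = value_iteration(10.0, a=1.2, b=1.0, q=1.0, r=1.0)  # starts too high
print(from_below[:3], from_above[:3], from_below[-1], from_above[-1])
```

Starting from zero the iterates rise toward the optimum, starting from a large value they fall toward it, and both runs end at the same limit.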

DRL-ECMS: An Adaptive Hierarchical Equivalent Consumption Minimization Strategy Based on Deep Reinforcement Learning

33rd IEEE Intelligent Vehicles Symposium (IEEE IV)

Authors: Lin, Yang; Chu, Liang; Hu, Jincheng; Zhang, Yuanjian; Hou, Zhuoran. Affiliations: Jilin Univ, State Key Lab Automot Dynam Simulat & Control, Changchun, Peoples R China; Univ Glasgow, Sch Comp Sci, Glasgow, Lanark, Scotland; Queens Univ Belfast, W Tech Ctr, Belfast, Antrim, North Ireland

ISBN: (print) 9781665488211

With the rise of machine learning, reinforcement learning (RL) is gradually being applied to the energy management strategy (EMS) of plug-in hybrid electric vehicles (PHEVs). Some older algorithms have also achieved better results when combined with reinforcement learning. In order to learn from the advantages of previous algorithms and explore the application potential of reinforcement learning, this paper proposes an adaptive hierarchical management strategy combining equivalent consumption minimization strategy (ECMS) knowledge with proximal policy optimization (PPO), which is currently an advanced data-driven RL algorithm. For a more comprehensive comparison, this paper compares the proposed EMS with dynamic programming (DP), ECMS with a constant equivalence factor, and Q-learning. The results show that the fuel consumption of the proposed control strategy is very close to that of the DP-based control strategy and that its performance is better than the other two strategies. This shows that deep reinforcement learning can help ECMS solve the problem of dynamic factor planning and that DRL-ECMS has the potential for deployment in real-time systems.

Keywords: reinforcement learning; energy management; equivalent consumption minimization strategy

Experimental Validation of Data-Driven Adaptive Optimal Control for Continuous-Time Systems Via Hybrid Iteration: An Application to Rotary Inverted Pendulum

IEEE Transactions on Industrial Electronics, 2024, vol. 71, no. 6, pp. 6210-6220

Authors: Qasem, Omar; Gutierrez, Hector; Gao, Weinan. Affiliations: Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China; Florida Inst Technol, Coll Engn & Sci, Dept Mech & Civil Engn, Melbourne, FL 32901, USA

In this article, a successive approximation learning framework for adaptive optimal control problems, named hybrid iteration (HI), is presented and validated experimentally. The HI strategy outperforms two well-known adaptive dynamic programming strategies, i.e., policy iteration (PI) and value iteration (VI). Using HI, an approximated optimal control policy is learned without the prior knowledge of an admissible control policy required by PI. At the same time, compared to VI, the HI algorithm converges to the optimal solution with far fewer learning iterations and much less CPU time. Initially, we present the settings of the data-driven HI for continuous-time nonlinear systems and continuous-time linear systems to learn the optimal control policy without any information about the dynamics of the system. Following that, the proposed HI method is implemented on a nonlinear rotary inverted pendulum, and its online learning performance is validated by learning the optimal control policy using online data. The experimental results reveal the efficacy and practicality of the HI method, and demonstrate its superior online learning performance over the traditional PI and VI methods.

Keywords: adaptive dynamic programming (ADP); adaptive optimal control; hybrid iteration (HI); linear systems; nonlinear systems; reinforcement learning (RL); rotary inverted pendulum (RIP)

Continuous-time ADP for linear systems with partially unknown dynamics

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Vrabie, Draguna; Abu-Khalaf, Murad; Lewis, Frank L.; Wang, Youyi. Affiliations: Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118, USA; Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

ISBN: (print) 9781424407064

Approximate dynamic programming has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on approximate dynamic programming (ADP), to solve the infinite horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from the fields of computational intelligence and control theory. Only partial knowledge about the system model is used, as knowledge about the plant internal dynamics is not needed. The method is thus useful to determine the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a Quasi-Newton method to solve the underlying Algebraic Riccati Equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control theory terms, this paper develops a direct adaptive control algorithm for obtaining the optimal control solution without knowing the system A matrix.

Keywords: approximate dynamic programming; adaptive critics; policy iterations; V-learning
