Refine Results

Document Type

  • 228 conference papers
  • 4 journal articles

Holdings

  • 232 electronic documents
  • 0 print holdings

Subject Classification

  • 98 Engineering
    • 93 Computer Science and Technology...
    • 40 Software Engineering
    • 25 Electrical Engineering
    • 14 Control Science and Engineering
    • 4 Mechanical Engineering
    • 1 Mechanics (may be conferred in Engineering or Sci...
    • 1 Information and Communication Engineering
    • 1 Architecture
    • 1 Chemical Engineering and Technology
    • 1 Transportation Engineering
  • 23 Science
    • 23 Mathematics
    • 6 Statistics (may be conferred in Science or...
    • 4 Systems Science
    • 1 Chemistry
    • 1 Atmospheric Science
  • 9 Management
    • 7 Management Science and Engineering (may...
    • 3 Business Administration
    • 2 Library, Information and Archival Manage...
  • 2 Economics
    • 2 Applied Economics
  • 1 Law
    • 1 Sociology

Topics

  • 95 dynamic programm...
  • 52 learning
  • 46 optimal control
  • 37 reinforcement le...
  • 34 learning (artifi...
  • 27 equations
  • 22 heuristic algori...
  • 21 control systems
  • 20 convergence
  • 19 neural networks
  • 18 function approxi...
  • 17 mathematical mod...
  • 16 approximation al...
  • 15 vectors
  • 14 markov processes
  • 14 artificial neura...
  • 14 cost function
  • 13 stochastic proce...
  • 12 algorithm design...
  • 12 adaptive control

Institutions

  • 5 school of inform...
  • 4 northeastern uni...
  • 4 department of el...
  • 4 department of in...
  • 3 department of el...
  • 3 automation and r...
  • 3 northeastern uni...
  • 3 robotics institu...
  • 3 key laboratory o...
  • 3 univ illinois de...
  • 2 department of ar...
  • 2 school of electr...
  • 2 univ groningen i...
  • 2 univ texas autom...
  • 2 colorado state u...
  • 2 guangxi univ sch...
  • 2 national science...
  • 2 informatics inst...
  • 2 college of infor...
  • 2 school of automa...

Authors

  • 7 hado van hasselt
  • 7 lewis frank l.
  • 7 marco a. wiering
  • 7 dongbin zhao
  • 6 liu derong
  • 5 huaguang zhang
  • 5 zhang huaguang
  • 5 derong liu
  • 5 warren b. powell
  • 4 xu xin
  • 4 vrabie draguna
  • 4 jagannathan s.
  • 4 frank l. lewis
  • 4 yanhong luo
  • 4 damien ernst
  • 4 jan peters
  • 4 peters jan
  • 4 zhao dongbin
  • 3 xu hao
  • 3 martin riedmille...

Language

  • 232 English
Search query: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"
232 records; showing 201–210
Q-learning with Continuous State Spaces and Finite Decision Set
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kengy Barty, Pierre Girardeau, Jean-Sebastien Roy, Cyrille Strugarek (EDF Research and Development, Clamart, France)
This paper presents an original technique for computing the optimal policy of a Markov decision problem with a continuous state space and discrete decision variables. We propose an extension of the Q-learni...
Editorial: Special Issue on Adaptive Dynamic Programming and Reinforcement Learning
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, vol. 50, no. 11, pp. 3944–3947
Authors: Liu, Derong; Lewis, Frank L.; Wei, Qinglai (School of Automation, Guangdong University of Technology, Guangzhou 510006, China; UTA Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [item 1)–5) in the Appe...
Reinforcement Learning Control of a Real Mobile Robot Using Approximate Policy Iteration
6th International Symposium on Neural Networks
Authors: Zhang, Pengchen; Xu, Xin; Liu, Chunming; Yuan, Qiping (Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China)
Machine learning for mobile robots has attracted considerable research interest in recent years. However, there are still many challenges in applying learning techniques to real mobile robots, e.g., generalization in contin...
A Theoretical Analysis of Cooperative Behavior in Multi-agent Q-learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Ludo Waltman, Uzay Kaymak (Erasmus University Rotterdam, Rotterdam, Netherlands)
A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge; in others it did not. This paper provides a theore...
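As a toy illustration of the setting studied above (not the paper's analysis), the sketch below runs two independent, myopic Q-learners in the iterated Prisoner's Dilemma. The payoff matrix, learning rate, and exploration rate are assumptions for the demo; under these conditions defection typically comes to dominate both agents' value estimates.

```python
import random

random.seed(1)

# Stateless, myopic Q-learning in the iterated Prisoner's Dilemma
# (payoffs and rates are assumptions for this demo): each agent keeps
# one Q-value per action, 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

q1, q2 = [0.0, 0.0], [0.0, 0.0]
alpha, eps = 0.05, 0.1

def pick(q):
    # epsilon-greedy action selection; ties favor cooperation
    if random.random() < eps:
        return random.randrange(2)
    return 0 if q[0] >= q[1] else 1

for _ in range(20000):
    a1, a2 = pick(q1), pick(q2)
    r1, r2 = PAYOFF[(a1, a2)]
    # gamma = 0: each agent treats the repeated game as a bandit problem
    q1[a1] += alpha * (r1 - q1[a1])
    q2[a2] += alpha * (r2 - q2[a2])

print(q1, q2)  # defection's Q-value typically ends up highest for both agents
```

Whether cooperation can survive instead depends on details such as discounting, exploration schedules, and the learners' ability to condition on the opponent's past moves, which is exactly the kind of question a theoretical analysis must settle.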
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Martin Riedmiller, Jan Peters, Stefan Schaal (NeuroInformatics Group, University of Osnabrück, Germany; Computational Learning and Motor Control, University of Southern California, USA)
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, 'vanilla' policy gradients and natural policy gradients. Each o...
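Of the three families compared above, the finite-difference approach is the simplest to sketch: perturb each parameter, measure the change in return, and ascend the estimated gradient. The toy objective J below stands in for an episodic return and is an assumption for the demo; in an RL setting it would be an average over sampled rollouts.

```python
import numpy as np

# A deterministic stand-in for the expected return of a parameterized
# policy, with its optimum at theta = [2, -1] (an assumption for the demo).
def J(theta):
    return -np.sum((theta - np.array([2.0, -1.0])) ** 2)

def fd_gradient(theta, delta=1e-4):
    # Central finite differences: perturb one parameter at a time.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = delta
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * delta)
    return grad

theta = np.zeros(2)
for _ in range(200):
    theta += 0.1 * fd_gradient(theta)  # gradient ascent on the return

print(np.round(theta, 3))  # → [ 2. -1.]
```

With noisy rollout returns, each difference would have to be averaged over many episodes, which is why the paper's comparison against likelihood-ratio ('vanilla') and natural gradients is informative.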
Efficient Learning in Cellular Simultaneous Recurrent Neural Networks - The Case of Maze Navigation Problem
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Roman Ilin, Robert Kozma, Paul J. Werbos (Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA; National Science Foundation, Arlington, VA, USA)
Cellular simultaneous recurrent neural networks (SRN) show great promise in solving complex function approximation problems. In particular, approximate dynamic programming is an important application area where SRNs h...
A Novel Fuzzy Reinforcement Learning Approach in Two-Level Intelligent Control of 3-DOF Robot Manipulators
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Nasser Sadati, Mohammad Mollaie Emamzadeh (Electrical Engineering Department, Sharif University of Technology, Tehran, Iran)
In this paper, a fuzzy coordination method based on the interaction prediction principle (IPP) and reinforcement learning is presented for the optimal control of robot manipulators with three degrees of freedom. For this ...
Strategy Generation with Cognitive Distance in Two-Player Games
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kosuke Sekiyama, Ricardo Carnieri, Toshio Fukuda (Department of Micro-Nano Systems Engineering, University of Nagoya, Nagoya, Japan)
In game-theoretical approaches to multi-agent systems, a payoff matrix is often given a priori and used by agents in action selection. By contrast, in this paper we approach the problem of decision making by use of th...
Two Novel On-policy Reinforcement Learning Algorithms Based on TD(λ)-methods
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Marco A. Wiering, Hado van Hasselt (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The ...
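The common machinery behind such algorithms is a state value-function learned with TD(λ) eligibility traces. The sketch below is plain tabular TD(λ) on the classic 5-state random-walk chain, not QV(λ) or ACLA themselves; the chain task and step sizes are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular TD(lambda) state-value learning on the 5-state random walk:
# start in the middle, move left/right uniformly at random, reward +1
# only on the right exit. True values under this policy are i/6 for
# state i = 1..5.
N, alpha, gamma, lam = 5, 0.05, 1.0, 0.8
V = np.zeros(N + 2)  # states 0..6, with 0 and 6 terminal

for _ in range(5000):
    s = 3
    z = np.zeros(N + 2)  # accumulating eligibility traces
    while s not in (0, 6):
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == 6 else 0.0
        td = r + gamma * V[s_next] - V[s]   # one-step TD error
        z *= gamma * lam                    # decay all traces
        z[s] += 1.0                         # bump the visited state
        V += alpha * td * z                 # credit all recent states
        s = s_next

print(np.round(V[1:6], 2))  # should be close to [1/6, 2/6, 3/6, 4/6, 5/6]
```

QV(λ)-learning and ACLA both build on exactly this kind of V-function update, then derive action preferences or Q-values from it.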
Dynamic optimization of the strength ratio during a terrestrial conflict
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Alexandre Sztykgold, Gilles Coppin, Olivier Hudry (GET/ENST-Bretagne, LUSSI Department, France; GET/ENST, Computer Science Department, France)
The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. For that, we have decided to model the conflict as a game, on which we will see...