ISBN (print): 9783642210891
Using the neural-network-based iterative adaptive dynamic programming (ADP) algorithm, an optimal control scheme for a class of unknown discrete-time nonlinear systems with a discount factor in the cost function is proposed in this paper. The optimal controller is designed with a convergence analysis in terms of the cost function and the control law. In order to implement the algorithm via the globalized dual heuristic programming (GDHP) technique, a neural network is first constructed to identify the unknown nonlinear system, and two further neural networks are then used to approximate the cost function and the control law, respectively. An example is provided to verify the effectiveness of the presented approach.
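A minimal sketch of the iterative ADP recursion this abstract builds on, V_{i+1}(x) = min_u [U(x, u) + gamma * V_i(F(x, u))]. The paper approximates V and the control law with neural networks and identifies F with a third network; here a 1-D grid, the dynamics F, and the utility U are all illustrative assumptions.

```python
import numpy as np

def F(x, u):                      # hypothetical dynamics (the paper identifies the real system)
    return 0.9 * np.sin(x) + u

def U(x, u):                      # quadratic utility (assumed)
    return x**2 + u**2

gamma = 0.95                            # discount factor in the cost function
xs    = np.linspace(-2.0, 2.0, 201)     # state grid
us    = np.linspace(-1.0, 1.0, 41)      # admissible controls
V     = np.zeros_like(xs)               # V_0 = 0, as in the iterative ADP scheme

for i in range(200):                    # outer ADP iterations
    V_next = np.empty_like(V)
    for j, x in enumerate(xs):
        q = U(x, us) + gamma * np.interp(F(x, us), xs, V)   # Q-values over all controls
        V_next[j] = q.min()
    if np.max(np.abs(V_next - V)) < 1e-6:
        break
    V = V_next

print("converged after", i, "iterations")
```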
ISBN (print): 9783642211102
Dual heuristic programming (DHP) is a class of approximate dynamic programming methods using neural networks. Although there have been some successful applications of DHP, its performance and convergence are greatly influenced by the design of the step sizes in both the critic module and the actor module. In this paper, a Delta-Bar-Delta learning rule is proposed for the DHP algorithm, which lets the two modules adjust their learning rates individually and adaptively. Finally, the feasibility and effectiveness of the proposed method are illustrated on the learning control task of an inverted pendulum.
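A sketch of the Delta-Bar-Delta rule (Jacobs, 1988) that the abstract applies to the DHP critic and actor: every weight keeps its own learning rate, which grows additively when successive gradients agree in sign with a smoothed gradient trace and shrinks multiplicatively when they disagree. The hyper-parameter values and toy objective below are assumptions.

```python
import numpy as np

kappa, phi, decay = 0.01, 0.7, 0.1

def delta_bar_delta_step(w, grad, lr, bar):
    """One per-weight adaptive update; w, grad, lr, bar are arrays of equal shape."""
    agree = grad * bar > 0                      # gradient agrees with the smoothed trace
    lr = np.where(agree, lr + kappa, lr)        # additive increase
    lr = np.where(grad * bar < 0, lr * (1.0 - decay), lr)  # multiplicative decrease
    bar = (1.0 - phi) * grad + phi * bar        # exponentially smoothed gradient trace
    w = w - lr * grad                           # ordinary gradient step with per-weight rates
    return w, lr, bar

# usage on a toy quadratic objective 0.5 * ||w||^2 (gradient = w)
w   = np.array([1.0, -2.0])
lr  = np.full_like(w, 0.05)
bar = np.zeros_like(w)
for _ in range(100):
    w, lr, bar = delta_bar_delta_step(w, w.copy(), lr, bar)
print(w, lr)
```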
ISBN (print): 9781424478354
The Xpilot-AI video game platform allows the creation of artificially intelligent and autonomous control agents. At the same time, the Xpilot environment is highly complex, with very many state variables and action choices. Basic reinforcementlearning (RL) techniques are somewhat limited in their application when dealing with such large state-and action-spaces, since the repetition of exposure that is key to their value updates can proceed very slowly. To solve this problem, state-abstractions are often generated, allowing learning to move more quickly, but often requiring the programmer to hand-craft state representations, reward functions, and action choices in an ad hoc manner. We apply an automated technique for generating useful abstractions for learning, adaptive Kanerva coding. This method employs a small sub-set of the original states as a proxy for the full environment, updating values over the abstract representative prototype states in a manner analogous to Q-learning. Over time, the set of prototypes is adjusted to provide more effective coverage and abstraction, again automatically. Our results show that this technique allows a simple learning agent to double its survival time when navigating the Xpilot environment, using only a small fraction of the full state-space as a stand-in and greatly increasing the potential for more rapid learning.
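A sketch of value updates over Kanerva prototypes in the spirit of the adaptive Kanerva coding the abstract describes: prototype states act as a proxy for the full state space, a state activates every prototype within a Hamming radius, and Q-values are sums of per-prototype weights. The sizes, radius, and the adaptation heuristic (replacing rarely visited prototypes) are assumptions, not Xpilot specifics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_protos, n_actions = 32, 64, 4
protos = rng.integers(0, 2, (n_protos, n_bits))   # prototype bit-strings
theta  = np.zeros((n_protos, n_actions))           # per-prototype action values
visits = np.zeros(n_protos)
alpha, gamma, radius = 0.1, 0.95, 10

def active(s):
    """Indices of prototypes within the Hamming radius of state s."""
    return np.where((protos != s).sum(axis=1) <= radius)[0]

def q_values(s):
    idx = active(s)
    return theta[idx].sum(axis=0), idx

def update(s, a, r, s_next):
    q, idx = q_values(s)
    q_next, _ = q_values(s_next)
    td = r + gamma * q_next.max() - q[a]             # Q-learning style TD error
    theta[idx, a] += alpha * td / max(len(idx), 1)   # spread the correction over active prototypes
    visits[idx] += 1

def adapt(state_pool):
    """Replace the least-visited prototype with a recently seen state (adaptation step)."""
    worst = visits.argmin()
    protos[worst] = state_pool[rng.integers(len(state_pool))]
    theta[worst]  = 0.0
    visits[worst] = 0.0
```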
Wireless sensor networks are composed of small nodes with limited battery life and computational ability. Energy reduction in these networks is an important issue for extending network lifetime. Dynamic power management i...
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI (KLSPI) algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large or continuous state spaces. To address this problem, this paper presents a hierarchical API (HAPI) method with binary-tree state space decomposition for RL in a class of absorbing MDPs that can be formulated as time-optimal learning control tasks. In the proposed method, after samples are collected adaptively in the state space of the original MDP, a learning-based decomposition strategy over the sample sets is used to carry out the binary-tree state space decomposition. API algorithms are then run on the sample subsets to approximate local optimal policies of the sub-MDPs. Because the original MDP is decomposed into a binary tree of absorbing sub-MDPs constructed during the learning process, local near-optimal policies are approximated by API algorithms with reduced complexity and higher precision. Furthermore, owing to the improved quality of the local policies, the combined global policy performs better than the near-optimal policy obtained by a single API algorithm on the original MDP. Three learning control problems, including path-tracking control of a real mobile robot, were studied to evaluate the performance of the HAPI method. With the same settings for basis function selection and sample collection, the proposed HAPI obtained better near-optimal policies than previous API methods such as LSPI and KLSPI.
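A skeleton of the binary-tree decomposition described here: samples from the original absorbing MDP are split recursively, an API solver (e.g., LSPI) is run on each leaf's sample subset, and the resulting tree of local policies acts as the combined global policy. The split criterion learn_split, the lspi() solver, and state_of() are placeholders passed in by the caller, not the paper's actual strategy.

```python
class Node:
    def __init__(self, samples, depth):
        self.samples, self.depth = samples, depth
        self.policy = None                  # local near-optimal policy for this sub-MDP
        self.left = self.right = None
        self.split = None                   # learned decomposition rule at this node

def build_hapi_tree(samples, max_depth, min_samples, learn_split, lspi, state_of):
    root, stack = Node(samples, 0), []
    stack.append(root)
    while stack:
        n = stack.pop()
        if n.depth >= max_depth or len(n.samples) < 2 * min_samples:
            n.policy = lspi(n.samples)      # approximate the local optimal policy
            continue
        n.split = learn_split(n.samples)    # learning-based decomposition strategy
        left  = [s for s in n.samples if n.split(state_of(s))]
        right = [s for s in n.samples if not n.split(state_of(s))]
        if min(len(left), len(right)) < min_samples:
            n.policy = lspi(n.samples)      # refuse degenerate splits
            continue
        n.left, n.right = Node(left, n.depth + 1), Node(right, n.depth + 1)
        stack += [n.left, n.right]
    return root

def act(node, state):
    """Route a state down the decomposition tree and apply the local policy."""
    while node.policy is None:
        node = node.left if node.split(state) else node.right
    return node.policy(state)
```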
A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.
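A sketch of a knowledge-gradient style measurement choice over a linear value model, in the spirit of this abstract: beliefs about the basis-function weights are Gaussian, and the score of measuring a candidate state is the expected improvement of the best posterior value estimate. The exact knowledge gradient formula for correlated normal beliefs is replaced here by a Monte Carlo estimate, and the basis functions and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_eps = 0.5                                   # observation noise std (assumed)

def posterior(mu, Sigma, phi, y):
    """Bayesian linear-regression update of the weight belief after observing y at features phi."""
    s2   = phi @ Sigma @ phi + sigma_eps**2
    gain = Sigma @ phi / s2
    return mu + gain * (y - phi @ mu), Sigma - np.outer(gain, phi @ Sigma)

def kg_score(mu, Sigma, Phi, j, n_mc=500):
    """Monte Carlo knowledge gradient of measuring candidate j (rows of Phi are features)."""
    phi  = Phi[j]
    best = (Phi @ mu).max()
    s    = np.sqrt(phi @ Sigma @ phi + sigma_eps**2)
    ys   = rng.normal(phi @ mu, s, n_mc)          # simulated observations at candidate j
    vals = [(Phi @ posterior(mu, Sigma, phi, y)[0]).max() for y in ys]
    return np.mean(vals) - best

# usage: pick the state whose measurement is expected to improve the value estimate most
Phi   = rng.normal(size=(20, 4))                  # 20 candidate states, 4 basis functions
mu    = np.zeros(4)
Sigma = np.eye(4)
scores = [kg_score(mu, Sigma, Phi, j, n_mc=200) for j in range(len(Phi))]
print("measure state", int(np.argmax(scores)))
```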
In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is a balanced weight-updating strategy that considers both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and the critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach on the triple-link inverted pendulum, a popular benchmark in the community, to demonstrate the online learning and control strategy. Experimental results and a comparative study under different noise conditions demonstrate the effectiveness of this approach.
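A simplified sketch of a recursive Levenberg-Marquardt (LM) weight update of the kind the abstract integrates into the ADP actor and critic networks: a running Gauss-Newton Hessian is accumulated sample by sample, the step is damped by mu, and mu is adapted depending on whether the error decreased. The model here is linear in its weights purely to keep the sketch short; the forgetting factor and the mu schedule are assumptions, not the paper's algorithm.

```python
import numpy as np

class RecursiveLM:
    def __init__(self, n, mu=1.0, lam=0.99):
        self.w   = np.zeros(n)          # weights being trained
        self.H   = np.eye(n) * 1e-3     # running Gauss-Newton Hessian approximation
        self.mu  = mu                   # LM damping factor
        self.lam = lam                  # forgetting factor on old curvature

    def step(self, g, target, output):
        """g: gradient of the output w.r.t. the weights; one damped Gauss-Newton step."""
        e = target - output
        self.H = self.lam * self.H + np.outer(g, g)
        dw = np.linalg.solve(self.H + self.mu * np.eye(len(g)), g * e)
        new_e = target - (output + g @ dw)      # predicted error after the step
        if abs(new_e) < abs(e):                 # accept and trust the quadratic model more
            self.w += dw
            self.mu = max(self.mu * 0.5, 1e-6)
        else:                                   # reject the step and increase damping
            self.mu = min(self.mu * 2.0, 1e6)

# usage: fit y = w_true . x from noisy samples
rng, w_true = np.random.default_rng(2), np.array([1.5, -0.5, 2.0])
lm = RecursiveLM(3)
for _ in range(500):
    x = rng.normal(size=3)
    lm.step(g=x, target=w_true @ x + 0.01 * rng.normal(), output=lm.w @ x)
print(np.round(lm.w, 2))
```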
Reinforcement learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance, both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages.
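A sketch of REINFORCE with the plain state-feedback parameterization u = -k x + noise, the simplest kind of linear parameterization the abstract compares (not the Youla parameterization). The plant is a weakly stable scalar system chosen only for illustration; the gains, noise level, step size, and update clipping are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.99, 0.1                 # weakly stable open-loop pole at 0.99
sigma, alpha, gamma = 0.2, 1e-3, 0.98

k = 0.0                          # feedback gain being learned
for episode in range(3000):
    x, grads, rewards = 1.0, [], []
    for t in range(60):
        mean = -k * x
        u = mean + sigma * rng.normal()
        grads.append((u - mean) / sigma**2 * (-x))   # d log pi(u|x) / dk for a Gaussian policy
        rewards.append(-(x * x + 0.1 * u * u))       # negative quadratic cost as reward
        x = a * x + b * u
    # discounted returns, mean baseline, then the likelihood-ratio gradient
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    adv = np.array(returns[::-1])
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)    # normalized advantages tame the variance
    step = alpha * float(np.dot(grads, adv))
    k += float(np.clip(step, -0.05, 0.05))           # bounded update keeps the sketch stable
print("learned gain:", round(k, 2))
```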
An intelligent optimal control scheme for unknown nonlinear discrete-time systems with a discount factor in the cost function is proposed in this paper. An iterative adaptive dynamic programming (ADP) algorithm based on the globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller, together with a convergence analysis. Three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm; at each iteration they approximate the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach.
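A sketch of the GDHP-style critic update behind this abstract: the critic is trained on both the cost-to-go J(x) and its gradient dJ/dx, with a factor beta blending the two errors. To keep the sketch free of automatic differentiation, the critic is linear in hand-picked quadratic features with an analytic Jacobian; the features, utility, step sizes, and toy system are illustrative assumptions.

```python
import numpy as np

gamma, beta, alpha = 0.95, 0.5, 0.05

def phi(x):                                    # quadratic features of a 2-D state
    return np.array([x[0]**2, x[1]**2, x[0] * x[1]])

def dphi(x):                                   # Jacobian of phi, shape (3, 2)
    return np.array([[2 * x[0], 0.0], [0.0, 2 * x[1]], [x[1], x[0]]])

def J(w, x):   return w @ phi(x)               # critic value output
def lam(w, x): return dphi(x).T @ w            # critic costate (gradient) output

def gdhp_critic_step(w, x, u, x_next, dUdx, dxnext_dx):
    """One GDHP update: blend the HDP (value) and DHP (derivative) error gradients."""
    U = x @ x + u * u                          # quadratic utility (assumed)
    J_target   = U + gamma * J(w, x_next)
    lam_target = dUdx + gamma * dxnext_dx.T @ lam(w, x_next)
    e_J   = J(w, x) - J_target
    e_lam = lam(w, x) - lam_target
    grad = beta * e_J * phi(x) + (1.0 - beta) * dphi(x) @ e_lam
    return w - alpha * grad

# usage with a toy affine system x' = A x + B u (Jacobian of x' w.r.t. x is simply A)
A, B = np.array([[0.9, 0.1], [0.0, 0.8]]), np.array([0.0, 0.1])
w, x, u = np.zeros(3), np.array([1.0, -0.5]), 0.2
w = gdhp_critic_step(w, x, u, A @ x + B * u, dUdx=2 * x, dxnext_dx=A)
print(w)
```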
This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full range adaptive cruise control (ACC) system. The full range ACC system covers both the ACC situation on highways and the stop-and-go (SG) situation on urban streets, and it can autonomously drive the host vehicle at the desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited to this problem, but it suffers from low learning efficiency. We propose the concept of an inducing range to construct the supervisor and thereby formulate the SADP algorithm, which greatly improves learning efficiency. Several driving scenarios are designed, and the trained controller is compared with traditional ones in simulation; the results show that the trained SADP controller performs very well in all scenarios, providing an effective approach to the full range ACC problem.
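A sketch of what an inducing-range supervisory signal for ACC could look like, in the spirit of the supervisor this abstract adds on top of ADP: when the gap error and relative speed fall inside a target band, the step is marked as successful, giving the learner a dense teaching signal instead of a sparse terminal one. The state definition, the staged shrinking of the band, and the reward shape are all assumptions for illustration, not the paper's design.

```python
import numpy as np

def desired_gap(v_host, headway=1.5, standstill=2.0):
    """Constant time-headway spacing policy used as the tracking target (assumed)."""
    return standstill + headway * v_host

def supervisor(gap_error, rel_speed, stage):
    """Inducing range: the acceptable band shrinks as training stages progress (stage in [0, 1])."""
    band_gap   = np.interp(stage, [0, 1], [5.0, 0.5])    # metres
    band_speed = np.interp(stage, [0, 1], [3.0, 0.3])    # m/s
    return abs(gap_error) <= band_gap and abs(rel_speed) <= band_speed

def reinforcement(gap_error, rel_speed, accel, stage):
    """Binary supervised signal plus a small comfort penalty on acceleration."""
    if supervisor(gap_error, rel_speed, stage):
        return 0.0 - 0.01 * accel**2           # success: only penalise harsh control
    return -1.0                                # outside the inducing range: failure signal

# usage: host at 20 m/s, preceding vehicle 35 m ahead at 18 m/s, early training stage
gap_error = 35.0 - desired_gap(20.0)
print(reinforcement(gap_error, rel_speed=-2.0, accel=0.5, stage=0.2))
```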