Search Results - Inner Mongolia University Library

Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

IEEE Transactions on Neural Networks and Learning Systems, 2014, vol. 25, no. 3, pp. 621-634

Authors: Liu, Derong; Wei, Qinglai (Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China)

This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of policy iteration method for discrete-time nonlinear systems for the first time. It shows that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, for facilitating the implementation of the iterative ADP algorithm, where the convergence of the weight matrices is analyzed. Finally, the numerical results and analysis are presented to illustrate the performance of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time; policy iteration; neural networks; neurodynamic programming; nonlinear systems; optimal control; reinforcement learning
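
The alternation between policy evaluation and policy improvement described in the abstract above can be illustrated on a much simpler problem. The sketch below is not the paper's neural-network implementation: it assumes a scalar linear system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2, for which the value function of a linear policy u = -k*x is exactly p*x^2. The function name `policy_iteration_lq` and all parameter values are hypothetical, chosen only for illustration.

```python
def policy_iteration_lq(a, b, q, r, k0, iters=20):
    """Policy iteration for x[k+1] = a*x + b*u with stage cost q*x^2 + r*u^2.

    Illustrative assumption: scalar linear dynamics, linear policies u = -k*x.
    Returns the list of (gain, value coefficient) pairs, one per iteration.
    """
    # Policy iteration needs an admissible (stabilizing) initial control law.
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain k0 is not admissible (closed loop unstable)")
    k, history = k0, []
    for _ in range(iters):
        c = a - b * k                          # closed-loop pole under u = -k*x
        p = (q + r * k * k) / (1.0 - c * c)    # policy evaluation: V(x) = p*x^2
        history.append((k, p))
        k = a * b * p / (r + b * b * p)        # policy improvement (greedy gain)
    return history


if __name__ == "__main__":
    for i, (k, p) in enumerate(policy_iteration_lq(1.2, 1.0, 1.0, 1.0, k0=1.0, iters=5)):
        print(f"iteration {i}: gain k = {k:.6f}, value coefficient p = {p:.6f}")
```

In this toy setting the value coefficient p should not increase from one iteration to the next, and every intermediate gain keeps the closed loop stable, which mirrors the monotonicity and stability properties the abstract claims for the nonlinear case.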

A Novel Iterative θ-Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE Transactions on Automation Science and Engineering, 2014, vol. 11, no. 4, pp. 1176-1190

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China)

This paper is concerned with a new iterative theta-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law which optimizes the iterative performance index function. In the present iterative theta-ADP algorithm, the condition of initial admissible control in policy iteration algorithm is avoided. It is proved that all the iterative controls obtained in the iterative theta-ADP algorithm can stabilize the nonlinear system which means that the iterative theta-ADP algorithm is feasible for implementations both online and offline. Convergence analysis of the performance index function is presented to guarantee that the iterative performance index function will converge to the optimum monotonically. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, for facilitating the implementation of the iterative theta-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; policy iteration; reinforcement learning; value iteration

Optimal Self-Learning Battery Control in Smart Residential Grids by Iterative Q-Learning Algorithm

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wei, Qinglai; Liu, Derong; Shi, Guang; Liu, Yu; Guan, Qiang (Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100864 Peoples R China; Chinese Acad Sci Inst Automat Beijing 100864 Peoples R China)

ISBN: 9781479945528 (print)

In this paper, a novel dual iterative Q-learning algorithm is developed to solve the optimal battery management and control problems in smart residential environments. The main idea is to use adaptive dynamic programming (ADP) technique to obtain the optimal battery management and control scheme iteratively for residential energy systems. In the developed dual iterative Q-learning algorithm, two iterations, including external and internal iterations, are introduced, where internal iteration minimizes the total cost of power loads in each period and the external iteration makes the iterative Q function converge to the optimum. For the first time, the convergence property of iterative Q-learning method is proven to guarantee the convergence property of the iterative Q function. Finally, numerical results are given to illustrate the performance of the developed algorithm.

Keywords: Iterative methods

Tunable and Generic Problem Instance Generation for Multi-objective Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Garrett, Deon; Bieger, Jordi; Throisson, Kristinn R. (Reykjavik Univ Iceland Inst Intelligent Machines Reykjavik Iceland; Reykjavik Univ Reykjavik Iceland)

ISBN: 9781479945528 (print)

A significant problem facing researchers in reinforcement learning, and particularly in multi-objective learning, is the dearth of good benchmarks. In this paper, we present a method and software tool enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. This tool, called Merlin (for Multi-objective Environments for Reinforcement Learning), provides the ability to control these features in predictable ways, thus allowing researchers to begin to build a more detailed understanding about what features of a problem interact with a given learning algorithm to improve or degrade the algorithm's performance. We present this method and tool, and briefly discuss the controls provided by the generator, its supported options, and their implications on the generated benchmark instances.

Keywords: learning (artificial intelligence); software tools; Merlin; learning algorithm; multiobjective environments for reinforcement learning; multiobjective learning problem; multiobjective reinforcement learning; problem facing researcher; random problem instance; software tool; structural property; Benchmark testing; Correlation; Covariance matrices; Generators; Heuristic algorithms; Optimization; Neurofibromin 2; Structural properties; dynamos; learning algorithms; variance covariance matrix; Covariance matrix

Using Approximate Dynamic Programming for Estimating the Revenues of a Hydrogen-based High-Capacity Storage Device

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Francois-Lavet, Vincent; Fonteneau, Raphael; Ernst, Damien (Univ Liege Dept Elect Engn & Comp Sci B-4000 Liege Belgium)

ISBN: 9781479945528 (print)

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.

Keywords: dynamic programming; electrolysis; fuel cells; hydrogen storage; power markets; Belgian market; day-ahead electricity market; dynamic programming principle; high-capacity storage device; hydrogen-based storage devices; interseasonal price fluctuations; maximum revenue estimation; optimal revenue; Electricity; Electrochemical processes; Hydrogen
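
The dynamic programming principle that the abstract above relies on can be sketched with backward induction over a discretized storage level. This is only a toy model under stated assumptions, not the paper's formulation: the storage holds an integer number of energy units, at most one unit is bought or sold per period, buying one unit costs the current price, and selling one unit earns `efficiency` times the price (a hypothetical round-trip efficiency). The function name `max_revenue` and the price values are made up for illustration.

```python
def max_revenue(prices, capacity, efficiency=0.5):
    """Maximum trading revenue for a storage device, starting and ending unconstrained.

    Toy assumptions: integer storage levels 0..capacity, one unit moved per period,
    buying costs `price`, selling earns `efficiency * price`. Storage starts empty.
    """
    # value[s] = best revenue obtainable from the current period onward with s units stored
    value = [0.0] * (capacity + 1)
    for price in reversed(prices):              # backward induction over time
        new_value = []
        for s in range(capacity + 1):
            best = value[s]                     # action: stay idle
            if s < capacity:                    # action: buy one unit
                best = max(best, -price + value[s + 1])
            if s > 0:                           # action: sell one unit
                best = max(best, efficiency * price + value[s - 1])
            new_value.append(best)
        value = new_value
    return value[0]


if __name__ == "__main__":
    hypothetical_prices = [10.0, 10.0, 50.0, 50.0]
    for cap in (1, 2):
        print(f"capacity {cap}: revenue {max_revenue(hypothetical_prices, cap)}")
```

With these made-up prices, a larger capacity lets the device buy more during the cheap periods and sell during the expensive ones, which is the qualitative effect of storage size on revenue that the abstract discusses.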

Adaptive Fault Identification for a Class of Nonlinear Dynamic Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wu, Li-Bing; Ye, Dan; Zhao, Xin-Gang (Northeastern Univ Coll Informat Sci & Engn Shenyang 110819 Liaoning Peoples R China; Univ Sci & Technol Liaoning Coll Sci Anshan 114051 Liaoning Peoples R China; Chinese Acad Sci State Key Lab Robot Shenyang 110016 Liaoning Peoples R China; Chinese Acad Sci Shenyang Inst Automat Shenyang 110016 Liaoning Peoples R China)

ISBN: 9781479945528 (print)

This paper is concerned with the diagnosis problem of actuator faults for a class of nonlinear systems. It is assumed that the upper bound of the Lipschitz constant of the nonlinearity in the faulty system is unknown. Then, a new nonlinear observer for fault diagnosis based on an adaptive estimator is proposed. Moreover, by making use of the designed adaptive observer with an on-line update control law without a sigma-modification condition to approximate the faulty system, it is proved that the estimation error of the adaptive control parameter, the output observation error and the error between the system fault and the corresponding estimated value are uniformly ultimately bounded via Lyapunov stability analysis. Finally, simulation examples are provided to illustrate the efficiency of the proposed fault identification approach.

Keywords: Errors

Cognitive Control in Cognitive Dynamic Systems: A New Way of Thinking Inspired by The Brain

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Haykin, Simon; Amiri, Ashkan; Fatemi, Mehdi (McMaster Univ Cognit Syst Lab Hamilton ON L8S 4K1 Canada)

ISBN: 9781479945528 (print)

Briefly, the main purpose of the paper is fourfold: a) Cognitive perception, which consists of two functional blocks: improved sparse-coding under the influence of perceptual attention for extracting relevant information from the observables and ignoring irrelevant information, followed by a Bayesian algorithm for state estimation. b) Entropic state of the perceptor, which provides feedback information to the controller. c) Cognitive control, which also consists of two functional blocks: an executive learning algorithm computed by processing the entropic state, followed by predictive planning to set the stage for the policy to act on the environment, thereby establishing the global perception-action cycle. d) Experimental results for exploiting the perceptual as well as executive attention in a co-operative manner, which is aimed at the first demonstration of risk control in the presence of a severe disturbance in the environment.

Keywords: Cognition; Cognitive Dynamic Systems; Cognitive perception; Cognitive Control; Perceptual attention; Executive attention; Predictive planning; Pre-adaptation

Heuristics for Multiagent Reinforcement Learning in Decentralized Decision Problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Allen, Martin W.; Hahn, David; MacFarland, Douglas C. (Univ Wisconsin Dept Comp Sci La Crosse WI 54601 USA)

ISBN: 9781479945528 (print)

Decentralized partially observable Markov decision processes (Dec-POMDPs) model cooperative multiagent scenarios, providing a powerful general framework for team-based artificial intelligence. While optimal algorithms exist for Dec-POMDPs, theoretical and empirical results demonstrate that they are impractical for many problems of real interest. We examine the use of reinforcement learning (RL) as a means to generate adequate, if not optimal, joint policies for Dec-POMDPs. It is easily demonstrated (and expected) that single-agent RL produces results of little joint utility. We therefore investigate heuristic methods, based upon the dynamics of the Dec-POMDP formulation, that bias the learning process to produce coordinated action. Empirical tests on a benchmark problem show that these heuristics significantly enhance learning performance, even out-performing a hand-crafted heuristic in cases where the learning process converges quickly.

Keywords: Markov processes; learning (artificial intelligence); multi-agent systems; Dec-POMDP model; cooperative multiagent; decentralized decision problem; heuristic method; multiagent reinforcement learning; partially observable Markov decision process; team-based artificial intelligence; Benchmark testing; Complexity theory; Equations; Heuristic algorithms; Joints; Markov chain; Heuristics; heuristic methods; learning processes; Decentralized

Accelerated Gradient Temporal Difference Learning Algorithms

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Meyer, Dominik; Degenne, Remy; Omrane, Ahmed; Shen, Hao (Tech Univ Munich Inst Data Proc D-80290 Munich Germany)

ISBN: 9781479945528 (print)

In this paper we study Temporal Difference (TD) learning with linear value function approximation. The classic TD algorithm is known to be unstable with linear function approximation and off-policy learning. Recently developed Gradient TD (GTD) algorithms have addressed this problem successfully. Despite their prominent properties of good scalability and convergence to correct solutions, they inherit the potential weakness of slow convergence as they are a stochastic gradient descent algorithm. Accelerated stochastic gradient descent algorithms have been developed to speed up convergence, while still keeping computational complexity low. In this work, we develop an accelerated stochastic gradient descent method for minimizing the Mean Squared Projected Bellman Error (MSPBE), and derive a bound for the Lipschitz constant of the gradient of the MSPBE, which plays a critical role in our proposed accelerated GTD algorithms. Our comprehensive numerical experiments demonstrate promising performance in solving the policy evaluation problem, in comparison to the GTD algorithm family. In particular, accelerated TDC surpasses state-of-the-art algorithms.

Keywords: learning algorithms
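
For readers unfamiliar with the Gradient TD family mentioned in the abstract above, the following is a minimal sketch of the standard (non-accelerated) TDC update with linear value function approximation. It does not implement the accelerated method the paper proposes. The two-state chain, the one-hot features, the step sizes, and the names `tdc` and `dot` are assumptions made purely for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def tdc(transitions, n_features, gamma, alpha=0.005, beta=0.01, sweeps=50000):
    """Plain TDC (TD with gradient correction) for linear value estimation.

    transitions: list of (phi, reward, phi_next) tuples, replayed in order each sweep.
    Returns the learned weight vector theta, so that V(s) is approximated by theta . phi(s).
    """
    theta = [0.0] * n_features   # value-function weights
    w = [0.0] * n_features       # auxiliary weights estimating the expected TD error
    for _ in range(sweeps):
        for phi, reward, phi_next in transitions:
            delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
            phi_w = dot(phi, w)
            for i in range(n_features):
                # TD step plus the gradient-correction term
                theta[i] += alpha * (delta * phi[i] - gamma * phi_next[i] * phi_w)
                w[i] += beta * (delta - phi_w) * phi[i]
    return theta


if __name__ == "__main__":
    # Hypothetical two-state cycle: state 0 -> state 1 (reward 0), state 1 -> state 0 (reward 1).
    chain = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, [1.0, 0.0])]
    print(tdc(chain, n_features=2, gamma=0.5))
```

For this toy chain with discount 0.5, the Bellman equations v0 = 0.5*v1 and v1 = 1 + 0.5*v0 give the exact values v0 = 2/3 and v1 = 4/3, which the learned weights should approach. The slow convergence of such plain stochastic-gradient updates is the weakness the abstract says the accelerated variants target.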

Using supervised training signals of observable state dynamics to speed-up and improve reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Elliott, Daniel L.; Anderson, Charles (Colorado State Univ Dept Comp Sci Ft Collins CO 80523 USA)

ISBN: 9781479945528 (print)

A common complaint about reinforcement learning (RL) is that it is too slow to learn a value function which gives good performance. This issue is exacerbated in continuous state spaces. This paper presents a straightforward approach to speeding up and even improving RL solutions by reusing features learned during a pre-training phase prior to Q-learning. During pre-training, the agent is taught to predict state change given a state/action pair. The effect of pre-training is examined using the model-free Q-learning approach but could readily be applied to a number of RL approaches including model-based RL. The analysis of the results provides ample evidence that the features learned during pre-training are the reason behind the improved RL performance.

Keywords: learning (artificial intelligence); neural nets; state-space methods; RL performance improvement; RL solution improvement; continuous state spaces; feature reuse; model-based RL approach; model-free Q-learning approach; observable state dynamics; pretraining phase; reinforcement learning; state change prediction; state-action pair; supervised training signals; value function learning; Artificial neural networks; Computational modeling; Data models; Heuristic algorithms; Supervised learning; Training; Neural network; Semi-supervised learning; Functional training; learning; State Change
