Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaofeng Lin, Qiang Ding, Weikai Kong, Chunning Song, Qingbao Huang (School of Electrical Engineering, Guangxi University, Nanning, China)

ISBN: 9781479945511 (print)

For the optimal tracking control problem of affine nonlinear systems, this paper proposes a general value iteration algorithm based on adaptive dynamic programming. By a system transformation, the optimal tracking problem is converted into an optimal regulation problem for the tracking error dynamics. A general value iteration algorithm is then developed to obtain the optimal control, together with a convergence analysis. To exploit the advantages of echo state networks, three echo state networks trained with the Levenberg-Marquardt (LM) algorithm are used to approximate the system, the cost function and the control law. A simulation example is given to demonstrate the effectiveness of the presented scheme.

Keywords: Cost function; Nonlinear systems; Optimal control; Trajectory; dynamic programming; Approximation algorithms
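As a rough illustration of value iteration on tracking-error dynamics, the sketch below uses a scalar linear error system e[k+1] = a*e[k] + b*u[k] with stage cost q*e^2 + r*u^2, so that every iterate has the closed form V_i(e) = p_i*e^2. The linear system, the function name `value_iteration`, and all parameters are assumptions made for this example; the record above concerns affine nonlinear systems with echo state network approximators, which this sketch does not attempt to reproduce.

```python
def value_iteration(a, b, q, r, p0=0.0, tol=1e-12, max_iter=10_000):
    """Value iteration for e[k+1] = a*e[k] + b*u[k] with cost q*e**2 + r*u**2.

    Each iterate is V_i(e) = p_i * e**2, so only the scalar p_i is updated.
    p0 is the (assumed, freely chosen) non-negative starting coefficient.
    """
    p = p0
    for _ in range(max_iter):
        # Minimizing r*u**2 + p*(a*e + b*u)**2 over u and substituting back
        # yields the scalar Riccati-type update below.
        p_next = q + a * a * p * r / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    # Greedy control for the final iterate: u = -gain * e.
    gain = a * b * p / (r + b * b * p)
    return p, gain


if __name__ == "__main__":
    p, gain = value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    print(f"p = {p:.6f}, gain = {gain:.6f}")
```

With a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, which gives a convenient check on the iteration.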

Full-range adaptive cruise control based on supervised adaptive dynamic programming

Neurocomputing, 2014, vol. 125, pp. 57-67

Authors: Zhao, Dongbin; Hu, Zhaohui; Xia, Zhongpu; Alippi, Cesare; Zhu, Yuanheng; Wang, Ding (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Guangdong Power Grid Corp, Elect Power Res Inst, Guangzhou 510080, Guangdong, Peoples R China; Politecn Milan, Dipartimento Elettron & Informaz, I-20133 Milan, Italy)

The paper proposes a supervised adaptive dynamic programming (SADP) algorithm for a full-range adaptive cruise control (ACC) system, which can be formulated as a dynamic programming problem with stochastic demands. The suggested ACC system is designed to allow the host vehicle to drive both on highways and in Stop and Go (SG) urban scenarios. In both operational cases, the ACC system can autonomously drive the host vehicle to a desired speed and/or a given distance from the target vehicle. Traditional adaptive dynamic programming (ADP) is a suitable tool to address the problem, but training usually suffers from low convergence rates and rarely yields an effective controller. An SADP algorithm that incorporates the concept of an inducing region is introduced here to overcome these training drawbacks. The SADP algorithm performs very well in all simulation scenarios and always better than more traditional controllers. The conclusion is that the proposed SADP algorithm is an effective control methodology for the full-range ACC problem. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: adaptive dynamic programming; Supervised reinforcement learning; Neural networks; adaptive cruise control; Stop and go

A data-based online reinforcement learning algorithm with high-efficient exploration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuanheng Zhu, Dongbin Zhao (The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences, Beijing, China)

ISBN: 9781479945511 (print)

This paper proposes an online reinforcement learning algorithm that uses online data directly and efficiently for continuous deterministic systems, without requiring system parameters. Dependence on specific approximation structures is a major obstacle to the wide application of online reinforcement learning algorithms. We remove this limitation by using the online data directly with the kd-tree technique. Moreover, we design the algorithm according to the Probably Approximately Correct (PAC) principle. Two examples are simulated to verify its good performance.

Keywords: Approximation algorithms; learning (artificial intelligence); Approximation methods; Optimal control; Upper bound; Partitioning algorithms; DC motors
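To make the kd-tree idea concrete, here is a minimal sketch of storing observed samples in a kd-tree and retrieving the stored sample nearest to a query point. It is a generic textbook kd-tree, not the algorithm from the record above; the names `Node`, `build`, `sq_dist` and `nearest`, and the pairing of each point with an arbitrary data item, are assumptions made for illustration.

```python
from typing import Any, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, Any]  # (observed point, data stored with it)


class Node:
    def __init__(self, point: Point, data: Any, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.data, self.axis = point, data, axis
        self.left, self.right = left, right


def build(samples: Sequence[Sample], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median along a cycling axis."""
    if not samples:
        return None
    axis = depth % len(samples[0][0])
    ordered = sorted(samples, key=lambda s: s[0][axis])
    mid = len(ordered) // 2
    point, data = ordered[mid]
    return Node(point, data, axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(node: Optional[Node], query: Point, best=None):
    """Return (squared distance, point, data) of the closest stored sample."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point, node.data)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best
```

A typical use would be `tree = build(samples)` once enough data has been collected, followed by `nearest(tree, query)` for each new query; rebuilding or incrementally updating the tree as data arrives is left out of this sketch.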

ADP-based optimal control for a class of nonlinear discrete-time systems with inequality constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yanhong Luo, Geyang Xiao (College of Information Science and Engineering, Northeastern University)

ISBN: 9781479945511 (print)

In this paper, the adaptive dynamic programming (ADP) approach is used to design a neural-network-based optimal controller for a class of nonlinear discrete-time (DT) systems with inequality constraints. First, the original constrained optimal control problem is transformed into an infinite-horizon optimal control problem by introducing a penalty function. Then, an iterative ADP algorithm with two neural networks is developed to handle the nonlinear optimal control problem. The two neural networks generate the optimal cost and the optimal control policy, respectively. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.

Keywords: Optimal control; Biological neural networks; Nonlinear systems; dynamic programming; Cost function

Using supervised training signals of observable state dynamics to speed-up and improve reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Daniel L Elliott, Charles Anderson (Dept of Computer Science, Colorado State University)

A common complaint about reinforcement learning (RL) is that it is too slow to learn a value function that gives good performance. This issue is exacerbated in continuous state spaces. This paper presents a straightforward approach to speeding up and even improving RL solutions by reusing features learned during a pre-training phase prior to Q-learning. During pre-training, the agent is taught to predict the state change given a state/action pair. The effect of pre-training is examined using the model-free Q-learning approach, but it could readily be applied to a number of RL approaches, including model-based RL. The analysis of the results provides ample evidence that the features learned during pre-training are the reason behind the improved RL performance.

Keywords: Artificial neural networks; Data models; Training; learning (artificial intelligence); Heuristic algorithms; Supervised learning; Computational modeling

Tunable and generic problem instance generation for multi-objective reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Deon Garrett, Jordi Bieger, Kristinn R. Thórisson (Icelandic Institute for Intelligent Machines, Reykjavík University, Iceland)

A significant problem facing researchers in reinforcement learning, and particularly in multi-objective learning, is the dearth of good benchmarks. In this paper, we present a method and software tool enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. This tool, called Merlin (for Multi-objective Environments for reinforcement learning), provides the ability to control these features in predictable ways, thus allowing researchers to begin to build a more detailed understanding of which features of a problem interact with a given learning algorithm to improve or degrade the algorithm's performance. We present this method and tool, and briefly discuss the controls provided by the generator, its supported options, and their implications for the generated benchmark instances.

Keywords: learning (artificial intelligence); Correlation; Generators; Covariance matrices; Benchmark testing; Heuristic algorithms; Optimization

Continuous-time differential dynamic programming with terminal constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wei Sun, Evangelos A. Theodorou, Panagiotis Tsiotras (Mobile and Internet Systems Laboratory, University College Cork, Ireland)

In this work, we revisit the continuous-time differential dynamic programming (DDP) approach for solving optimal control problems with terminal state constraints. We derive two algorithms, each for a different order of expansion of the system dynamics, and we investigate their performance in terms of convergence speed. Compared to previous work, we provide a set of backward differential equations for the value function expansion by relaxing the assumption that the initial nominal control must be very close to the optimal control solution. We apply the derived algorithms to two classical optimal control problems, namely the inverted pendulum and the Dreyfus rocket problem, and show the benefit of the second-order expansion.

Keywords: Optimal control; Heuristic algorithms; Differential equations; Equations; Convergence; Rockets; Trajectory

Using approximate dynamic programming for estimating the revenues of a hydrogen-based high-capacity storage device

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Vincent François-Lavet, Raphael Fonteneau, Damien Ernst (Department of Electrical Engineering and Computer Science, University of Liège, Belgium)

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.

Keywords: Electricity; Hydrogen; Fuel cells; Electrochemical processes; Hydrogen storage; dynamic programming
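The dynamic programming principle behind such a revenue estimate can be sketched with a toy model: a store holding an integer number of energy units, one buy, sell or idle decision per time step, and a known price series. Everything in the sketch is an assumption made for illustration (the function `max_revenue`, the unit-sized actions, and the two efficiency parameters `eta_in` and `eta_out`); it is not the model from the record above.

```python
from typing import List, Sequence


def max_revenue(prices: Sequence[float], capacity: int,
                eta_in: float = 1.0, eta_out: float = 1.0) -> float:
    """Maximum revenue from an initially empty store via backward induction.

    value[s] holds the best revenue obtainable from the current step onward
    when s units are stored. eta_in and eta_out are hypothetical conversion
    efficiencies for charging and discharging.
    """
    value: List[float] = [0.0] * (capacity + 1)  # leftover energy is worth nothing
    for price in reversed(prices):
        new_value = []
        for s in range(capacity + 1):
            best = value[s]  # idle
            if s < capacity:
                # Buy enough electricity to store one more unit.
                best = max(best, -price / eta_in + value[s + 1])
            if s > 0:
                # Sell the electricity recovered from one stored unit.
                best = max(best, price * eta_out + value[s - 1])
            new_value.append(best)
        value = new_value
    return value[0]


if __name__ == "__main__":
    print(max_revenue([1.0, 3.0, 1.0, 3.0], capacity=1))
```

Raising `capacity` in this toy model lets the store carry energy across longer low-price stretches, which is the kind of sensitivity to storage size the abstract discusses.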

Neural-network-based adaptive dynamic surface control for MIMO systems with unknown hysteresis

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lei Liu, Zhanshan Wang, Zhengwei Shen (College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China)

ISBN: 9781479945511 (print)

This paper focuses on the composite adaptive tracking control for a class of nonlinear multiple-input-multiple-output (MIMO) systems with unknown backlash-like hysteresis nonlinearities. A dynamic surface control method is incorporated into the proposed control strategy to eliminate the problem of explosion of complexity. Compared with some existing methods, the prediction error between system state and serial-parallel estimation model is combined with compensated tracking error to construct the adaptive laws for neural network (NN) weights. It is shown that the proposed control approach can guarantee that all the signals of the resulting closed-loop systems are semi-globally uniformly ultimately bounded and the tracking error converges to a small neighborhood. Finally, simulation results are provided to confirm the effectiveness of the proposed approaches.

Keywords: Hysteresis; Approximation methods; adaptive systems; MIMO; Educational institutions; Nonlinear systems; Vectors

Model-based multi-objective reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Maikel Withagen, Mădălina M Drugan (Institute of Artificial Intelligence, University of Groningen, The Netherlands; Artificial Intelligence Lab, Vrije Universiteit Brussel, Belgium)

This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision-making problem, after which the learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. Therefore, it is important that the agent explores the environment intelligently by using a good exploration strategy. In this paper, we supply the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.

Keywords: Computational modeling; Pareto optimization; learning (artificial intelligence); Heuristic algorithms; dynamic programming; Vectors; Markov processes
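Pareto optimality, as used in the abstract above, can be illustrated with a small filter over return vectors: a vector is kept only if no other vector is at least as good in every objective and strictly better in at least one. The helper names `dominates` and `pareto_front` and the sample vectors are hypothetical, with larger values assumed to be better in every objective; this is not the dynamic programming method from the record.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(returns: Sequence[Vector]) -> List[Vector]:
    """Keep the vectors that no other vector dominates (order preserved)."""
    return [r for r in returns
            if not any(dominates(other, r) for other in returns)]


if __name__ == "__main__":
    # Hypothetical (reward, negative time cost) pairs, one per policy.
    candidates = [(1.0, -1.0), (2.0, -3.0), (1.0, -2.0), (3.0, -5.0)]
    print(pareto_front(candidates))
```

The quadratic pairwise comparison is adequate for a handful of candidate policies; larger sets would call for a sort-based method.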
