Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Deon Garrett, Jordi Bieger, Kristinn R. Thórisson (Icelandic Institute for Intelligent Machines; Reykjavík University, Iceland)

A significant problem facing researchers in reinforcement learning, and particularly in multi-objective learning, is the dearth of good benchmarks. In this paper, we present a method and software tool enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. This tool, called Merlin (for Multi-objective Environments for reinforcement learning), provides the ability to control these features in predictable ways, thus allowing researchers to begin to build a more detailed understanding about what features of a problem interact with a given learning algorithm to improve or degrade the algorithm's performance. We present this method and tool, and briefly discuss the controls provided by the generator, its supported options, and their implications on the generated benchmark instances.

Keywords: Learning (artificial intelligence); Correlation; Generators; Covariance matrices; Benchmark testing; Heuristic algorithms; Optimization

Using approximate dynamic programming for estimating the revenues of a hydrogen-based high-capacity storage device

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Vincent François-Lavet, Raphael Fonteneau, Damien Ernst (Department of Electrical Engineering and Computer Science, University of Liège, Belgium)

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.
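
As a rough illustration of the dynamic programming principle mentioned in this abstract, the following minimal sketch computes the best achievable revenue by backward induction over hours and discretised storage levels. Everything in it is an assumption made for illustration: the function name `max_storage_revenue`, the one-unit-per-hour action set, the efficiency parameters and the toy prices are hypothetical and are not taken from the paper.

```python
def max_storage_revenue(prices, capacity, charge_eff=1.0, discharge_eff=1.0):
    """Backward dynamic programming over (hour, storage level).

    The storage level is an integer number of energy units in the tank.
    Each hour the operator may store one unit, release one unit, or idle.
    All parameters are hypothetical placeholders, not values from the paper.
    """
    # value[s] = best revenue obtainable from the current hour onward
    # when s units are stored; after the last hour nothing more is earned.
    value = [0.0] * (capacity + 1)
    for price in reversed(prices):
        new_value = [0.0] * (capacity + 1)
        for s in range(capacity + 1):
            best = value[s]  # idle
            if s < capacity:
                # Buy: storing one unit requires 1 / charge_eff units of electricity.
                best = max(best, -price / charge_eff + value[s + 1])
            if s > 0:
                # Sell: one stored unit yields discharge_eff units of electricity.
                best = max(best, price * discharge_eff + value[s - 1])
            new_value[s] = best
        value = new_value
    return value[0]  # the tank is assumed to start empty


toy_prices = [10.0, 20.0, 100.0, 100.0]  # made-up day-ahead prices
small_tank = max_storage_revenue(toy_prices, capacity=1)  # 90.0
large_tank = max_storage_revenue(toy_prices, capacity=2)  # 170.0
```

On these toy prices the larger tank earns more because it can hold two cheap units until the expensive hours, which is the kind of capacity effect the abstract reports, here only on invented data.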

Keywords: Electricity; Hydrogen; Fuel cells; Electrochemical processes; Hydrogen storage; Dynamic programming

Neural-network-based adaptive dynamic surface control for MIMO systems with unknown hysteresis

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lei Liu, Zhanshan Wang, Zhengwei Shen (College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China)

ISBN: (print) 9781479945511

This paper focuses on the composite adaptive tracking control for a class of nonlinear multiple-input-multiple-output (MIMO) systems with unknown backlash-like hysteresis nonlinearities. A dynamic surface control method is incorporated into the proposed control strategy to eliminate the problem of explosion of complexity. Compared with some existing methods, the prediction error between system state and serial-parallel estimation model is combined with compensated tracking error to construct the adaptive laws for neural network (NN) weights. It is shown that the proposed control approach can guarantee that all the signals of the resulting closed-loop systems are semi-globally uniformly ultimately bounded and the tracking error converges to a small neighborhood. Finally, simulation results are provided to confirm the effectiveness of the proposed approaches.

Keywords: Hysteresis; Approximation methods; Adaptive systems; MIMO; Educational institutions; Nonlinear systems; Vectors

Model-based multi-objective reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Maikel Withagen, Mădălina M. Drugan (Institute of Artificial Intelligence, University of Groningen, The Netherlands; Artificial Intelligence Lab, Vrije Universiteit Brussel, Belgium)

This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. Therefore it is important that the agent explores the environment in an intelligent way by using a good exploration strategy. In this paper we have supplied the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.
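
To make the notion of Pareto optimal policies in this abstract concrete, here is a minimal sketch of Pareto dominance filtering over return vectors. It is not the paper's algorithm: the helper names `dominates` and `pareto_front` are hypothetical, and the candidate returns are invented (treasure value, negative time) pairs rather than the actual Deep Sea Treasure values.

```python
def dominates(a, b):
    """True if return vector a Pareto-dominates b (every objective is maximised)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(returns):
    """Keep only the return vectors that no other vector dominates."""
    return [r for r in returns if not any(dominates(o, r) for o in returns)]


# Made-up (treasure value, negative time steps) returns of five candidate policies.
candidates = [(1, -1), (2, -3), (3, -5), (2, -6), (1, -4)]
front = pareto_front(candidates)  # [(1, -1), (2, -3), (3, -5)]
```

A policy is kept only if no other policy is at least as good on both objectives and strictly better on one; `(2, -6)` is dropped because `(2, -3)` collects the same treasure in less time.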

Keywords: Computational modeling; Pareto optimization; Learning (artificial intelligence); Heuristic algorithms; Dynamic programming; Vectors; Markov processes

Adaptive dynamic programming for discrete-time LQR optimal tracking control problems with unknown dynamics

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yang Liu, Yanhong Luo, Huaguang Zhang (School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China)

ISBN: (print) 9781479945511

In this paper, an optimal tracking control approach based on an adaptive dynamic programming (ADP) algorithm is proposed to solve linear quadratic regulation (LQR) problems for unknown discrete-time systems in an online fashion. First, we convert the optimal tracking problem into the design of an infinite-horizon optimal regulator for the tracking error dynamics by means of a system transformation. Then we expand the error state equation using the history data of control and state. Iterative ADP algorithms based on policy iteration (PI) and value iteration (VI) are introduced to solve for the value function of the controlled system. It is shown that the proposed ADP algorithm solves the LQR problem without requiring any knowledge of the system dynamics. The simulation results show the convergence and effectiveness of the proposed control scheme.
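
For orientation, the sketch below shows value iteration for a scalar discrete-time LQR problem. Note the important difference from the abstract: this sketch assumes the dynamics `a` and `b` are known and iterates the Riccati recursion directly, whereas the paper's ADP method is described as working from measured data without that knowledge. The function name `lqr_value_iteration` and the numbers used are hypothetical.

```python
def lqr_value_iteration(a, b, q, r, iterations=200):
    """Model-based value iteration for x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x**2 + r*u**2 (scalar case, assumed known a and b).

    Returns (p, gain) with value function V(x) = p * x**2
    and feedback law u = -gain * x.
    """
    p = 0.0  # start from the zero value function
    for _ in range(iterations):
        # Scalar Riccati recursion: one Bellman backup of the quadratic value.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)
    return p, gain


# Hypothetical example system, chosen only because its solution is easy to check.
p, gain = lqr_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
```

For this example the fixed point satisfies p*p - p - 1 = 0, so `p` approaches the golden ratio (about 1.618) and the closed-loop factor `a - b * gain` is about 0.382, which is stable.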

Keywords: Heuristic algorithms; Trajectory; Dynamic programming; Equations; Algorithm design and analysis; History; Optimal control

Integral reinforcement learning for linear continuous-time zero-sum games with completely unknown dynamics

IEEE Transactions on Automation Science and Engineering, 2014, vol. 11, no. 3, pp. 706-714

Authors: Hongliang Li, Derong Liu, Ding Wang (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method solving the game algebraic Riccati equation forward in time. The developed algorithm updates the value function and the control and disturbance policies simultaneously. The algorithm is shown to be equivalent to Newton's method, which establishes its convergence. To implement this algorithm, one critic network and two action networks are used to approximate the game value function, control and disturbance policies, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state feedback controller for a power system. Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool to study multiplayer optimal decision and control problems. Existing approaches usually compute the Nash equilibrium solution by offline iterative computation and require exact knowledge of the system dynamics. However, it is difficult to obtain exact knowledge of the system dynamics for many real-world industrial systems. The algorithm developed in this paper is a fully model-free method which solves the zero-sum differential game problem forward in time by making use of online measured data. This method is not affected by errors between an identification model and a real system, and responds fast to changes of the system dynamics. Exploration signals are required to satisfy the persistence of excitation condition to update the value function and the policies, and these signals do not affect the convergence of the learning process. The least squares method is used to obtain the approximate solution for the zero-sum games with unknown dynamics. The developed a

Keywords: Adaptive critic designs; Adaptive dynamic programming; Approximate dynamic programming; Reinforcement learning; Policy iteration; Zero-sum games

Continuous-time differential dynamic programming with terminal constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wei Sun, Evangelos A. Theodorou, Panagiotis Tsiotras (Mobile and Internet Systems Laboratory, University College Cork, Ireland)

In this work, we revisit the continuous-time differential dynamic programming (DDP) approach for solving optimal control problems with terminal state constraints. We derive two algorithms, each for a different order of expansion of the system dynamics, and we investigate their convergence speed. Compared to previous work, we provide a set of backward differential equations for the value function expansion by relaxing the assumption that the initial nominal control must be very close to the optimal control solution. We apply the derived algorithms to two classical optimal control problems, namely the inverted pendulum and the Dreyfus rocket problem, and show the benefit of the second-order expansion.

Keywords: Optimal control; Heuristic algorithms; Differential equations; Equations; Convergence; Rockets; Trajectory

Convergent reinforcement learning control with neural networks and continuous action search

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Minwoo Lee, Charles W. Anderson (Department of Computer Science, Colorado State University, Fort Collins, CO, USA)

We combine a convergent TD-learning method and direct continuous action search with neural networks for function approximation to obtain both stability and generalization over inexperienced state-action pairs. We extend linear Greedy-GQ to nonlinear neural networks for convergent learning. Direct continuous action search with back-propagation leads to efficient high-precision control. A high-dimensional continuous state and action problem, octopus arm control, is examined to test the proposed algorithm. Comparing TD, linear Greedy-GQ, and nonlinear Greedy-GQ, we discuss how the correction term contributes to learning with the nonlinear Greedy-GQ algorithm and how continuous action search contributes to learning speed and stability.

Keywords: Function approximation; Neural networks; Approximation algorithms; Vectors; Learning (artificial intelligence); Legged locomotion

Data-driven partially observable dynamic processes using adaptive dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiangnan Zhong, Zhen Ni, Yufei Tang, Haibo He (Department of Electrical, University of Rhode Island, Kingston, RI, USA)

ISBN: (print) 9781479945511

Adaptive dynamic programming (ADP) has been widely recognized as one of the “core methodologies” for achieving optimal control of intelligent systems in Markov decision processes (MDPs). Generally, ADP control design requires complete information about the system dynamics. However, in many practical situations, the measured input and output data represent only part of the system state. Complete information about the system is therefore unavailable in many real-world cases, which narrows the range of application of ADP design. In this paper, we propose a data-driven ADP method, based on neural network techniques, to stabilize systems with partially observable dynamics. A state network is integrated into the typical actor-critic architecture to provide an estimated state from the measured input/output sequences. A theoretical analysis and a stability discussion of this data-driven ADP method are also provided. Two examples are studied to verify the proposed method.

Keywords: Dynamic programming; Performance analysis; Neural networks; Optimal control; Stability analysis; Equations; Markov processes

Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm

Neurocomputing, 2014, vol. 125, pp. 46-56

Authors: Yuzhu Huang, Derong Liu (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

In this paper, an optimal tracking control scheme is proposed for a class of unknown discrete-time nonlinear systems using an iterative adaptive dynamic programming (ADP) algorithm. First, in order to obtain the dynamics of the system, an identifier is constructed using a three-layer feedforward neural network (NN). Second, a feedforward neuro-controller is designed to obtain the desired control input of the system. Third, via system transformation, the original tracking problem is transformed into a regulation problem with respect to the state tracking error. Then, the iterative ADP algorithm based on heuristic dynamic programming is introduced to deal with the regulation problem, together with a convergence analysis. In this scheme, feedforward NNs are used as parametric structures to facilitate the implementation of the iterative algorithm. Finally, simulation results are presented to demonstrate the effectiveness of the proposed scheme.

Keywords: Adaptive dynamic programming; Convergence analysis; Heuristic dynamic programming; Neural networks; Optimal tracking control; Reinforcement learning
