Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions to control problems. Yet it usually requires offline training of a model network, thus incurring extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on a finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and the critic network designs, and use delayed objective functions to train both networks online over time. We test both the MF-DHP and MB-DHP approaches on a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach achieves control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
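The core idea of the brief — replacing a trained model network with finite differences of the plant itself — can be sketched as follows. The plant `f`, the step size `eps`, and the test point are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def finite_difference_jacobian(f, x, eps=1e-4):
    """Estimate the Jacobian of a black-box mapping f at x via
    central finite differences, avoiding an explicit model network."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        # central difference: (f(x + h) - f(x - h)) / (2h)
        jac[:, i] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return jac

# Hypothetical plant with a known analytic Jacobian for comparison:
# f1 = x0 + 0.1*x1, f2 = 0.9*x1 + x0**2
f = lambda x: np.array([x[0] + 0.1 * x[1], 0.9 * x[1] + x[0] ** 2])
J = finite_difference_jacobian(f, np.array([0.5, -0.3]))
# Analytic Jacobian at x = (0.5, -0.3): [[1.0, 0.1], [1.0, 0.9]]
```

In a DHP design these Jacobian estimates would stand in for the model network's derivatives when back-propagating the costate through the plant.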
This paper presents an adaptive and intelligent power control approach for microgrid systems in the grid-connected operation mode. The proposed critic-based adaptive control system contains a neuro-fuzzy controller and a fuzzy critic agent. The fuzzy critic agent employs a reinforcement learning algorithm based on neuro-dynamic programming. The system feedback is made available to the critic agent's input as the controller's action in the previous state. The evaluation or reinforcement signal produced by the critic agent, together with the back-propagation of error, is then used for online tuning of the output-layer weights of the neuro-fuzzy controller. The proposed controller shows superior results compared with traditional PI control: the transient response time is significantly reduced, power oscillations are eliminated, and fast convergence is achieved. The simple design and improved dynamic behavior of the proposed controller make it a promising candidate for power control of microgrid systems.
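A minimal sketch of the tuning step described above — the critic's reinforcement signal scaling a back-propagated error to update only the output-layer weights. The update law, learning rate, and shapes here are hypothetical; the paper's exact rule is not reproduced.

```python
import numpy as np

def tune_output_weights(W, hidden, error, reinforcement, lr=0.01):
    """One hypothetical online update of a neuro-fuzzy controller's
    output-layer weights W (n_out x n_hidden): the scalar reinforcement
    signal from the critic scales the back-propagated error gradient."""
    grad = reinforcement * np.outer(error, hidden)
    return W - lr * grad

# Usage with toy values: one output, three hidden-layer activations
W = np.zeros((1, 3))
W_new = tune_output_weights(W, np.array([1.0, 2.0, 3.0]),
                            np.array([0.5]), reinforcement=2.0)
```

Keeping the antecedent (fuzzy-rule) layers fixed and adapting only the output layer is what keeps this kind of online tuning cheap and stable.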
ISBN (Print): 9781467397155
In this paper, a novel partially model-free adaptive dynamic programming (ADP) algorithm is presented to solve online the nonzero-sum differential games of continuous-time linear systems with unknown drift dynamics. Specifically, by using the integral reinforcement learning technique, the partially model-free ADP algorithm is developed to solve online the set of coupled algebraic Riccati equations (AREs) underlying the game problem without requiring complete knowledge of the system dynamics. Then, the convergence of the partially model-free ADP algorithm is proved by demonstrating that it is mathematically equivalent to the extended Kleinman algorithm, previously proposed in the literature, which solves the set of coupled algebraic Riccati equations offline using complete knowledge of the system dynamics. Finally, an example is given to demonstrate the efficiency of the proposed algorithm.
A general utility function representation is proposed to provide the required differentiable and adjustable utility function for the dual heuristic dynamic programming (DHP) design. Goal representation DHP (GrDHP) is presented with a goal network placed on top of the traditional DHP design. This goal network provides a general mapping between the system states and the derivatives of the utility function. With this proposed architecture, we can obtain the required derivatives of the utility function directly from the goal network. In addition, instead of a fixed, predefined utility function as in the literature, we conduct an online learning process for the goal network so that the derivatives of the utility function can be adaptively tuned over time. We compare the control performance of the proposed GrDHP and the traditional DHP approaches under the same environment and parameter settings. Statistical simulation results and snapshots of the system variables are presented to demonstrate the improved learning and control performance. We also apply both approaches to a power system example to further demonstrate the control capabilities of the GrDHP approach.
The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The constrained input, coupled with the inability to identify the uncertainties accurately, motivates the design of stabilizing controllers based on reinforcement learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted into a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical action-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control often indispensable in RL, no special requirement is imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee that the uncertain nonlinear system is stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.
ISBN (Print): 9781509042418
Inspired by the core idea of AlphaGo, we combine a neural network trained by adaptive dynamic programming (ADP) with the Monte Carlo Tree Search (MCTS) algorithm for Gomoku. The MCTS algorithm is based on the Monte Carlo simulation method: it runs a large number of simulations to grow a game search tree, and we roll out the tree and evaluate the outcomes at its leaf nodes to obtain the MCTS winning rate. The ADP and MCTS methods are used to estimate the winning rates respectively, and we weight the two winning rates to select the action position with the maximum combined value. Experimental results show that this method can effectively eliminate the "short-sighted" defect of the neural network evaluation function. With our proposed method, the game's final prediction result is more accurate, and it outperforms Gomoku play based on the ADP algorithm alone.
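The action-selection rule above — blending two winning-rate estimates and taking the maximum — can be sketched in a few lines. The weight `w` and the toy rate values are illustrative assumptions; the paper's actual weighting is not specified in this abstract.

```python
def select_action(candidates, adp_rate, mcts_rate, w=0.5):
    """Pick the move with the highest weighted winning rate, blending
    the ADP network's estimate with the MCTS simulation estimate.
    w is a hypothetical tuning parameter, not taken from the paper."""
    score = {a: w * adp_rate[a] + (1 - w) * mcts_rate[a] for a in candidates}
    return max(score, key=score.get)

moves = ["a", "b", "c"]
adp = {"a": 0.40, "b": 0.55, "c": 0.50}    # network evaluation (long-term view)
mcts = {"a": 0.60, "b": 0.45, "c": 0.58}   # simulation winning rate (tactical view)
best = select_action(moves, adp, mcts)     # blended scores: 0.50, 0.50, 0.54
```

The blend is what compensates for each estimator's weakness: the network's "short-sighted" tactical errors and MCTS's limited long-term evaluation under a finite simulation budget.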
ISBN (Print): 9781479919598
A new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2] is investigated in this paper. Unlike the proofs of convergence for adaptive dynamic programming (ADP) in the literature, here we provide new insight into the error bound between the estimated value function and the expected value function. We then employ the critic network in the GrADP approach to approximate the Q value function, and use the action network to provide the control policy. The goal network is adopted to provide the internal reinforcement signal for the critic network over time. Finally, we illustrate that the estimated Q value function is close to the expected value function within an arbitrarily small bound on a maze navigation example.
ISBN (Print): 9783319253930; 9783319253923
In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct an iterative control law that stabilizes the system and simultaneously minimizes the iterative Q function. The convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.
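The alternation the abstract describes — evaluate the Q function under the current policy, then improve the policy greedily — can be illustrated on a finite MDP. This tabular sketch is an assumption for illustration only; the paper works with nonlinear systems and function approximation, not lookup tables.

```python
import numpy as np

def q_policy_iteration(P, R, gamma=0.9, iters=50):
    """Policy iteration on the Q function for a finite MDP with
    stage costs: evaluate Q under the current policy pi, then
    improve greedily. P[a] is the transition matrix for action a,
    R[s, a] the stage cost of taking a in state s."""
    n_s, n_a = R.shape
    pi = np.zeros(n_s, dtype=int)
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        # policy evaluation: Q(s, a) = R(s, a) + gamma * E[Q(s', pi(s'))]
        for _ in range(100):
            q_pi = Q[np.arange(n_s), pi]
            Q = R + gamma * np.stack([P[a] @ q_pi for a in range(n_a)], axis=1)
        pi = Q.argmin(axis=1)  # greedy improvement (cost minimization)
    return Q, pi

# Toy 2-state, 2-action example: action 0 stays put, action 1 switches state
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = np.array([[1.0, 0.5], [0.0, 2.0]])
Q, pi = q_policy_iteration(P, R)
# Optimal policy: switch out of state 0 (pay 0.5 once), then stay in state 1
```

Minimization replaces the usual reward maximization because the ADP literature the abstract belongs to is framed in terms of cost-to-go.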
ISBN (Print): 9781467390576
With the rapid increase in demand for mobile data, mobile network operators are trying to expand wireless network capacity by deploying WiFi hotspots to offload their mobile traffic. However, these network-centric methods usually do not serve the interests of mobile users (MUs). An MU weighs many factors when deciding whether to offload traffic to a complementary WiFi network. In this paper, we study the WiFi offloading problem from the MU's perspective by considering the delay tolerance of traffic, monetary cost, and energy consumption, as well as the availability of the MU's mobility pattern. We first formulate the WiFi offloading problem as a finite-horizon discrete-time Markov decision process (FDTMDP) with a known mobility pattern and propose a dynamic programming based offloading algorithm. Since the MU's mobility pattern may not be known in advance, we then propose a reinforcement learning based offloading algorithm, which works well with an unknown mobility pattern. Extensive simulations are conducted to validate our proposed offloading algorithms.
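For a finite-horizon discrete-time MDP like the one formulated above, the dynamic programming solution is backward induction over the horizon. The `cost(t, s, a)` and `trans(t, s, a)` interfaces and the toy two-state example (state 1 = WiFi available, action 1 = offload) are hypothetical stand-ins for the paper's model.

```python
import numpy as np

def backward_induction(T, n_states, n_actions, cost, trans):
    """Finite-horizon DP: compute the optimal expected cost-to-go
    V[t, s] and policy by stepping backward from the terminal time.
    trans(t, s, a) returns a probability vector over next states."""
    V = np.zeros((T + 1, n_states))            # terminal cost V[T, :] = 0
    policy = np.zeros((T, n_states), dtype=int)
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            q = [cost(t, s, a) + trans(t, s, a) @ V[t + 1]
                 for a in range(n_actions)]
            policy[t, s] = int(np.argmin(q))
            V[t, s] = q[policy[t, s]]
    return V, policy

# Toy model: action 0 = cellular (cost 2), action 1 = offload
# (cost 1 if WiFi is available in state 1, else 5); state persists.
cost = lambda t, s, a: 2.0 if a == 0 else (1.0 if s == 1 else 5.0)
trans = lambda t, s, a: np.eye(2)[s]
V, policy = backward_induction(T=3, n_states=2, n_actions=2,
                               cost=cost, trans=trans)
```

With the mobility pattern unknown, this backward sweep is no longer computable in advance, which is what motivates the paper's reinforcement learning variant.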
ISBN (Print): 9781479977888
This paper develops an adaptive dynamic programming (ADP) based near-optimal boundary control of distributed parameter systems (DPS) governed by uncertain coupled semi-linear parabolic partial differential equations (PDEs) under a Neumann boundary control condition. First, the Hamilton-Jacobi-Bellman (HJB) equation is formulated without any model reduction, and the optimal control policy is derived. Subsequently, a novel identifier is developed to estimate the unknown nonlinearity in the PDE dynamics. Accordingly, the sub-optimal control policy is obtained by forward-in-time estimation of the value functional using a neural network (NN) online approximator and the identifier. Adaptive tuning laws are proposed for learning the value functional online. Local ultimate boundedness (UB) of the closed-loop system is verified using Lyapunov theory. The performance of the proposed controller is verified via simulation on an unstable coupled diffusion-reaction process.