Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sahoo, Avimanyu; Xu, Hao; Jagannathan, S. Affiliations: Missouri Univ Sc & Tech Dept Elect & Comp Engn Rolla MO 65409 USA; Texas A&M Univ Coll Sci & Engn Dept Elect Engn Corpus Christi TX USA

ISBN: (print) 9781479945528

This paper presents a novel stochastic event-based near-optimal control strategy to regulate a networked control system (NCS) represented as an uncertain nonlinear continuous-time system. An online stochastic actor-critic neural network (NN) based approach is used to achieve near-optimal regulation in the presence of network constraints, such as network-induced time-varying delays and random packet losses, under event-based transmission of the feedback signals. The nonlinear NCS, transformed into discrete time after incorporating the delays and packet losses, is used for the actor-critic NN based controller design. To relax the required knowledge of the control coefficient matrix, an NN-based identifier is used. The event-sampled state vector is used as the NN input, and the NN weights are updated non-periodically at the occurrence of events. Further, an event-trigger condition is designed using the Lyapunov technique to ensure ultimate boundedness of all closed-loop signals and to save network resources and computation. Moreover, policy and value iterations are not used for the stochastic optimal regulator design. Finally, the analytical design is verified with a numerical example using Monte Carlo simulations.

Keywords: Event-triggered control; optimal control; adaptive dynamic programming; neural networks; networked control systems

Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, vol. 28, no. 3, pp. 704-713

Authors: Song, Ruizhuo; Lewis, Frank L.; Wei, Qinglai. Affiliations: Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China; Univ Texas Arlington UTA Res Inst Arlington TX 76019 USA; Northeastern Univ State Key Lab Synthet Automat Proc Ind Shenyang 110819 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China

This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL is designed to perform policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. The gradient descent algorithm updates the critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of a Nash equilibrium are proved. The simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; integral reinforcement learning (IRL); nonlinear systems; nonzero sum (NZS); off-policy

Beyond Exponential Utility Functions: A Variance-Adjusted Approach for Risk-Averse Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Gosavi, Abhijit A.; Das, Sajal K.; Murray, Susan L. Affiliations: Missouri Univ Sci & Technol Dept Engn Management & Syst Engn Rolla MO 65409 USA; Missouri Univ Sci & Technol Dept Comp Sci Rolla MO 65409 USA

ISBN: (print) 9781479945528

Utility theory has served as a bedrock for modeling risk in economics. Where risk is involved in decision-making, for solving Markov decision processes (MDPs) via utility theory, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior. The EU function framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large and typically the computer overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU function objective for reasonable values of the RAC, can be used in the RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies on a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.
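The overflow problem and the variance-adjusted alternative described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the function names are hypothetical, the exponential-utility objective is written in its common certainty-equivalent form, and the variance-adjusted score uses the standard second-order approximation (mean minus half the risk-averseness coefficient times the variance), which is an assumption about how the two objectives relate.

```python
import math
import statistics


def exponential_utility_ce(returns, rac):
    """Certainty equivalent under exponential utility:
    -(1 / rac) * log(mean(exp(-rac * x))). Hypothetical helper."""
    mean_exp = sum(math.exp(-rac * x) for x in returns) / len(returns)
    return -math.log(mean_exp) / rac


def variance_adjusted(returns, rac):
    """Variance-adjusted score: mean minus (rac / 2) * population variance.
    Assumed second-order approximation of the certainty equivalent."""
    return statistics.mean(returns) - 0.5 * rac * statistics.pvariance(returns)


def eu_overflows(returns, rac):
    """Report whether evaluating the exponential utility overflows a float."""
    try:
        exponential_utility_ce(returns, rac)
    except OverflowError:
        return True
    return False
```

For small, tightly clustered returns the two scores nearly coincide, while a single large cost (a large negative return) makes `math.exp` raise `OverflowError`; the variance-adjusted score only involves a mean and a variance and stays finite in that case.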

Keywords: Markov processes; decision making; economics; learning (artificial intelligence); mathematical analysis; risk analysis; utility theory; EU function; MDP; Markov decision process; RAC; VA approach; exponential utility functions; mathematical proof; risk-averse reinforcement learning; risk-averseness coefficient; variance-adjusted approach; Computers; Equations; Linear programming; Mathematical model; Measurement; Markov chain; formal proof; AKT1 gene; Risk Management

Randomly sampling actions in dynamic programming

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Author: Atkeson, Christopher G. Affiliation: Carnegie Mellon Univ Inst Robot Pittsburgh PA 15213 USA

ISBN: (print) 9781424407064

We describe an approach towards reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time invariant control laws for two, four, and six dimensional deterministic swing up problems with up to 480 million discretized states.
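The idea of sampling actions at random instead of searching an action grid can be sketched on a toy problem. The code below is only loosely in the spirit of the abstract and is not the paper's algorithm: the one-dimensional dynamics, the cost, the discount factor, and the function name are all assumptions made for illustration. Each sweep draws one random action per state and keeps it only if it beats the stored action under the current value estimate.

```python
import random


def sampled_action_value_iteration(n_states=21, sweeps=300, gamma=0.95, seed=0):
    """Toy value iteration on x in [-1, 1] with dynamics x' = x + 0.1 * u,
    u in [-1, 1], and step cost x^2 + 0.1 * u^2 (all assumed for illustration)."""
    rng = random.Random(seed)
    xs = [-1.0 + 2.0 * i / (n_states - 1) for i in range(n_states)]
    values = [0.0] * n_states
    actions = [0.0] * n_states  # stored best-so-far action per state

    def interp(x):
        # Linear interpolation of the value table, clipped to the grid.
        x = min(1.0, max(-1.0, x))
        pos = (x + 1.0) / 2.0 * (n_states - 1)
        lo = min(int(pos), n_states - 2)
        frac = pos - lo
        return (1.0 - frac) * values[lo] + frac * values[lo + 1]

    def q(x, u):
        # One-step cost plus discounted value of the successor state.
        return x * x + 0.1 * u * u + gamma * interp(x + 0.1 * u)

    for _ in range(sweeps):
        new_values = []
        for i, x in enumerate(xs):
            candidate = rng.uniform(-1.0, 1.0)  # one random action, no action grid
            if q(x, candidate) < q(x, actions[i]):
                actions[i] = candidate
            new_values.append(q(x, actions[i]))
        values[:] = new_values  # update in place so interp sees the new table
    return xs, values, actions
```

Calling `sampled_action_value_iteration()` returns the state grid, the value estimates, and the stored actions; in this toy setting the stored actions push the state toward the origin from both ends of the grid.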

Keywords: dynamic programming

Novel Discounted Adaptive Critic Control Designs With Accelerated Learning Formulation

IEEE TRANSACTIONS ON CYBERNETICS, 2024, vol. 54, no. 5, pp. 3003-3016

Authors: Ha, Mingming; Wang, Ding; Liu, Derong. Affiliations: Ant Grp MYbank Beijing 100020 Peoples R China; Univ Sci & Technol Beijing Sch Automation & Elect Engn Beijing 100083 Peoples R China; Beijing Univ Technol Fac Informat Technol Beijing Key Lab Computat Intelligence & Intelligen Beijing 100124 Peoples R China; Southern Univ Sci & Technol Sch Syst Design & Intelligent Mfg Shenzhen 518055 Peoples R China; Univ Illinois Dept Elect & Comp Engn Chicago IL 60607 USA

Inspired by the successive relaxation method, a novel discounted iterative adaptive dynamic programming framework is developed, in which the iterative value function sequence possesses an adjustable convergence rate. The different convergence properties of the value function sequence and the stability of the closed-loop systems under the new discounted value iteration (VI) are investigated. Based on the properties of the given VI scheme, an accelerated learning algorithm with convergence guarantee is presented. Moreover, the implementations of the new VI scheme and its accelerated learning design are elaborated, which involve value function approximation and policy improvement. A nonlinear fourth-order ball-and-beam balancing plant is used to verify the performance of the developed approaches. Compared with the traditional VI, the present discounted iterative adaptive critic designs greatly accelerate the convergence rate of the value function and reduce the computational cost simultaneously.

Keywords: Iterative methods; Convergence; Power system stability; Optimal control; Stability criteria; Cost function; Closed loop systems; adaptive critic designs; adaptive dynamic programming (ADP); discrete-time nonlinear systems; fast convergence rate; reinforcement learning; value iteration (VI)

Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2011, vol. 41, no. 1, pp. 14-25

Authors: Lewis, F. L.; Vamvoudakis, Kyriakos G. Affiliation: Univ Texas Arlington Automat & Robot Res Inst Ft Worth TX 76118 USA

Approximate dynamic programming (ADP) is a class of reinforcement learning methods that have shown their importance in a variety of applications, including feedback control of dynamical systems. ADP generally requires full information about the system internal states, which is usually not available in practical situations. In this paper, we show how to implement ADP methods using only measured input/output data from the system. Linear dynamical systems with deterministic behavior are considered herein, which are systems of great interest in the control system community. In control system theory, these types of methods are referred to as output feedback (OPFB). The stochastic equivalent of the systems dealt with in this paper is a class of partially observable Markov decision processes. We develop both policy iteration and value iteration algorithms that converge to an optimal controller that requires only OPFB. It is shown that, similar to Q-learning, the new methods have the important advantage that knowledge of the system dynamics is not needed for the implementation of these learning algorithms or for the OPFB control. Only the order of the system, as well as an upper bound on its "observability index," must be known. The learned OPFB controller is in the form of a polynomial autoregressive moving-average controller that has equivalent performance with the optimal state variable feedback gain.

Keywords: Approximate dynamic programming (ADP); data-based optimal control; policy iteration (PI); output feedback (OPFB); value iteration (VI)

Power Control for Wireless VBR Video Streaming: From Optimization to Reinforcement Learning

IEEE TRANSACTIONS ON COMMUNICATIONS, 2019, vol. 67, no. 8, pp. 5629-5644

Authors: Ye, Chuang; Gursoy, M. Cenk; Velipasalar, Senem. Affiliation: Syracuse Univ Dept Elect Engn & Comp Sci Syracuse NY 13244 USA

In this paper, we investigate the problem of power control for streaming variable bit rate (VBR) videos over wireless links. A system model involving a transmitter (e.g., a base station) that sends VBR video data to a receiver (e.g., a mobile user) equipped with a playout buffer is adopted, as used in dynamic adaptive streaming video applications. In this setting, we analyze power control policies considering the following two objectives: 1) the minimization of the transmit power consumption and 2) the minimization of the transmission completion time of the communication session. In order to play the video without interruptions, the power control policy should also satisfy the requirement that the VBR video data is delivered to the mobile user without causing playout buffer underflow or overflow. A directional water-filling algorithm, which provides a simple and concise interpretation of the necessary optimality conditions, is identified as the optimal offline policy. Following this, two online policies are proposed for power control based on channel side information (CSI) prediction within a short time window. Dynamic programming is employed to implement the optimal offline and the initial online power control policies that minimize the transmit power consumption in the communication session. Subsequently, a reinforcement learning (RL)-based approach is employed for the second online power control policy. Through the simulation results, we show that the optimal offline power control policy that minimizes the overall power consumption leads to substantial energy savings compared with the strategy of minimizing the time duration of video streaming. We also demonstrate that the RL algorithm performs better than the dynamic programming-based online grouped water-filling (GWF) strategy unless the channel is highly correlated.

Keywords: dynamic programming; playout buffer underflow; playout buffer overflow; power control; reinforcement learning; variable bit rate (VBR) video; video streaming

Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, vol. 33, no. 7, pp. 2781-2790

Authors: Bian, Tao; Jiang, Zhong-Ping. Affiliation: NYU Control & Networks Lab Tandon Sch Engn Dept Elect & Comp Engn Brooklyn NY 11201 USA

This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method, proposed initially by Bellman in 1957, as a fundamental tool to solve dynamic programming problems. However, previous VI methods are all exclusively devoted to Markov decision processes and discrete-time dynamical systems. In this article, we aim to fill the gap by developing a new continuous-time VI method that is applied to the adaptive or nonadaptive optimal control problems for continuous-time systems described by differential equations. Like the traditional VI, the continuous-time VI algorithm retains the nice feature that there is no need to assume knowledge of an initial admissible control policy. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with totally unknown dynamics. A learning-based control algorithm is proposed to show how to learn robust optimal controllers directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.

Keywords: Nonlinear systems; Optimal control; adaptive systems; dynamical systems; Mathematical model; Heuristic algorithms; Linear systems; adaptive optimal control; value iteration (VI)

Adaptive Dynamic Programming for Decentralized Stabilization of Uncertain Nonlinear Large-Scale Systems With Mismatched Interconnections

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, vol. 50, no. 8, pp. 2870-2882

Authors: Yang, Xiong; He, Haibo. Affiliations: Tianjin Univ Sch Elect & Informat Engn Tianjin 300072 Peoples R China; Univ Rhode Isl Dept Elect Comp & Biomed Engn Kingston RI 02881 USA

This paper presents a novel decentralized control strategy for a class of uncertain nonlinear large-scale systems with mismatched interconnections. First, it is shown that the decentralized controller for the overall system can be represented by an array of optimal control policies of auxiliary subsystems. Then, within the framework of adaptive dynamic programming, a simultaneous policy iteration (SPI) algorithm is developed to solve the Hamilton-Jacobi-Bellman equations associated with auxiliary subsystem optimal control policies. The convergence of the SPI algorithm is guaranteed by an equivalence relationship. To implement the present SPI algorithm, actor and critic neural networks are applied to approximate the optimal control policies and the optimal value functions, respectively. Meanwhile, both the least squares method and the Monte Carlo integration technique are employed to derive the unknown weight parameters. Furthermore, by using Lyapunov's direct method, the overall system with the obtained decentralized controller is proved to be asymptotically stable. Finally, the effectiveness of the proposed decentralized control scheme is illustrated via simulations for nonlinear plants and unstable power systems.

Keywords: Large-scale systems; Decentralized control; Optimal control; dynamic programming; Robustness; Approximation algorithms; adaptive dynamic programming (ADP); mismatched interconnections; reinforcement learning (RL)

Adaptive Critic Nonlinear Robust Control: A Survey

IEEE TRANSACTIONS ON CYBERNETICS, 2017, vol. 47, no. 10, pp. 3429-3451

Authors: Wang, Ding; He, Haibo; Liu, Derong. Affiliations: Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Univ Chinese Acad Sci Sch Comp & Control Engn Beijing 100049 Peoples R China; Univ Rhode Isl Dept Elect Comp & Biomed Engn Kingston RI 02881 USA; Guangdong Univ Technol Sch Automat Guangzhou 510006 Guangdong Peoples R China

Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. Both are regarded as promising methods involving the important components of evaluation and improvement, against the background of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed in addressing nonlinear optimal control problems, the research on robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the main recent results of adaptive-critic-based robust control design of continuous-time nonlinear systems. The ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H-infinity control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey helps promote the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.

Keywords: adaptive critic designs; adaptive/approximate dynamic programming (ADP); boundedness; convergence; neural networks; optimal control; reinforcement learning; robust control; stability
