Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Reindorp, Matthew J.; Fu, Michael C. Affiliations: Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Netherlands; Robert H. Smith School of Business, Institute for Systems Research, University of Maryland, United States

ISBN: (print) 9781424498888

We consider a make-to-order business that serves customers in multiple priority classes. Orders from customers in higher classes bring greater revenue, but these customers expect shorter lead times than customers in lower classes. In making lead time promises, the firm must recognize preexisting order commitments, uncertainty over future demand from each class, and the possibility of supply chain disruptions. We model this scenario as a Markov decision problem and use reinforcement learning to determine the firm's lead time policy. In order to achieve tractability on large problems, we utilize a sequential decision-making approach that effectively allows us to eliminate one dimension from the state space of the system. Initial numerical results from the sequential dynamic approach suggest that the resulting policies more closely approximate optimal policies than static optimization approaches. © 2011 IEEE.

Keywords: reinforcement learning

A Theoretical Foundation of Goal Representation Heuristic Dynamic Programming

IEEE Transactions on Neural Networks and Learning Systems, 2016, vol. 27, no. 12, pp. 2513-2525

Authors: Zhong, Xiangnan; Ni, Zhen; He, Haibo. Affiliations: Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; South Dakota State Univ, Dept Elect Engn & Comp Sci, Brookings, SD 57007, USA

Goal representation heuristic dynamic programming (GrHDP) control design has been developed in recent years. The control performance of this design has been demonstrated in several case studies, and it has also been shown to be applicable to industrial-scale complex control problems. In this paper, we develop the theoretical analysis for the GrHDP design under certain conditions. It is shown that the internal reinforcement signal is bounded and that the performance index converges monotonically to its optimal value. The existence of the admissible control is also proved. Although the GrHDP control method has been investigated in many areas before, to the best of our knowledge, this is the first study to present the theoretical foundation of the internal reinforcement signal and to show how such a signal can provide effective information to improve the control performance. Numerous simulation studies are used to validate the theoretical analysis and to demonstrate the effectiveness of the GrHDP design.

Keywords: adaptive dynamic programming (ADP); convergence analysis; goal representation; neural network; online learning and control

Safe reinforcement learning in high-risk tasks through policy improvement

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Garcia Polo, Francisco Javier; Fernandez Rebollo, Fernando. Affiliation: Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain

ISBN: (print) 9781424498888

Reinforcement learning (RL) methods are widely used for dynamic control tasks. In many cases, these are high-risk tasks in which the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, which makes the learning problem harder and unapproachable with conventional RL algorithms. So, when the agent begins to interact with a risky environment that has a large state-action space, an important question arises: how can we prevent exploration of the state-action space from damaging the learning (or other) systems? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition. © 2011 IEEE.

Keywords: reinforcement learning

Distributed Approximate Dynamic Control for Traffic Management of Busy Railway Networks

IEEE Transactions on Intelligent Transportation Systems, 2020, vol. 21, no. 9, pp. 3788-3798

Authors: Ghasempour, Taha; Nicholson, Gemma L.; Kirkwood, David; Fujiyama, Taku; Heydecker, Benjamin. Affiliations: UCL, Ctr Transport Studies, Fac Engn Sci, London WC1E 6BT, England; Univ Birmingham, Birmingham Ctr Railway Res & Educ, Birmingham B15 2TT, W Midlands, England

Railway operations are prone to disturbances that can rapidly propagate through large networks, causing delays and poor performance. Automated re-scheduling tools have shown the potential to limit such undesirable outcomes. This study presents the network-wide effects of local deployment of an adaptive traffic controller for real-time operations that is built on approximate dynamic programming (ADP). The controller aims to limit train delays by advantageously controlling the sequencing of trains at critical locations. By using an approximation to the optimised value function of dynamic programming that is updated by reinforcement learning techniques, ADP reduces the computational burden substantially. This framework has been established for isolated local control, so here we investigate the effects of distributed deployment. Our ADP controller is interfaced with a microscopic railway traffic simulator to evaluate its effect on a large and dynamic railway system, which controls critical points independently. The proposed approach achieved a reduction in train delays by comparison with First-Come-First-Served control. We also found the improvements to be greater at terminal stations compared to the vicinity of our control areas.

Keywords: Rail transportation; Real-time systems; Delays; dynamic programming; Rails; Aerospace electronics; Tools; Approximate dynamic programming; railway traffic management; adaptive control; reinforcement learning

Adaptive Dynamic Programming for Robust Event-Driven Tracking Control of Nonlinear Systems With Asymmetric Input Constraints

IEEE Transactions on Cybernetics, 2024, vol. 54, no. 11, pp. 6333-6344

Authors: Yang, Xiong; Wei, Qinglai. Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin Key Lab Intelligent Unmanned Swarm Techno, Tianjin 300072, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

This article considers the robust dynamic event-driven tracking control problem of nonlinear systems with mismatched disturbances and asymmetric input constraints. First, to handle the asymmetric constraints, a novel nonquadratic value function is constructed for the original system. This transforms the asymmetrically constrained tracking control problem into an unconstrained optimal regulation problem. Then, a dynamic event-driven mechanism is proposed, and the event-driven Hamilton-Jacobi-Bellman equation (ED-HJBE) is developed for the optimal regulation problem in order to obtain the optimal control with a markedly reduced computational burden. To solve the ED-HJBE, a single critic neural network (CNN) is designed in the adaptive dynamic programming framework, and the gradient descent method is employed to update the CNN's weights. After that, both the weight estimation error and the tracking error are proved to be uniformly ultimately bounded via Lyapunov's direct method. Finally, simulations of a spring-mass-damper system and a pendulum plant are used separately to validate the established theoretical claims.

Keywords: Nonlinear systems; Control systems; Optimal control; Regulation; dynamic programming; Perturbation methods; adaptive systems; adaptive dynamic programming (ADP); asymmetric constraint; event-driven mechanism (EDM); neural network (NN); reinforcement learning (RL)

Algorithm and Stability of ATC Receding Horizon Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Zhang, Hongwei; Huang, Jie; Lewis, Frank L. Affiliations: Chinese Univ Hong Kong, Dept Mech & Automat Engn, Shatin, Hong Kong, Peoples R China; Univ Texas Arlingto, Automat & Robot Res Inst, Ft Worth, TX 76118, USA

ISBN: (print) 9781424427611

Receding horizon control (RHC), also known as model predictive control (MPC), is a suboptimal control scheme that solves a finite horizon open-loop optimal control problem in an infinite horizon context and yields a measured state feedback control law. Much effort has gone into studying closed-loop stability, leading to various stability conditions that constrain the terminal state, the terminal cost, the horizon size, or combinations of these. In this paper, we propose a modified RHC scheme, called adaptive terminal cost RHC (ATC-RHC). The control law generated by the ATC-RHC algorithm converges to the solution of the infinite horizon optimal control problem. Moreover, it ensures that the closed-loop system is uniformly ultimately exponentially stable without imposing any constraints on the terminal state, the horizon size, or the terminal cost. Finally, we show that when the horizon size is one, the underlying problems of ATC-RHC and heuristic dynamic programming (HDP) are the same. Thus, ATC-RHC can be implemented using HDP techniques without knowing the system matrix A.

Keywords: Receding horizon control; adaptive terminal cost receding horizon control; Stability; Heuristic dynamic programming
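
The abstract above relates horizon-one ATC-RHC to heuristic dynamic programming. As a rough illustration only, and not the authors' algorithm, the following sketch runs plain model-based value iteration for a discrete-time linear-quadratic problem. It assumes that A, B, Q, and R are all known (the paper's point is that HDP can avoid knowing A; this sketch does not), and the double-integrator matrices and the function name are hypothetical choices made for the example.

```python
import numpy as np

def riccati_value_iteration(A, B, Q, R, iters=5000, tol=1e-10):
    """Iterate P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, starting from P = 0."""
    P = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)  # greedy gain for the current P
        P_next = Q + A.T @ P @ (A - B @ K)         # one value-iteration sweep
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    K = np.linalg.solve(R + BtP @ B, BtP @ A)      # gain for the final P
    return P, K

# Hypothetical example system: a double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = riccati_value_iteration(A, B, Q, R)
print("P =", P)
print("K =", K)
```

Each sweep is a one-step lookahead with the current P acting as the terminal cost, which is the sense in which a horizon of one and value iteration coincide in this toy setting.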

A Note on State Parameterizations in Output Feedback Reinforcement Learning Control of Linear Systems

IEEE Transactions on Automatic Control, 2023, vol. 68, no. 10, pp. 6200-6207

Authors: Rizvi, Syed Ali Asad; Lin, Zongli. Affiliations: Tennessee Technol Univ, Dept Elect & Comp Engn, Cookeville, TN 38505, USA; Univ Virginia, Charles L Brown Dept Elect & Comp Engn, Charlottesville, VA 22904, USA

This note presents an analysis of the state parameterizations used in output feedback reinforcement learning (RL) control. Output feedback algorithms based on state parameterization involve additional conditions on the state parameterization beyond the standard conditions on the system matrices for their convergence to the optimal solution. It is shown that the state parameterization matrix needs to be of full row rank to guarantee the convergence of the output feedback RL algorithms. We present conditions in terms of the system matrices and the user-defined observer dynamics that ensure full row rank of the state parameterization matrix.

Keywords: Output feedback; State feedback; Convergence; Observers; Q-learning; Regulators; Observability; adaptive dynamic programming; optimal control; output feedback control; reinforcement learning (RL); state parameterization

A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jiang, Daniel R.; Pham, Thuy V.; Powell, Warren B.; Salas, Daniel F.; Scott, Warren R.

ISBN: (print) 9781479945528

As more renewable, yet volatile, forms of energy like solar and wind are being incorporated into the grid, the problem of finding optimal control policies for energy storage is becoming increasingly important. These sequential decision problems are often modeled as stochastic dynamic programs, but when the state space becomes large, traditional (exact) techniques such as backward induction, policy iteration, or value iteration quickly become computationally intractable. Approximate dynamic programming (ADP) thus becomes a natural solution technique for solving these problems to near-optimality using significantly fewer computational resources. In this paper, we compare the performance of the following: various approximation architectures with approximate policy iteration (API), approximate value iteration (AVI) with structured lookup table, and direct policy search on a benchmarked energy storage problem (i.e., the optimal solution is computable).

Keywords: dynamic programming; energy storage; power engineering computing; power system management; renewable energy sources; table lookup; ADP; API; AVI; approximate dynamic programming; approximate policy iteration; approximate value iteration; backward induction; dynamic programming techniques; energy storage control policy; lookup table; natural solution technique; solar energy; stochastic dynamic programs; wind energy; approximation algorithms; benchmark testing; equations; function approximation; mathematical model; Adenosine Diphosphate; automatic data processing; renewable energy
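
The abstract above names backward induction as one of the exact techniques that become intractable on large state spaces. The sketch below shows what backward induction looks like on a deliberately tiny storage problem. It is not one of the paper's benchmark problems: prices are assumed deterministic, storage moves by at most one unit per step, and the function name, price series, and capacity are hypothetical values chosen for illustration.

```python
def backward_induction(prices, capacity):
    """Exact finite-horizon DP for a toy storage problem.

    State s is the number of stored units; action a in {-1, 0, 1} is the
    change in stored units (a = 1 buys one unit, a = -1 sells one unit).
    Returns the value table V[t][s] and the greedy policy policy[t][s].
    """
    T = len(prices)
    V = [[0.0] * (capacity + 1) for _ in range(T + 1)]  # V[T][s] = 0: no salvage value
    policy = [[0] * (capacity + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):                      # sweep backward in time
        for s in range(capacity + 1):
            best_value, best_action = float("-inf"), 0
            for a in (-1, 0, 1):
                s_next = s + a
                if 0 <= s_next <= capacity:             # skip infeasible moves
                    value = -prices[t] * a + V[t + 1][s_next]
                    if value > best_value:
                        best_value, best_action = value, a
            V[t][s] = best_value
            policy[t][s] = best_action
    return V, policy

V, policy = backward_induction(prices=[1.0, 5.0, 2.0, 6.0], capacity=1)
print(V[0][0])       # 8.0: buy at 1, sell at 5, buy at 2, sell at 6
print(policy[0][0])  # 1: buy in the first period when starting empty
```

The table has one entry per (time, state) pair, so its size grows with every added state dimension (for example, a stochastic price or wind level). That growth is the tractability issue that motivates the approximate methods compared in the paper.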

Application of reinforcement learning-based algorithms in CO2 allowance and electricity markets

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Author: Nanduri, Vishnuteja. Affiliation: Department of Industrial and Manufacturing Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, United States

ISBN: (print) 9781424498888

Climate change is one of the most important challenges faced by the world this century. In the U.S., the electric power industry is the largest emitter of CO2, contributing to the climate crisis. Federal emissions control bills in the form of cap-and-trade programs are currently idling in the U.S. Congress. In the meantime, ten states in the northeastern U.S. have adopted a regional cap-and-trade program to reduce CO2 levels and to increase investments in cleaner technologies. Many of the states in which the cap-and-trade programs are active operate under a restructured market paradigm, where generators compete to supply power. This research presents a bi-level game-theoretic model to capture competition between generators in cap-and-trade markets and restructured electricity markets. The solution to the game-theoretic model is obtained using a reinforcement learning based algorithm. © 2011 IEEE.

Keywords: reinforcement learning

Active Learning for Classification: An Optimistic Approach

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Collet, Timothe; Pietquin, Olivier. Affiliations: Supelec, MaLIS Res Grp, Gif Sur Yvette, France; GeorgiaTech CNRS UMI 2958, Metz, France; Univ Lille 1, F-59655 Villeneuve Dascq, France; CNRS LIFL UMR 8022, Lille 1 SequeL Team, F-75700 Paris, France; Inst Univ France, Paris, France

ISBN: (print) 9781479945528

In this paper, we propose to reformulate the active learning problem occurring in classification as a sequential decision-making problem. We focus in particular on the problem of dynamically allocating a fixed budget of samples. This raises the problem of the trade-off between exploration and exploitation, which is traditionally addressed in the framework of multiarmed bandit theory. Based on previous work on bandit theory applied to active learning for regression, we introduce four novel algorithms for solving the online allocation of the budget in a classification problem. Experiments on a generic classification problem demonstrate that these new algorithms compare favorably to state-of-the-art methods.

Keywords: decision making; learning (artificial intelligence); optimisation; pattern classification; regression analysis; active learning; classification; multiarmed bandits theory; optimistic approach; regression; sequential decision making problem; algorithm design and analysis; noise; noise measurement; partitioning algorithms; resource management; shape; uncertainty; experiential learning; pattern recognition; management of resources
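
The abstract above frames budget allocation as an exploration-exploitation trade-off handled with optimistic bandit methods. As a generic illustration of that optimism principle, the sketch below implements the standard UCB1 rule on Bernoulli arms. It is not one of the four algorithms introduced in the paper, it does not model the classification setting, and the arm means, budget, seed, and function name are hypothetical.

```python
import math
import random

def ucb1_allocate(means, budget, seed=0):
    """Spend a fixed budget of pulls across Bernoulli arms using UCB1.

    means holds the (normally unknown) success probability of each arm and is
    used here only to simulate rewards. Requires budget >= len(means).
    Returns the number of pulls given to each arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, budget + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1_allocate(means=[0.2, 0.5, 0.8], budget=2000)
print(counts)
```

In an active learning setting, the reward would presumably be replaced by some measure of how informative a newly labeled sample is, which is where the paper's algorithms depart from this plain bandit sketch.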
