Search results - Inner Mongolia University Library

Higher level application of ADP: A next phase for the control field?

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, Vol. 38, No. 4, pp. 901-912

Author: Lendaris, George G. Affiliation: Portland State Univ Dept Elect & Comp Engn NW Computat Intelligence Lab Syst Sci Grad Program Portland OR 97207 USA

Two distinguishing features of humanlike control vis-a-vis current technological control are the ability to make use of experience while selecting a control policy for distinct situations and the ability to do so faster and faster as more experience is gained (in contrast to current technological implementations that slow down as more knowledge is stored). The notions of context and context discernment are important to understanding this human ability. Whereas methods known as adaptive control and learning control focus on modifying the design of a controller as changes in context occur, experience-based (EB) control entails selecting a previously designed controller that is appropriate to the current situation. Developing the EB approach entails a shift of the technologist's focus "up a level" away from designing individual (optimal) controllers to that of developing online algorithms that efficiently and effectively select designs from a repository of existing controller solutions. A key component of the notions presented here is that of a higher level learning algorithm. This is a new application of reinforcement learning and, in particular, approximate dynamic programming, with its focus shifted to the posited higher level, and it is employed with very promising results. The author's hope for this paper is to inspire and guide future work in this promising area.

Keywords: approximate dynamic programming (ADP); artificial intelligence (AI); context; context discernment; experience-based identification and control (EBIC); neural networks (NNs); optimal control; reinforcement learning (RL); system identification (SID)

Reinforcement Learning-Based Event-Triggered FCS-MPC for Power Converters

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2023, Vol. 70, No. 12, pp. 11841-11852

Authors: Liu, Xing; Qiu, Lin; Fang, Youtong; Rodriguez, Jose. Affiliations: Zhejiang Univ Coll Elect Engn Hangzhou 310027 Peoples R China; Zhejiang Univ Univ Illinois Urbana Champaign Inst Hangzhou 310027 Peoples R China; Univ San Sebastian Santiago Fac Engn Santiago 8420524 Chile

This article aims to first focus on an improvement of finite control-set model predictive control strategy for power converters that is based on reinforcement learning event-triggered predictive control architecture with the help of adaptive dynamic programming technique and event-triggered mechanism subject to system uncertainties. Our development, endowed with the merits of reinforcement learning and event-triggered control as well as a predictive control solution, is able to alleviate the issues of parametric uncertainties and high switching frequency inherent in the existing scheme, while retaining the merits of the finite control-set model predictive control. Finally, this proposal is experimentally evaluated, where robust performance tests confirm the interest and applicability of the proposed control methodology.

Keywords: Event-triggered mechanism; finite control-set model predictive control; neural network; reinforcement learning

Approximate Real-Time Optimal Control Based on Sparse Gaussian Process Models

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Boedecker, Joschka; Springenberg, Jost Tobias; Wuelfing, Jan; Riedmiller, Martin. Affiliation: Univ Freiburg Dept Comp Sci Machine Learning Lab D-79110 Freiburg Germany

ISBN: (print) 9781479945528

In this paper we present a fully automated approach to (approximate) optimal control of non-linear systems. Our algorithm jointly learns a non-parametric model of the system dynamics - based on Gaussian Process Regression (GPR) - and performs receding horizon control using an adapted iterative LQR formulation. This results in an extremely data-efficient learning algorithm that can operate under real-time constraints. When combined with an exploration strategy based on GPR variance, our algorithm successfully learns to control two benchmark problems in simulation (two-link manipulator, cart-pole) as well as to swing-up and balance a real cart-pole system. For all considered problems learning from scratch, that is without prior knowledge provided by an expert, succeeds in less than 10 episodes of interaction with the system.

Keywords: Gaussian processes; learning systems; linear quadratic control; manipulators; nonlinear dynamical systems; regression analysis; GPR variance; Gaussian process regression; approximate real-time optimal control; cart-pole system; data-efficient learning algorithm; iterative LQR formulation; nonlinear systems; receding horizon control; sparse Gaussian process models; system dynamics nonparametric model; two-link manipulator; Approximation algorithms; Approximation methods; Computational modeling; Optimal control; Optimization; Predictive models; Trajectory; Approximation method; Prediction models; exploration strategy; Benchmark testing

Bi-Level Adaptive Storage Expansion Strategy for Microgrids Using Deep Reinforcement Learning

IEEE TRANSACTIONS ON SMART GRID, 2024, Vol. 15, No. 2, pp. 1362-1375

Authors: Huang, Bin; Zhao, Tianqiao; Yue, Meng; Wang, Jianhui. Affiliations: Southern Methodist Univ Elect & Comp Engn Dept Dallas TX 75205 USA; Brookhaven Natl Lab Interdisciplinary Sci Dept Upton NY 11973 USA

Battery energy storage (BES) is a versatile resource for the secure and economic operation of microgrids (MGs). Prevailing stochastic optimization-based approaches for BES expansion planning for MGs are computationally complicated. This work proposes a data-driven bi-level multi-period BES expansion planning framework to determine the siting, sizing, and timing of BES installations. The proposed planning framework unifies deep reinforcement learning (DRL) and linear programming, thereby decoupling the determinations for the integer and continuous decision variables in two time scales, respectively. In the upper level, a rainbow DRL agent with quantile regression is trained to provide dynamic planning policies to accommodate stochastic renewable energy resources (RESs), load, and battery price changes efficiently. The lower level computes the optimal operation of MGs with frequency constraints to hedge the islanding contingency. The two levels communicate with one another by exchanging storage configuration and operating expenses in order to accomplish the shared goal of minimizing investment and operation costs. Comparative case studies on an MG are carried out to demonstrate the superiority of the proposed DRL-based solution to the mixed-integer linear programming counterpart on efficiency, scalability, and adaptability.

Keywords: Planning; Costs; Uncertainty; Indexes; Investment; Stochastic processes; Batteries; Battery storage systems; deep reinforcement learning; microgrid planning; two-timescale

Reinforcement learning in continuous action spaces

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: van Hasselt, Hado; Wiering, Marco A. Affiliation: Univ Utrecht Dept Informat & Comp Sci Intelligent Syst Grp Padualaan 14 NL-3508 TB Utrecht Netherlands

ISBN: (print) 9781424407064

Quite some research has been done on reinforcement learning in continuous environments, but the research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
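The abstract above does not spell out the update rule, so the following is only a minimal sketch of how a CACLA-style step is commonly described: the critic is updated by temporal-difference learning, and the actor is moved toward the action actually taken only when the TD error is positive. The linear function approximators, the scalar action, the parameter names, and the learning rates are all assumptions made for illustration and are not taken from the record.

```python
import random

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def cacla_update(actor_w, critic_w, phi, phi_next, action, reward,
                 gamma=0.9, actor_lr=0.1, critic_lr=0.1):
    """One CACLA-style step with a linear actor and critic (assumed setup).

    phi and phi_next are feature vectors of the current and next state;
    action is the scalar action that was actually executed.
    """
    # TD error of the state-value critic.
    td_error = reward + gamma * dot(critic_w, phi_next) - dot(critic_w, phi)
    # The critic is always updated toward the TD target.
    new_critic = [w + critic_lr * td_error * x for w, x in zip(critic_w, phi)]
    # The actor is updated only if the taken action turned out better than expected.
    new_actor = list(actor_w)
    if td_error > 0:
        step = action - dot(actor_w, phi)
        new_actor = [w + actor_lr * step * x for w, x in zip(actor_w, phi)]
    return new_actor, new_critic, td_error

def gaussian_action(actor_w, phi, sigma, rng=random):
    """Gaussian exploration: sample around the actor's current output."""
    return rng.gauss(dot(actor_w, phi), sigma)
```

In this sketch the size of the TD error does not scale the actor step; only its sign decides whether the actor moves, which is the feature usually cited as distinguishing CACLA from gradient-based actor-critic methods.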

Keywords: reinforcement learning

Model-based reinforcement learning in factored-state MDPs

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Author: Strehl, Alexander L. Affiliation: Rutgers State Univ Dept Comp Sci Piscataway NJ 08854 USA

ISBN: (print) 9781424407064

We consider the problem of learning in a factored state Markov Decision Process that is structured to allow a compact representation. We show that the well-known algorithm, factored Rmax, performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This is equivalent to the result obtained by Kearns and Koller for their DBN-E-3 algorithm, except that we've conducted the analysis in a more general setting. We also extend the results to a new algorithm, factored IE, that uses the Interval Estimation approach to exploration and can be expected to outperform factored Rmax on most domains.

Keywords: reinforcement learning

Value-Gradient Learning

IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)/International Joint Conference on Neural Networks (IJCNN)/IEEE Congress on Evolutionary Computation (IEEE-CEC)/IEEE World Congress on Computational Intelligence (IEEE-WCCI)

Authors: Fairbank, Michael; Alonso, Eduardo. Affiliation: City Univ London Sch Informat Dept Comp London EC1V 0HB England

ISBN: (print) 9781467314909

We describe an adaptive dynamic programming algorithm VGL(lambda) for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(lambda). We provide on-line and batch mode implementations of the algorithm, and summarise the theoretical relationships and motivations of using this method over its precursor algorithms Dual Heuristic Dynamic Programming and TD(lambda). Experiments for control problems using a neural network and greedy policy are provided.
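To illustrate what a bootstrapping parameter does, the sketch below computes the standard lambda-return targets used by TD(lambda) for state values. It is not VGL(lambda) itself, which according to the abstract applies an analogous idea to a critic that extends Dual Heuristic Dynamic Programming; the function name and the list-based trajectory format are assumptions made for this example.

```python
def lambda_returns(rewards, values, gamma, lam):
    """Lambda-return targets for one trajectory (TD(lambda) forward view).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T); the last entry is the bootstrap value
    lam = 0 gives one-step TD targets; lam = 1 gives discounted
    reward sums bootstrapped only at the final state.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    targets = [0.0] * len(rewards)
    g = values[-1]  # return estimate following the last transition
    for t in reversed(range(len(rewards))):
        # Blend the critic's estimate with the longer return computed so far.
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        targets[t] = g
    return targets
```

Setting `lam` between 0 and 1 trades off reliance on the current critic estimates against reliance on the observed rewards along the trajectory.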

Keywords: Value-Gradient Learning; Dual Heuristic Dynamic Programming; DHP; adaptive dynamic programming

A dynamic checkpointing scheme based on reinforcement learning

10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2004)

Authors: Okamura, H; Nishimura, Y; Dohi, T. Affiliation: Hiroshima Univ Grad Sch Engn Dept Informat Engn Higashihiroshima 7398527 Japan

ISBN: (print) 0769520766

In this paper, we develop a new checkpointing scheme for a uniprocess application. First, we model the checkpointing scheme by a semi-Markov decision process, and apply the reinforcement learning algorithm to estimate statistically the optimal checkpointing policy. More specifically, the representative reinforcement learning algorithm, called the Q-learning algorithm, is used to develop an adaptive checkpointing scheme. In simulation experiments, we examine the asymptotic behavior of the system overhead with adaptive checkpointing and show quantitatively that the proposed dynamic checkpoint algorithm is useful and robust under an incomplete knowledge on the failure time distribution.
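As a rough illustration of the Q-learning step mentioned above, here is a minimal tabular sketch. It uses the ordinary discrete-time update rather than the semi-Markov formulation of the paper, and the state labels, the action names `"continue"` and `"checkpoint"`, and the numeric rewards are hypothetical placeholders rather than anything taken from the record.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated values; unseen pairs
    default to 0.0 when q is a defaultdict(float).
    """
    # Greedy value of the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical usage: a negative reward stands in for checkpointing overhead.
ACTIONS = ("continue", "checkpoint")
q_table = defaultdict(float)
q_learning_update(q_table, "s0", "checkpoint", -1.0, "s1", ACTIONS)
```

Because the update needs only observed transitions, no model of the failure time distribution is required, which matches the abstract's emphasis on incomplete knowledge of that distribution.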

Keywords: dynamic checkpointing; uniprocess application; semi-Markov decision process; reinforcement learning; Q-learning

An optimal ADP algorithm for a high-dimensional stochastic control problem

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Nascimento, Juliana; Powell, Warren. Affiliation: Princeton Univ Dept Operat Res & Financial Engn Princeton NJ 08544 USA

ISBN: (print) 9781424407064

We propose a provably optimal approximate dynamic programming algorithm for a class of multistage stochastic problems, taking into account that the probability distribution of the underlying stochastic process is not known and the state space is too large to be explored entirely. The algorithm and its proof of convergence rely on the fact that the optimal value functions of the problems within the problem class are concave and piecewise linear. The algorithm is a combination of Monte Carlo simulation, pure exploitation, stochastic approximation and a projection operation. Several applications, in areas like energy, control, inventory and finance, fall under the framework.

Keywords: dynamic programming

Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, Vol. 27, No. 2, pp. 444-458

Authors: Wei, Qinglai; Song, Ruizhuo; Yan, Pengfei. Affiliations: Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China

This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. According to the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst case of the disturbance. Three single-layer neural networks, including one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, for facilitating the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state will converge to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks are also convergent to finite neighborhoods of their optimal ones. Finally, the simulation results will show the effectiveness of the developed data-driven ADP methods.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; neurodynamic programming; nonlinear systems; optimal control; recurrent neural network (RNN); reinforcement learning
