Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Francois-Lavet, Vincent; Fonteneau, Raphael; Ernst, Damien (Univ Liege, Dept Elect Engn & Comp Sci, B-4000 Liege, Belgium)

ISBN: (print) 9781479945528

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.
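
The DP principle mentioned in this abstract can be illustrated with a toy backward recursion over the storage level. This is a minimal sketch, not the authors' model. It assumes a deterministic price series known in advance (perfect foresight), a tank level discretized into integer units, one unit of charge or discharge per time step, and hypothetical efficiency parameters `eta_in` (electrolysis) and `eta_out` (fuel cell).

```python
from typing import List, Sequence


def max_revenue(prices: Sequence[float], capacity: int, *,
                eta_in: float, eta_out: float,
                initial_level: int = 0) -> float:
    """Best total revenue over the price horizon via backward dynamic programming.

    Hypothetical toy model: the tank level is an integer in [0, capacity].
    At each step the device either stores one unit of hydrogen energy
    (buying 1 / eta_in units of electricity), releases one unit (selling
    eta_out units of electricity), or stays idle.
    """
    # value[s]: best revenue from the current step to the end, starting at level s.
    # At the end of the horizon, leftover hydrogen is assumed to be worth nothing.
    value: List[float] = [0.0] * (capacity + 1)
    for price in reversed(prices):
        new_value: List[float] = [0.0] * (capacity + 1)
        for s in range(capacity + 1):
            best = value[s]  # idle
            if s < capacity:  # buy electricity, run electrolysis, store one unit
                best = max(best, -price / eta_in + value[s + 1])
            if s > 0:  # release one unit through the fuel cell and sell
                best = max(best, price * eta_out + value[s - 1])
            new_value[s] = best
        value = new_value
    return value[initial_level]
```

For example, `max_revenue([10, 50, 10, 100], capacity=1, eta_in=1.0, eta_out=1.0)` returns 130.0 (buy at 10, sell at 50, buy at 10, sell at 100). With lossy conversion, a trade only pays when the later price exceeds the earlier one by more than the round-trip loss, which is why widely separated price levels matter for the storage decision.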

Keywords: dynamic programming; electrolysis; fuel cells; hydrogen storage; power markets; Belgian market; day-ahead electricity market; dynamic programming principle; high-capacity storage device; hydrogen-based storage devices; interseasonal price fluctuations; maximum revenue estimation; optimal revenue; dynamic programming; Electricity; Electrochemical processes; Fuel cells; Hydrogen; Hydrogen storage

Offline Data-Driven Adaptive Critic Design With Variational Inference for Wastewater Treatment Process Control

IEEE Transactions on Automation Science and Engineering, 2024, Vol. 21, No. 4, pp. 4987-4998

Authors: Qiao, Junfei; Yang, Ruyue; Wang, Ding (Beijing Univ Technol, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing Lab Smart Environm Protect, Beijing 100124, Peoples R China; Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China)

Wastewater treatment is indispensable to the functioning of urban society, and its optimal control has enormous social benefits. However, precise modelling of the unstable and complex treatment process is challenging, yet it is crucial to the adaptive dynamic programming method. In this article, an adaptive critic algorithm with variational inference is designed to address the optimal control problem of nonlinear discrete-time systems, along with a convergence analysis. Based on the recorded system trajectory, a variational autoencoder is used to approximate the behavior policy of the offline dataset without system modelling or online interaction. Through policy iteration learning, the actor-critic structure can amend the policy generated by the variational autoencoder to achieve the optimal control objective. Simulations on a nonlinear system and on the wastewater treatment process verify that the proposed approach outperforms the behavior policy. Driven by wastewater treatment process data derived from an incremental proportional-integral-derivative controller, the proposed approach produces an optimal control policy with lower tracking error and cost. Note to Practitioners: When dealing with an unknown system with complex dynamics, it is more feasible to improve the acceptable performance of the existing control policy based on the system's trajectory than to directly obtain an excellent policy. Motivated by batch reinforcement learning, learning from offline data avoids online interaction between the system and the adaptive dynamic programming algorithm, which could lead to exploratory errors during online learning. Specifically, in a model-free adaptive dynamic programming algorithm, the parameters of the controller are instantly updated based on an experience replay buffer sampled from the online trajectory data. However, online exploration determines the update, and there is no guarantee that the system will converge every time. As a specific typ

Keywords: adaptive dynamic programming; offline reinforcement learning; data-driven control; variational autoencoder; wastewater treatment

Adaptive Dynamic Programming with Balanced Weights Seeking Strategy

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Fu, Jian; He, Haibo; Ni, Zhen (School of Automation, Wuhan University of Technology, Wuhan, Hubei 430070, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, United States)

ISBN: (print) 9781424498888

In this paper, we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to adopt a balanced weight-updating strategy that considers both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and the critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach on the triple-link inverted pendulum, a popular benchmark in the community, to demonstrate the online learning and control strategy. Experimental results and a comparative study under different noise conditions demonstrate the effectiveness of this approach. © 2011 IEEE.

Keywords: Inverted pendulum

Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback

IEEE Transactions on Cybernetics, 2020, Vol. 50, No. 11, pp. 4670-4679

Authors: Rizvi, Syed Ali Asad; Lin, Zongli (Univ Virginia, Charles L Brown Dept Elect & Comp Engn, Charlottesville, VA 22904, USA)

In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented that reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration (PI) and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it is immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
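
The claim that the learned parameters converge to the solution of the LQR algebraic Riccati equation (ARE) rests on a policy-iteration structure that can be sketched in its simplest model-based form: Kleinman's iteration for a scalar system x' = a x + b u with cost integral of (q x^2 + r u^2). This is an illustration under explicit assumptions, not the paper's method: it uses the model (a, b) and full state feedback, whereas the paper learns from input-output data only, and the numbers in the usage example are hypothetical.

```python
import math
from typing import List, Tuple


def kleinman_pi(a: float, b: float, q: float, r: float,
                k0: float, iterations: int = 20) -> Tuple[float, float, List[float]]:
    """Policy iteration for the scalar continuous-time LQR problem (u = -k x).

    Policy evaluation solves the scalar Lyapunov equation
        2 (a - b k) p + q + r k^2 = 0
    for the cost p of the current gain k; policy improvement sets k = b p / r.
    The initial gain k0 must be stabilizing, i.e. a - b * k0 < 0.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must be stabilizing")
    k, p = k0, 0.0
    costs: List[float] = []
    for _ in range(iterations):
        p = (q + r * k * k) / (2.0 * (b * k - a))  # policy evaluation
        costs.append(p)
        k = b * p / r  # policy improvement
    return p, k, costs


def are_solution(a: float, b: float, q: float, r: float) -> float:
    """Positive root of the scalar ARE 2 a p - (b^2 / r) p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

With a = b = q = r = 1 and k0 = 3, the evaluated costs decrease (2.5, about 2.4167, about 2.4142, ...) and settle at 1 + sqrt(2), the ARE solution. The stabilizing-initial-gain check mirrors the PI requirement that, according to the abstract, the paper's VI variant removes.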

Keywords: Output feedback; Mathematical model; Cost function; Optimal control; Iterative methods; Stability analysis; dynamic programming; adaptive dynamic programming (ADP); linear quadratic regulator (LQR); output feedback; reinforcement learning (RL)

A Novel Iterative θ-Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE Transactions on Automation Science and Engineering, 2014, Vol. 11, No. 4, pp. 1176-1190

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper is concerned with a new iterative θ-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite-horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law that optimizes the iterative performance index function. In the present iterative θ-ADP algorithm, the requirement of an initial admissible control in the policy iteration algorithm is avoided. It is proved that all the iterative controls obtained in the iterative θ-ADP algorithm can stabilize the nonlinear system, which means that the iterative θ-ADP algorithm is feasible for both online and offline implementation. Convergence analysis of the performance index function is presented to guarantee that the iterative performance index function converges to the optimum monotonically. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the iterative θ-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.
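
The monotone-convergence and stability claims can be made concrete on the simplest possible analogue: value iteration for a scalar discrete-time linear-quadratic problem x(k+1) = a x(k) + b u(k) with utility q x^2 + r u^2, started from a large quadratic initial value V_0(x) = θ x^2. This is an assumption-based sketch, not the authors' algorithm: the scaled initial value is only suggested by the θ in the title, the paper treats nonlinear systems with neural-network approximators, and all numbers are hypothetical.

```python
import math
from typing import List, Tuple


def value_iteration(a: float, b: float, q: float, r: float,
                    theta: float, iterations: int = 60
                    ) -> Tuple[List[float], List[float]]:
    """Value iteration V_{i+1}(x) = min_u [q x^2 + r u^2 + V_i(a x + b u)].

    With V_i(x) = P_i x^2 the minimisation has the closed form
        u = -K_i x,  K_i = a b P_i / (r + b^2 P_i),
        P_{i+1} = q + a^2 P_i - (a b P_i)^2 / (r + b^2 P_i).
    Returns the sequence P_0 = theta, P_1, ... and the gains K_0, K_1, ...
    """
    p_seq: List[float] = [theta]
    gains: List[float] = []
    p = theta
    for _ in range(iterations):
        gains.append(a * b * p / (r + b * b * p))  # control derived from V_i
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        p_seq.append(p)
    return p_seq, gains


def dare_solution(a: float, b: float, q: float, r: float) -> float:
    """Positive root of b^2 P^2 + (r - q b^2 - a^2 r) P - q r = 0 (scalar DARE)."""
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)
```

For the open-loop unstable case a = 1.1, b = q = r = 1 with θ = 50, the sequence P_i decreases monotonically to the DARE solution (about 1.774), and every gain K_i gives |a - b K_i| < 1. Starting instead from θ = 0, the first gain is 0 and the first closed loop is the unstable a = 1.1, which shows why the choice of initial value matters for a stability-preserving iteration.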

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; policy iteration; reinforcement learning; value iteration

Using Reward-Weighted Regression for Reinforcement Learning of Task Space Control

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Peters, Jan; Schaal, Stefan (Univ So Calif, Los Angeles, CA 90089, USA)

ISBN: (print) 9781424407064

Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate-reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used for online learning control of robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which is infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
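
The reduction to reward-weighted regression can be illustrated on a one-dimensional immediate-reward problem. Hypothetical setup: a context s drawn uniformly from [-1, 1], a linear-Gaussian policy a ~ N(θ s, σ^2), and reward r(s, a) = -(a - 2 s)^2, so the best policy is θ = 2. Each iteration samples actions, turns rewards into positive weights exp(τ r), and refits θ by weighted least squares. For simplicity τ and σ are fixed constants; the paper's adaptive reward transformation is not reproduced, so this is a sketch of the general idea rather than the authors' algorithm.

```python
import math
import random


def reward(s: float, a: float) -> float:
    """Hypothetical immediate reward; the best action is a = 2 s."""
    return -(a - 2.0 * s) ** 2


def rwr(iterations: int = 30, samples: int = 500, sigma: float = 0.5,
        tau: float = 2.0, seed: int = 0) -> float:
    """Reward-weighted regression for the policy a ~ N(theta * s, sigma^2)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        num = 0.0
        den = 0.0
        for _ in range(samples):
            s = rng.uniform(-1.0, 1.0)
            a = rng.gauss(theta * s, sigma)  # explore around the current policy
            w = math.exp(tau * reward(s, a))  # fixed reward transformation, w > 0
            num += w * s * a
            den += w * s * s
        theta = num / den  # weighted least-squares fit of a = theta * s
    return theta
```

With these settings the expected update is θ_new = (θ + 2) / 2, so θ moves smoothly toward 2 without large jumps, and `rwr()` ends close to 2. Because each update is a regression rather than a gradient step, no learning rate has to be tuned.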

Keywords: reinforcement learning

Adaptive Railway Traffic Control Using Approximate Dynamic Programming

Transportation Research Part C: Emerging Technologies, 2020, Vol. 113, pp. 91-107

Authors: Ghasempour, Taha; Heydecker, Benjamin (UCL, Ctr Transport Studies, London WC1E 6BT, England)

This study presents an adaptive railway traffic controller for real-time operations based on approximate dynamic programming (ADP). By assessing requirements and opportunities, the controller aims to limit consecutive delays resulting from trains that entered a control area behind schedule by sequencing them at a critical location in a timely manner, thus representing the practical requirements of railway operations. This approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using reinforcement learning techniques. By using this approximation, the ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this investigation, we explore formulations of the approximation function and variants of the learning techniques used to estimate it. Evaluation of the ADP methods in a stochastic simulation environment shows considerable improvements in consecutive delays by comparison with the current industry practice of First-Come-First-Served sequencing. We also found that estimates of parameters of the approximate value function are similar across a range of test scenarios with different mean train entry delays.

Keywords: Approximate dynamic programming; reinforcement learning; Railway traffic management; adaptive control

An Approximate Dynamic Programming Based Controller for an Underactuated 6DoF Quadrotor

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Stingu, Emanuel; Lewis, Frank L. (Automation and Robotics Research Institute, University of Texas at Arlington, Arlington, TX, United States)

ISBN: (print) 9781424498888

This paper discusses how the principles of adaptive dynamic programming (ADP) can be applied to the control of a quadrotor helicopter platform flying in an uncontrolled environment and subjected to various disturbances and model uncertainties. ADP is based on reinforcement learning using an actor-critic structure. Due to the complexity of the quadrotor system, the learning process has to use as much information as possible about the system and the environment. Various methods to improve the learning speed and efficiency are presented. Neural networks with local activation functions are used as function approximators because the state space cannot be explored efficiently due to its size and the limited time available. The complex dynamics are controlled by a single critic and multiple actors, thus avoiding the curse of dimensionality. After a number of iterations, the overall actor-critic structure stores information (knowledge) about the system dynamics and the optimal controller that can accomplish the explicit or implicit goal specified in the cost function. © 2011 IEEE.

Keywords: reinforcement learning

Online Adaptive Integral Reinforcement Learning for Nonlinear Multi-Input System

IEEE Transactions on Circuits and Systems II: Express Briefs, 2023, Vol. 70, No. 11, pp. 4176-4180

Authors: Lv, Yongfeng; Chang, Huimin; Zhao, Jun (Taiyuan Univ Technol, Coll Elect & Power Engn, Taiyuan 030024, Peoples R China; Shanxi Univ, Sch Math Sci, Taiyuan 030006, Peoples R China; Shandong Univ Sci & Technol, Coll Transportat, Qingdao 266590, Peoples R China)

In this brief, a novel adaptive integral reinforcement learning (AIRL) scheme is proposed for continuous-time (CT) systems and is used to learn the optimal controls of a partially unknown multi-input nonlinear system. First, the Nash equilibrium of the multi-input system is defined. Two neural networks (NNs) are used to approximate the cost functions with the integral reinforcement signal, which avoids directly solving the Hamilton-Jacobi-Bellman (HJB) equation, so that the dynamic information and the derivatives of the NN activations are not needed. Then, a novel learning algorithm is used to update the unknown NN weights, and the learned weights are used to obtain the optimal multi-policies. Convergence of the learned weights is proved. Finally, two examples are presented to verify the system performance under the proposed AIRL scheme.
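
The central idea of integral reinforcement learning, evaluating a policy from measured trajectory data through the integral Bellman equation V(x(t)) = integral from t to t+T of (q x^2 + r u^2) + V(x(t+T)), can be sketched for a single-input scalar system x' = a x + b u. Assumptions for this sketch: V(x) = p x^2, the drift `plant_a` is used only to generate simulated "measurements" and never by the update, the input gain b is treated as known in the policy-improvement step, integration is explicit Euler with a hypothetical step `dt`, and all numbers are hypothetical. The paper's multi-input Nash setting and NN critics are not reproduced.

```python
import math
from typing import Tuple


def simulate(a: float, b: float, k: float, x0: float, horizon: float,
             q: float, r: float, dt: float) -> Tuple[float, float]:
    """Euler-simulate x' = a x + b u under u = -k x.

    Returns x(T) and the integral of q x^2 + r u^2 over [0, T]; this plays
    the role of measured trajectory data.
    """
    x, cost = x0, 0.0
    for _ in range(int(round(horizon / dt))):
        u = -k * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    return x, cost


def irl_policy_iteration(b: float, q: float, r: float, k0: float,
                         plant_a: float, iterations: int = 8,
                         horizon: float = 0.1, dt: float = 1e-4) -> Tuple[float, float]:
    """Integral-RL policy iteration with V(x) = p x^2 for a scalar system.

    plant_a is used only inside the data-generating simulation; the update
    equations below never use it.  k0 must be stabilizing.
    """
    p, k = 0.0, k0
    for _ in range(iterations):
        x0 = 1.0
        x1, cost = simulate(plant_a, b, k, x0, horizon, q, r, dt)
        # Integral Bellman equation: p x0^2 = cost + p x1^2.
        p = cost / (x0 * x0 - x1 * x1)  # policy evaluation from data
        k = b * p / r  # policy improvement needs b but not plant_a
    return p, k
```

With `irl_policy_iteration(b=1.0, q=1.0, r=1.0, k0=3.0, plant_a=1.0)`, p approaches 1 + sqrt(2) (about 2.414), the Riccati solution for a = b = q = r = 1, up to a small Euler discretization error. In higher dimensions, one interval no longer determines the value parameters, so several intervals with sufficiently exciting data and a least-squares solve would be needed.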

Keywords: Integral reinforcement; multi-input system; adaptive dynamic programming; adaptive law; Nash equilibrium

Learning-Based Adaptive Optimal Control of Linear Time-Delay Systems: A Policy Iteration Approach

IEEE Transactions on Automatic Control, 2024, Vol. 69, No. 1, pp. 629-636

Authors: Cui, Leilei; Pang, Bo; Jiang, Zhong-Ping (NYU Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, Brooklyn, NY 11201, USA)

This article studies the adaptive optimal control problem for a class of linear time-delay systems described by delay differential equations. A crucial strategy is to take advantage of recent developments in reinforcement learning and adaptive dynamic programming and develop novel methods to learn adaptive optimal controllers from finite samples of input and state data. In this article, the data-driven policy iteration (PI) is proposed to solve the infinite-dimensional algebraic Riccati equation iteratively in the absence of exact model knowledge. Interestingly, the proposed recursive PI algorithm is new in the present context of continuous-time time-delay systems, even when the model knowledge is assumed known. The efficacy of the proposed learning-based control methods is validated by means of practical applications arising from metal cutting and autonomous driving.

Keywords: Optimal control; Aerospace electronics; Mathematical models; Heuristic algorithms; Delays; Trajectory; Stability criteria; adaptive dynamic programming (ADP); linear time-delay systems; optimal control; policy iteration (PI)
