Search results - Inner Mongolia University Library

On the convergence of temporal-difference learning with linear function approximation

MACHINE LEARNING, 2001, vol. 42, no. 3, pp. 241-267

Author: Tadic, V (Univ Melbourne, Dept Elect & Elect Engn, Parkville, Vic 3010, Australia)

The asymptotic properties of temporal-difference learning algorithms with linear function approximation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an uncountable finite-dimensional state-space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established and an upper bound for their asymptotic approximation error is determined. The obtained results are a generalization and extension of the existing results related to the asymptotic behavior of temporal-difference learning. Moreover, they cover cases to which the existing results cannot be applied, while the adopted assumptions seem to be the weakest possible under which the almost sure convergence of temporal-difference learning algorithms is still possible to be demonstrated.

Keywords: temporal-difference learning; reinforcement learning; neuro-dynamic programming; almost sure convergence; Markov chains; positive Harris recurrence
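As an illustration of the kind of algorithm this record is about, here is a minimal sketch of TD(0), the simplest temporal-difference method, with a linear approximation J(x) ≈ theta · phi(x) of a discounted cost-to-go. It is not taken from the paper: the two-state deterministic chain, the features `phi`, the costs, the discount factor, and the constant step size are all hypothetical choices made so that the run is reproducible, whereas the paper treats chains on an uncountable state space.

```python
def td0_linear(next_state, cost, phi, x0, discount, step_size, n_steps):
    """TD(0) with a linear approximation J(x) ~ theta . phi(x)."""
    theta = [0.0] * len(phi(x0))
    x = x0
    for _ in range(n_steps):
        y = next_state(x)  # one transition of the uncontrolled chain
        v_x = sum(t * f for t, f in zip(theta, phi(x)))
        v_y = sum(t * f for t, f in zip(theta, phi(y)))
        delta = cost(x) + discount * v_y - v_x  # temporal difference
        theta = [t + step_size * delta * f for t, f in zip(theta, phi(x))]
        x = y
    return theta


# Hypothetical toy chain (an assumption, not from the paper):
# states 0 and 1 alternate, cost 1 in state 0 and 0 in state 1.
theta = td0_linear(
    next_state=lambda x: 1 - x,
    cost=lambda x: 1.0 if x == 0 else 0.0,
    phi=lambda x: [1.0, float(x)],
    x0=0,
    discount=0.5,
    step_size=0.05,
    n_steps=20000,
)
print(theta)  # approaches [4/3, -2/3]
```

For this toy chain the exact cost-to-go is J(0) = 4/3 and J(1) = 2/3, which the features can represent exactly, so the iterate settles at theta = (4/3, -2/3). With features that cannot represent J exactly, the limit would carry an approximation error of the kind the abstract bounds.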

Parallel dynamic water supply scheduling in a cluster of computers

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2001, vol. 13, no. 15, pp. 1281-1302

Authors: Damas, A; Salmerón, M; Ortega, J; Olivares, G; Pomares, H (Univ Granada, Fac Ciencias, Dept Comp Architecture & Comp Technol, E-18071 Granada, Spain)

The parallelization of complex planning and control problems arising in diverse application areas in the industrial, services and commercial environments not only allows the determination of control variables in the required times but also improves the performance of the control procedure as more processors are involved in the execution of the parallel program. In this paper we describe a scheduling application in a water supply network to demonstrate the benefits of parallel processing. The procedure we propose combines dynamic programming with genetic algorithms and time series prediction in order to solve problems in which decisions are made in stages, and the states and control belong to a continuous space. Taking into account the computational complexity of these applications and the time constraints that are usually imposed, the procedure has been implemented by a parallel program in a cluster of computers, an inexpensive and widely extended platform that can make parallelism a practical means of tackling complex problems in many different environments. Copyright (C) 2001 John Wiley & Sons, Ltd.

Keywords: neuro-dynamic programming; evolutionary computation; parallel genetic algorithms; neural networks; clusters of computers; scheduling of water supply networks

Simulation-based learning of cost-to-go for control of nonlinear processes

KOREAN JOURNAL OF CHEMICAL ENGINEERING, 2004, vol. 21, no. 2, pp. 338-344

Authors: Lee, JM; Lee, JH (Georgia Inst Technol, Sch Chem & Biomol Engn, Atlanta, GA 30332, USA)

In this paper, we present a simulation-based dynamic programming method that learns the 'cost-to-go' function in an iterative manner. The method is intended to combat two important drawbacks of the conventional Model Predictive Control (MPC) formulation, which are the potentially exorbitant online computational requirement and the inability to consider the future interplay between uncertainty and estimation in the optimal control calculation. We use a nonlinear Van de Vusse reactor to investigate the efficacy of the proposed approach and identify further research issues.

Keywords: nonlinear Model Predictive Control; dynamic programming; stochastic optimal control; reinforcement learning; neuro-dynamic programming; function approximation

Finite-Approximation-Error-Based Discrete-Time Iterative Adaptive dynamic programming

IEEE TRANSACTIONS ON CYBERNETICS, 2014, vol. 44, no. 12, pp. 2820-2833

Authors: Wei, Qinglai; Wang, Fei-Yue; Liu, Derong; Yang, Xiong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function in each iteration cannot accurately be obtained, for the first time a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; approximation error; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning; value iteration

Asynchronous Stochastic Approximations With Asymptotically Biased Errors and Deep Multiagent Learning

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, vol. 66, no. 9, pp. 3969-3983

Authors: Ramaswamy, Arunselvan; Bhatnagar, Shalabh; Quevedo, Daniel E. (Paderborn Univ, Heinz Nixdorf Inst, D-33098 Paderborn, Germany; Paderborn Univ, Dept Comp Sci, D-33098 Paderborn, Germany; Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India; Indian Inst Sci, Robert Bosch Ctr Cyber Phys Syst, Bangalore 560012, Karnataka, India; Queensland Univ Technol, Sch Elect Engn & Robot, Brisbane, Qld 4001, Australia)

Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools, and techniques that are popular in multiagent and distributed control scenarios. To counter Bellman's curse of dimensionality, such algorithms are coupled with function approximations. Although the learning/control problem becomes more tractable, function approximations affect stability and convergence. In this article, we present verifiable sufficient conditions for stability and convergence of asynchronous SAs with biased approximation errors. The theory developed herein is used to analyze policy gradient methods and noisy value iteration schemes. Specifically, we analyze the asynchronous approximate counterparts of the policy gradient (A2PG) and value iteration (A2VI) schemes. It is shown that the stability of these algorithms is unaffected by biased approximation errors, provided that they are asymptotically bounded. With respect to convergence (of A2VI and A2PG), a relationship between the limiting set and the approximation errors is established. Finally, experimental results that support the theory are presented.

Keywords: approximation algorithms; approximation error; stability analysis; function approximation; stochastic processes; asymptotic stability; convergence; almost sure boundedness (stability); asymptotically biased approximation errors; asynchronous stochastic approximations; deep function approximations; deep reinforcement learning; distributed control; multi agent learning; networked control systems; neuro-dynamic programming

State of the Art of Adaptive dynamic programming and Reinforcement Learning

CAAI Artificial Intelligence Research, 2022, vol. 1, no. 2, pp. 93-110

Authors: Derong Liu; Mingming Ha; Shan Xue (Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Department of Electrical and Computer Engineering, University of Illinois at Chicago, IL 606071, USA; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China)

This article introduces the state-of-the-art development of adaptive dynamic programming and reinforcement learning (ADPRL). First, algorithms in reinforcement learning (RL) are introduced and their roots in dynamic programming are *** dynamic programming (ADP) is then introduced following a brief discussion of dynamic *** in ADP and RL have enjoyed the fast developments of the past decade from algorithms, to convergence and optimality analyses, and to stability *** key steps in the recent theoretical developments of ADPRL are mentioned with some future *** particular, convergence and optimality results of value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on stability analysis of value iteration algorithms.

Keywords: adaptive dynamic programming; approximate dynamic programming; adaptive critic designs; neuro-dynamic programming; neural dynamic programming; reinforcement learning; intelligent control; learning control; optimal control

Model research and features simplification for Scheduling of wafer fabrication system

4th International Conference on Computer Science and Education

Authors: Wang, Ying; Lin, Zhixian; Li, Maoqing (Xiamen Univ, Dept Automat, Xiamen 361005, Peoples R China)

ISBN: (print) 9781424435197

Scheduling of a wafer fabrication system with machine failures and repair times is studied using neuro-dynamic programming in this paper. The state set and scheduling set are constructed, the state transition probability is derived, and a dynamic programming model with a closed-release policy is analyzed. Key features are derived by simplifying the non-bottleneck stations, and a scheduling policy is then obtained by approximating the optimization cost function through simulation. The TRC model of HP is scheduled to illustrate the validity of the method.

Keywords: wafer fabrication system; scheduling; neuro-dynamic programming

Discrete-Time Optimal Control Scheme Based on Q-Learning Algorithm

7th International Conference on Intelligent Control and Information Processing (ICICIP)

Authors: Wei, Qinglai; Liu, Derong; Song, Ruizhuo (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China)

ISBN: (print) 9781509021550

This paper is concerned with optimal control problems of discrete-time nonlinear systems via a novel Q-learning algorithm. In the newly developed Q-learning algorithm, the iterative Q function in each iteration is required to update on the whole state and control spaces, instead of being updated by a single state and control pair. A new convergence criterion of the corresponding Q-learning algorithm is presented, where the traditional constraints on the learning rates of Q-learning algorithms are relaxed. Finally, simulation results are provided to exemplify the good performance of the developed algorithm.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neuro-dynamic programming; Q-learning; optimal control; neural networks; reinforcement learning
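The idea of updating the Q function on the whole state and control spaces in each iteration, rather than at a single state and control pair, can be sketched on a small finite problem. This is only a plain full-sweep Q iteration under stated assumptions, not the algorithm or convergence criterion from the paper: the four-state system `f`, the cost `utility`, the control set, and the zero initial Q function are all hypothetical.

```python
def q_sweep(q, states, controls, f, utility):
    """One iteration: update Q on every (state, control) pair at once."""
    return {
        (x, u): utility(x, u) + min(q[(f(x, u), v)] for v in controls)
        for x in states
        for u in controls
    }


# Hypothetical toy problem (an assumption, not from the paper):
# drive an integer state in {0, 1, 2, 3} to 0 with controls -1, 0, +1.
states = range(4)
controls = (-1, 0, 1)
f = lambda x, u: min(max(x + u, 0), 3)        # clamped dynamics
utility = lambda x, u: float(x * x + u * u)   # stage cost

q = {(x, u): 0.0 for x in states for u in controls}
for _ in range(20):
    q = q_sweep(q, states, controls, f, utility)

# Greedy control law read off the final Q function.
policy = {x: min(controls, key=lambda u: q[(x, u)]) for x in states}
print(policy)
```

On this toy problem the sweeps stop changing after a handful of iterations, and the greedy law moves the state one step toward 0 from every nonzero state.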

Discrete-Time Generalized Policy Iteration ADP Algorithm With Approximation Errors

IEEE Symposium Series on Computational Intelligence (IEEE SSCI)

Authors: Wei, Qinglai; Li, Benkai; Song, Ruizhuo (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China)

ISBN: (print) 9781538627266

This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established to show that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neuro-dynamic programming; generalized policy iteration; nonlinear systems; optimal control; neural networks; reinforcement learning

Fair Energy Scheduling in Vehicle-to-Grid Networks in the Smart Grid

IEEE International Conference on Communications (ICC)

Authors: Zhong, Weifeng; Yu, Rong; Zhang, Yan (Guangdong Univ Technol, Guangzhou, Guangdong, Peoples R China; Simula Res Lab, Trondheim, Norway)

ISBN: (print) 9781479920037

Plug-in hybrid electric vehicles (PHEVs) are receiving growing attention as a means of achieving a sustainable transport system and society. Due to the limited vehicle battery capacity, PHEVs perform charging and re-charging from time to time. It is envisioned that the charging load under high PHEV penetration will have a considerable impact on the residential distribution network. Therefore, implementation of coordinated PHEV charging becomes necessary for the smart grid. To maintain the household load, the limited energy supply may not be able to meet the entire PHEV charging load at all times. Thus, the fairness of energy scheduling among PHEVs should be considered. In this paper, charging fairness (CF) and discouraging-charging fairness (DCF) are proposed to guarantee the charging opportunity of each PHEV and fast recovery of PHEV driving ability. We formulate the problem of fair energy scheduling in the residential distribution network as a Semi Markov Decision Process (SMDP). The neuro-dynamic programming (NDP) technique is exploited to solve the corresponding problem in the SMDP. In the scheduling process, the Entropy Weight Method (EWM) is proposed to consider three key metrics: CF, DCF and cable power loss. Simulation results illustrate that the proposed scheduling scheme is able to avoid a number of peak loads caused by PHEV charging and at the same time reduce power loss without affecting travel plans.

Keywords: plug-in hybrid electric vehicles; smart grid; neuro-dynamic programming; entropy weight method; fairness; power loss
