Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Preux, Philippe; Girgin, Sertan; Loth, Manuel (Univ Lille, Lab Informat Fondamentale Lille, Comp Sci Lab, CNRS, Lille, France; INRIA, Paris, France)

ISBN: (print) 9781424427611

Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Based on our recent work on feature discovery in the context of reinforcement learning to discover a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference from the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of value function approximation are two sides of the same coin, and that using a nonparametric approach provides an elegant solution to both problems at once.

Keywords: reinforcement learning

An approximate dynamic programming strategy for responsive traffic signal control

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Cai, Chen (Univ Coll London, Ctr Transport Studies, London WC1E 6BT, England)

ISBN: (print) 9781424407064

This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximated value function progressively in operation, while preserving the structural property of the control problem. The convergence and performance of the algorithm have been tested in a range of experiments. It is concluded that the new strategy is as good as the best existing control strategies while being efficient and simple in computation. It also has the potential to be extended to multi-phase signal control at isolated junctions and to decentralized network operation.

Keywords: dynamic programming; traffic control; function approximation; communication system traffic control; adaptive control; roads; learning; testing; delay; vehicle safety

Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications

IEEE/CAA Journal of Automatica Sinica, 2024, vol. 11, no. 1, pp. 18-36

Authors: Ding Wang; Ning Gao; Derong Liu; Jinna Li; Frank L. Lewis (IEEE; the Faculty of Information Technology, Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing Laboratory of Smart Environmental Protection, and Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; the School of System Design and Intelligent Manufacturing, Southern University of Science and Technology, Shenzhen 518055, China; the Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA; the School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China; the UTA Research Institute, the University of Texas at Arlington, Arlington, TX 76118, USA)

Reinforcement learning (RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, ***, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environments attract enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they promote ADP formulation ***, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.

Keywords: adaptive dynamic programming (ADP); advanced control; complex environment; data-driven control; event-triggered design; intelligent control; neural networks; nonlinear systems; optimal control; reinforcement learning (RL)

Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach

IEEE/CAA Journal of Automatica Sinica, 2019, vol. 6, no. 2, pp. 433-440

Authors: Tao Bian; Zhong-Ping Jiang (Bank of America Merrill Lynch; IEEE; the Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University)

In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Different from traditional batch learning algorithms, an incremental learning approach is developed, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide concrete convergence and robustness analysis of this incremental learning algorithm. An extension to solving robust optimal control problems is also given. Two simulation examples are given to illustrate the effectiveness of our theoretical results.

Keywords: adaptive optimal control; robust dynamic programming; value iteration (VI)

A Data-based Online Reinforcement Learning Algorithm with High-efficient Exploration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhu, Yuanheng; Zhao, Dongbin (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China)

ISBN: (print) 9781479945528

An online reinforcement learning algorithm is proposed in this paper that directly and efficiently utilizes online data for continuous deterministic systems without system parameters. Dependence on specific approximation structures critically limits the wide application of online reinforcement learning algorithms. We utilize the online data directly with the kd-tree technique to remove this limitation. Moreover, we design the algorithm according to the Probably Approximately Correct principle. Two examples are simulated to verify its good performance.

Keywords: trees (mathematics)

Particle swarm optimized adaptive dynamic programming

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Zhao, Dongbin; Yi, Jianqiang; Liu, Derong (Chinese Acad Sci, Inst Automat, Key Lab Complex Syst & Intelligence Sci, Beijing 100080, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA)

ISBN: (print) 9781424407064

Particle swarm optimization is used to train the action network and critic network of the adaptive dynamic programming approach. The typical structures of adaptive dynamic programming and particle swarm optimization are adopted for comparison with other learning algorithms such as the gradient descent method. Besides simulation on the balancing of a cart-pole plant, a more complex plant, the pendulum robot (pendubot), is tested for learning performance. Compared to traditional adaptive dynamic programming approaches, the proposed evolutionary learning strategy is verified to converge faster and to be more efficient. Furthermore, the structure becomes simpler because the plant model does not need to be identified beforehand.

Keywords: adaptive dynamic programming; particle swarm optimization; pendubot; pole balancing
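
As an illustration of the gradient-free training idea in the abstract above, the following is a minimal sketch of a basic global-best particle swarm, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `pso_minimize`, the coefficient values, and the toy loss `fit_loss`, which fits two weights of a line and merely stands in for the loss of an action or critic network.

```python
import random

def pso_minimize(loss, dim, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimize `loss` over R^dim with a basic global-best particle swarm."""
    rng = random.Random(seed)
    # Random initial positions, zero initial velocities (illustrative choices).
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # personal best positions
    best_val = [loss(p) for p in pos]       # personal best losses
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

# Hypothetical stand-in for a network training loss: fit y = w0 * x + w1.
DATA = [(x / 10.0, 2.0 * (x / 10.0) - 0.5) for x in range(-10, 11)]

def fit_loss(w):
    """Mean squared error of the two-weight linear model on DATA."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in DATA) / len(DATA)

weights, final_loss = pso_minimize(fit_loss, dim=2)
```

The swarm only needs loss evaluations, so no gradient of the loss with respect to the weights is required; that is the property the sketch is meant to show.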

Adaptive dynamic programming-based optimal tracking control for nonlinear systems using general value iteration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lin, Xiaofeng; Ding, Qiang; Kong, Weikai; Song, Chunning; Huang, Qingbao (Guangxi Univ, Sch Elect Engn, Nanning, Peoples R China)

ISBN: (print) 9781479945528

For the optimal tracking control problem of affine nonlinear systems, a general value iteration algorithm based on adaptive dynamic programming is proposed in this paper. By system transformation, the optimal tracking problem is converted into the optimal regulation problem for the tracking error dynamics. Then, a general value iteration algorithm is developed to obtain the optimal control, with convergence analysis. Considering the advantages of echo state networks, we use three echo state networks with the Levenberg-Marquardt (LM) adjusting algorithm to approximate the system, the cost function and the control law. A simulation example is given to demonstrate the effectiveness of the presented scheme.

Keywords: adaptive dynamic programming; value iteration; tracking control; echo state network
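
To make the value iteration step in the abstract above concrete, here is a minimal sketch on a much simpler problem than the paper's: a scalar linear error system e[k+1] = a*e[k] + b*u[k] with stage cost q*e^2 + r*u^2 and value function V(e) = p*e^2. The paper treats affine nonlinear systems with echo state networks; the linear-quadratic setting, the function names, and the numbers below are all assumptions chosen so the recursion fits in a few lines. The initial guess `p0` is an arbitrary nonnegative number; whether that matches the paper's notion of "general" value iteration is also an assumption.

```python
def value_iteration_lq(a, b, q, r, p0=0.0, tol=1e-12, max_iters=10_000):
    """Iterate the scalar Bellman (Riccati) update until p stops changing."""
    p = p0
    for k in range(max_iters):
        # V_{j+1}(e) = min_u [ q e^2 + r u^2 + p_j (a e + b u)^2 ], solved in closed form.
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next, k + 1
        p = p_next
    raise RuntimeError("value iteration did not converge")

def greedy_gain(a, b, r, p):
    """Feedback gain K of the control u = -K e that is greedy for V(e) = p e^2."""
    return a * b * p / (r + b * b * p)

# Hypothetical error dynamics and cost weights.
A, B, Q, R = 0.9, 1.0, 1.0, 1.0
p_star, iters = value_iteration_lq(A, B, Q, R, p0=5.0)
gain = greedy_gain(A, B, R, p_star)
closed_loop = A - B * gain   # error multiplier under the converged control law
```

With these numbers the closed-loop multiplier has magnitude below one, so the tracking error decays under the converged control law.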

Gr-GDHP: A New Architecture for Globalized Dual Heuristic Dynamic Programming

IEEE Transactions on Cybernetics, 2017, vol. 47, no. 10, pp. 3318-3330

Authors: Zhong, Xiangnan; Ni, Zhen; He, Haibo (Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; South Dakota State Univ, Dept Elect Engn & Comp Sci, Brookings, SD 57007, USA)

The goal representation globalized dual heuristic dynamic programming (Gr-GDHP) method is proposed in this paper. A goal neural network is integrated into the traditional GDHP method, providing an internal reinforcement signal and its derivatives to help the control and learning process. From the proposed architecture, it is shown that the obtained internal reinforcement signal and its derivatives can adjust themselves online over time, rather than being a fixed or predefined function as in the literature. Furthermore, the obtained derivatives can directly contribute to the objective function of the critic network, whose learning process is thus simplified. Numerical simulation studies are used to show the performance of the proposed Gr-GDHP method and to compare the results with other existing adaptive dynamic programming designs. We also investigate this method on a ball-and-beam balancing system. Statistical simulation results are presented for both the Gr-GDHP and the GDHP methods to demonstrate the improved learning and control performance.

Keywords: adaptive dynamic programming (ADP); globalized dual heuristic dynamic programming (GDHP); goal representation; neural network; reinforcement learning (RL)

An Adaptive Dynamic Programming Algorithm to Solve Optimal Control of Uncertain Nonlinear Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Cui, Xiaohong; Luo, Yanhong; Zhang, Huaguang (Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China)

ISBN: (print) 9781479945528

In this paper, an approximate optimal control method based on adaptive dynamic programming (ADP) is discussed for completely unknown nonlinear systems. An online critic-action-identifier algorithm is developed using neural networks, where the critic and action networks approximate the optimal value function and the optimal control, and the other two neural networks approximate the unknown system. Furthermore, the adaptive tuning laws are given based on the Lyapunov approach, which ensures that the closed-loop system is uniformly ultimately bounded. Finally, the effectiveness is demonstrated by a simulation example.

Keywords: closed loop systems

Model-Based Multi-Objective Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wiering, Marco A.; Withagen, Maikel; Drugan, Madalina M. (Univ Groningen, Inst Artificial Intelligence, NL-9700 AB Groningen, Netherlands; Vrije Univ Brussel, Artificial Intelligence Lab, Ixelles, Belgium)

ISBN: (print) 9781479945528

This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. Therefore, it is important that the agent explores the environment in an intelligent way by using a good exploration strategy. In this paper we supply the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.

Keywords: Pareto optimisation; decision making; dynamic programming; learning (artificial intelligence); Pareto optimal policies; deep sea treasure problem; model-based multiobjective reinforcement learning; multiobjective dynamic programming method; multiobjective sequential decision making problem; computational modeling; heuristic algorithms; Markov processes; Pareto optimization; vectors; exploration strategy; Markov chain; agents; optimal strategy
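
The notion of Pareto optimal policies used in the abstract above can be illustrated with a small dominance filter. This is only a sketch of the definition, not the paper's dynamic programming method: it assumes each policy has already been summarized by a vector of returns to be maximized, and the policy names and numbers below are hypothetical values loosely styled after a treasure-versus-time trade-off.

```python
def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(candidates):
    """Keep the (name, value vector) pairs that no other candidate dominates."""
    return [(name, v) for name, v in candidates
            if not any(dominates(w, v) for _, w in candidates)]

# Hypothetical returns per policy: (treasure value, negative time cost).
policies = [
    ("near", (1.0, -1.0)),
    ("mid", (8.0, -5.0)),
    ("slow_mid", (8.0, -7.0)),   # same treasure as "mid" but slower
    ("far", (124.0, -19.0)),
    ("idle", (0.0, -3.0)),       # worse than "near" in both objectives
]
front = pareto_front(policies)
front_names = [name for name, _ in front]
```

Here `front_names` keeps "near", "mid" and "far": each trades treasure against time in a way no other listed policy beats on both objectives at once.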
