Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Fonteneau, Raphael; Murphy, Susan A.; Wehenkel, Louis; Ernst, Damien (Department of Electrical Engineering and Computer Science, University of Liège, Belgium; Department of Statistics, University of Michigan, United States)

ISBN: (print) 9781424498888

We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising. © 2011 IEEE.

Keywords: learning algorithms

Data-Driven Neuro-Optimal Temperature Control of Water-Gas Shift Reaction Using Stable Iterative Adaptive Dynamic Programming

IEEE Transactions on Industrial Electronics, 2014, vol. 61, no. 11, pp. 6399-6408

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

In this paper, a novel data-driven stable iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal temperature control problems for water-gas shift (WGS) reaction systems. According to the system data, neural networks (NNs) are used to construct the dynamics of the WGS system and solve the reference control, respectively, where the mathematical model of the WGS system is unnecessary. Considering the reconstruction errors of NNs and the disturbances of the system and control input, a new stable iterative ADP algorithm is developed to obtain the optimal control law. The convergence property is developed to guarantee that the iterative performance index function converges to a finite neighborhood of the optimal performance index function. The stability property is developed to guarantee that each of the iterative control laws can make the tracking error uniformly ultimately bounded (UUB). NNs are developed to implement the stable iterative ADP algorithm. Finally, numerical results are given to illustrate the effectiveness of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; approximation errors; data-driven control; neural networks (NNs); optimal control; reinforcement learning; water-gas shift (WGS)

Protecting against evaluation overfitting in empirical reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Whiteson, Shimon; Tanner, Brian; Taylor, Matthew E.; Stone, Peter (Informatics Institute, University of Amsterdam, Netherlands; Department of Computing Science, University of Alberta, Canada; Department of Computer Science, Lafayette College, United States; Department of Computer Science, University of Texas, Austin, United States)

ISBN: (print) 9781424498888

Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high scores. We argue that reinforcement learning is particularly vulnerable to environment overfitting and propose as a remedy generalized methodologies, in which evaluations are based on multiple environments sampled from a distribution. In addition, we consider how to summarize performance when scores from different environments may not have commensurate values. Finally, we present proof-of-concept results demonstrating how these methodologies can validate an intuitively useful range-adaptive tile coding method. © 2011 IEEE.

Keywords: reinforcement learning
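
One way to picture the problem of summarizing scores that are not commensurate across environments is to rank agents within each environment and average the ranks. The sketch below is an illustrative assumption, not necessarily the summary the authors propose; the function name `mean_ranks` and the input layout are hypothetical.

```python
from statistics import mean


def mean_ranks(scores):
    """Summarize per-environment scores by mean rank (1 = best).

    scores: {agent_name: [score in env 0, score in env 1, ...]}
    Ranks are computed inside each environment, so the raw score
    scales of different environments never get mixed.
    """
    agents = list(scores)
    n_envs = len(next(iter(scores.values())))
    ranks = {agent: [] for agent in agents}
    for env in range(n_envs):
        for agent in agents:
            mine = scores[agent][env]
            higher = sum(scores[other][env] > mine for other in agents)
            tied = sum(scores[other][env] == mine for other in agents)  # counts the agent itself
            # Tied agents share the average of the ranks they occupy.
            ranks[agent].append(higher + (tied + 1) / 2)
    return {agent: mean(r) for agent, r in ranks.items()}
```

Calling `mean_ranks` on scores collected from several sampled environments returns one number per agent, where a lower mean rank indicates better relative performance across the sample.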

Heuristics for Multiagent Reinforcement Learning in Decentralized Decision Problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Allen, Martin W.; Hahn, David; MacFarland, Douglas C. (Univ Wisconsin, Dept Comp Sci, La Crosse, WI 54601, USA)

ISBN: (print) 9781479945528

Decentralized partially observable Markov decision processes (Dec-POMDPs) model cooperative multiagent scenarios, providing a powerful general framework for team-based artificial intelligence. While optimal algorithms exist for Dec-POMDPs, theoretical and empirical results demonstrate that they are impractical for many problems of real interest. We examine the use of reinforcement learning (RL) as a means to generate adequate, if not optimal, joint policies for Dec-POMDPs. It is easily demonstrated (and expected) that single-agent RL produces results of little joint utility. We therefore investigate heuristic methods, based upon the dynamics of the Dec-POMDP formulation, that bias the learning process to produce coordinated action. Empirical tests on a benchmark problem show that these heuristics significantly enhance learning performance, even out-performing a hand-crafted heuristic in cases where the learning process converges quickly.

Keywords: Markov processes; learning (artificial intelligence); multi-agent systems; Dec-POMDP model cooperative multiagent; decentralized decision problem; heuristic method; multiagent reinforcement learning; partially observable Markov decision process; team-based artificial intelligence; Benchmark testing; Complexity theory; Equations; Heuristic algorithms; Joints; Markov chain; Heuristics; heuristic methods; learning processes; Decentralized

Reinforcement Learning-Based Structural Control of Floating Wind Turbines

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, vol. 52, no. 3, pp. 1603-1613

Authors: Zhang, Jincheng; Zhao, Xiaowei; Wei, Xing (Univ Warwick, Sch Engn, Coventry CV4 7AL, W Midlands, England)

The structural control of floating wind turbines using an active tuned mass damper is investigated in this article. To our knowledge, this is the first time that a reinforcement learning-based control approach has been applied to this type of application. Specifically, an adaptive dynamic programming (ADP) algorithm is used to derive the optimal control law based on the nonlinear structural dynamics, and the large-scale machine learning platform TensorFlow is employed for the design and implementation of the neural network (NN) structure. Three fully connected NNs, i.e., a plant network, a critic network, and an action network, are included in the proposed NN structure. Their training requires the gradient information flowing through the whole network, which is handled by automatic differentiation, a popular technique for deriving the gradients of complex networks automatically. By contrast, to our knowledge, the network structures in the existing literature are rather simple and the training of the hidden layer is usually ignored. This allows their gradients to be derived analytically, which is infeasible with complex network structures. Thus, automatic differentiation greatly improves the employed ADP algorithm's ability to solve complex problems. The simulation results of structural control of floating wind turbines show that the ADP controller performs very well in both normal and extreme conditions, with the standard deviation of the platform pitch displacement being reduced by around 40%. A clear advantage of ADP controllers over the H-infinity controller is observed, especially in extreme conditions. Moreover, our design considers the tradeoff between the control performance and power consumption.

Keywords: Wind turbines; Resists; Artificial neural networks; Training; Vibrations; dynamic programming; Shock absorbers; Active structural control; adaptive dynamic programming (ADP); floating wind turbine; neural networks (NNs); reinforcement learning

On learning with imperfect representations

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Kalyanakrishnan, Shivaram; Stone, Peter (Department of Computer Science, University of Texas at Austin, 1616 Guadalupe St, Austin, TX 78701, United States)

ISBN: (print) 9781424498888

In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates that the classical tabular representation seldom applies in practice. Specifically, several practical tasks suffer from state aliasing, and most demand some form of generalization and function approximation. Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice. The central thesis we present in this position paper is that in practice, learning methods specifically developed to work with imperfect representations are likely to perform better than those developed for perfect representations and then applied in imperfect-representation settings. We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis. In particular, we highlight the degrees of representational bias prevalent in different learning methods. We reference a variety of relevant literature as a background for this introspective essay. © 2011 IEEE.

Keywords: reinforcement learning

Adaptive Dynamic Programming for Control: A Survey and Recent Advances

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, vol. 51, no. 1, pp. 142-160

Authors: Liu, Derong; Xue, Shan; Zhao, Bo; Luo, Biao; Wei, Qinglai (Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China; South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China; Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; Cent South Univ, Sch Automat, Changsha 410083, Peoples R China; Peng Cheng Lab, Shenzhen 518000, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China)

This article reviews the recent development of adaptive dynamic programming (ADP) with applications in control. First, its applications in optimal regulation are introduced, and some skilled and efficient algorithms are presented. Next, the use of ADP to solve game problems, mainly nonzero-sum game problems, is elaborated. It is followed by applications in large-scale systems. Note that although the functions presented in this article are based on continuous-time systems, various applications of ADP in discrete-time systems are also analyzed. Moreover, in each section, not only some existing techniques are discussed, but also possible directions for future work are pointed out. Finally, some overall prospects for the future are given, followed by conclusions of this article. Through a comprehensive and complete investigation of its applications in many existing fields, this article fully demonstrates that the ADP intelligent control method is promising in today's artificial intelligence era. Furthermore, it also plays a significant role in promoting economic and social development.

Keywords: adaptive critic designs (ACDs); adaptive dynamic programming; approximate dynamic programming; intelligent control; learning control; neural dynamic programming; neuro-dynamic programming; optimal control; reinforcement learning (RL)

Higher order Q-learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Edwards, Ashley; Pottenger, William M. (Department of Computer Science, University of Georgia, Athens, GA 30606, United States; Department of Computer Science and DIMACS, Rutgers University, Piscataway, NJ 08854, United States)

ISBN: (print) 9781424498888

Higher order learning is a statistical relational learning framework in which relationships between different instances of the same class are leveraged (Ganiz, Lytkin and Pottenger, 2009). Learning can be supervised or unsupervised. In contrast, reinforcement learning (Q-learning) is a technique for learning in an unknown state space. Action selection is often based on a greedy or epsilon-greedy approach. The problem with this approach is that there is often a large amount of initial exploration before convergence. In this article we introduce a novel approach to this problem that treats a state space as a collection of data from which latent information can be extrapolated. From this data, we classify actions as leading to a high reward or low reward, and formulate behaviors based on this information. We provide experimental evidence that this technique drastically reduces the amount of exploration required in the initial stages of learning. We evaluate our algorithm in a well-known reinforcement learning domain, grid-world. © 2011 IEEE.

Keywords: reinforcement learning
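
The baseline that this abstract contrasts against, Q-learning with epsilon-greedy action selection in a grid world, can be sketched as follows. This is a minimal illustration and not the authors' higher order method; the grid layout, reward scheme, hyperparameters, and function names are all assumptions made for the example.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def make_step(size):
    """Return a deterministic step function for an assumed size x size grid."""
    goal = (size - 1, size - 1)

    def step(state, action):
        dr, dc = ACTIONS[action]
        # Moves that would leave the grid keep the agent in place.
        row = min(max(state[0] + dr, 0), size - 1)
        col = min(max(state[1] + dc, 0), size - 1)
        nxt = (row, col)
        done = nxt == goal
        return nxt, (1.0 if done else 0.0), done

    return step


def train_q_learning(size=4, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    step = make_step(size)
    q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(4)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                best = max(q[(state, a)] for a in range(4))
                action = rng.choice(
                    [a for a in range(4) if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # No bootstrapping from the terminal (goal) state.
            target = reward if done else reward + gamma * max(
                q[(nxt, a)] for a in range(4))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q, step


def greedy_rollout(q, step, start=(0, 0), max_steps=50):
    """Follow the learned greedy policy and return the visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path
```

With all Q-values starting at zero, early episodes are close to a random walk, which illustrates the large amount of initial exploration the abstract refers to.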

Adaptive critic designs for discrete-time zero-sum games with application to H∞ control

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, vol. 37, no. 1, pp. 240-247

Authors: Al-Tamimi, Asma; Abu-Khalaf, Murad; Lewis, Frank L. (Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118, USA)

In this correspondence, adaptive critic approximate dynamic programming designs are derived to solve the discrete-time zero-sum game in which the state and action spaces are continuous. This results in a forward-in-time reinforcement learning algorithm that converges to the Nash equilibrium of the corresponding zero-sum game. The results in this correspondence can be thought of as a way to solve the Riccati equation of the well-known discrete-time H-infinity optimal control problem forward in time. Two schemes are presented, namely: 1) a heuristic dynamic programming and 2) a dual-heuristic dynamic programming, to solve for the value function and the costate of the game, respectively. An H-infinity autopilot design for an F-16 aircraft is presented to illustrate the results.

Keywords: adaptive critics; approximate dynamic programming (ADP); H-infinity optimal control; policy iteration; zero-sum game

Neuro-controller of Cement Rotary Kiln Temperature with Adaptive Critic Designs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Lin, Xiaofeng; Liu, Tangbo; Song, Shaojian; Song, Chunning (Guangxi Univ, Coll Elect Engn, Nanning 530004, Peoples R China)

ISBN: (print) 9781424427611

The production process of the cement rotary kiln is a typical engineering thermodynamic process with large inertia, lag, and nonlinearity, so it is very difficult to control this process accurately using traditional control theory. In order to keep the process stable and to produce high-grade cement clinker, it is important to keep the temperature of the sintering zone stable. Artificial neural networks offer a solution to this problem due to their advantages, such as self-organization, self-adaptivity and fault tolerance. This paper introduces a novel nonlinear optimal neuro-controller which is based on adaptive critic design (ACD) and uses the structure of action-dependent heuristic dynamic programming (ADHDP). The principle of ADHDP is presented. An action network and a critic network are set up in such a way that they learn from interactions based on local measurements to optimize the neuro-controller. The ADHDP neuro-controller has a simple framework and is independent of the system model. A simulation of the cement rotary kiln is carried out using Matlab/Simulink. The simulation results show that with the ADHDP neuro-controller it is possible to keep the temperature of the sintering zone stable within a certain range, and that the temperature can meet the requirements of cement clinker production. Simulation results are also presented to show that the neuro-controller with the ACD has the potential to control the cement rotary kiln.

Keywords: cement rotary kiln; the sintering zone model; adaptive critic designs; artificial neural network; action-dependent heuristic dynamic programming (ADHDP)
