Search Results - Inner Mongolia University Library

16th UKSim-AMSS International Conference on Computer Modelling and Simulation (UKSim)

Authors: Santos, Watson R. M.; Queiroz, Jonathan A.; Neto, Joao Viana da F.; Rego, Patricia H. M.; Santana, Ewaldo; Andrade, Gustavo (Univ Estadual Maranhao; Fed Univ Maranhao; Fed Inst Maranhao; Embedded Syst & Intelligent Control Lab, Sao Luis, Maranhao, Brazil)

ISBN: (print) 9781479949236

In this paper, a method to design online optimal policies that encompasses Hamilton-Jacobi-Bellman (HJB) equation solution approximation and the heuristic dynamic programming (HDP) approach is proposed. Recursive least squares (RLS) algorithms are developed to approximate the HJB equation solution, supported by a sequence of greedy policies. The proposal investigates the convergence properties of a family of RLS algorithms and their numerical complexity in the context of reinforcement learning and optimal control. The algorithms are computationally evaluated on an electric circuit model that represents a MIMO dynamic system. The results presented herein emphasize the convergence behaviour of the RLS, projection and Kaczmarz algorithms that are developed for online applications.

Keywords: Recursive Least Squares; Heuristic Dynamic Programming; RLS Convergence; MIMO Dynamic Systems; Optimal Control; Adaptive Dynamic Programming
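
As a rough illustration of the recursive estimators named in this abstract, the sketch below compares a standard exponentially weighted RLS update with a normalized projection (Kaczmarz) update on a noise-free linear regression y = φᵀθ. It is not the authors' implementation: in an HDP critic, φ would be a vector of basis functions of the state and y a Bellman target, whereas here both are synthetic. The function names `rls_update` and `kaczmarz_update`, the forgetting factor `lam`, the step size `mu` and the test problem are illustrative assumptions.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least squares step with forgetting factor lam (illustrative)."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)       # gain vector
    err = y - phi @ theta               # a-priori prediction error
    theta = theta + k * err
    P = (P - np.outer(k, Pphi)) / lam   # covariance update
    return theta, P

def kaczmarz_update(theta, phi, y, mu=1.0, eps=1e-12):
    """Normalized projection (Kaczmarz) step; mu=1 projects theta onto {t: phi @ t = y}."""
    err = y - phi @ theta
    return theta + mu * err * phi / (eps + phi @ phi)

# Hypothetical test problem: recover theta_true from streaming regressors.
rng = np.random.default_rng(0)
theta_true = np.array([2.0, -1.0, 0.5])
theta_rls, P = np.zeros(3), 1e3 * np.eye(3)   # large initial covariance = weak prior
theta_kz = np.zeros(3)
for _ in range(500):
    phi = rng.standard_normal(3)
    y = phi @ theta_true                      # noise-free measurement
    theta_rls, P = rls_update(theta_rls, P, phi, y)
    theta_kz = kaczmarz_update(theta_kz, phi, y)

print(np.round(theta_rls, 4), np.round(theta_kz, 4))
```

Design note: RLS carries an n×n covariance matrix and costs O(n²) per step, while the Kaczmarz update needs only O(n) work but usually converges more slowly when the regressors are poorly excited; this is the kind of convergence-versus-complexity trade-off the abstract refers to.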

2014 IEEE International Symposium on Intelligent Control, ISIC 2014

2014 IEEE International Symposium on Intelligent Control, ISIC 2014

ISBN: (print) 9781479974061

The proceedings contain 56 papers. The topics discussed include: consensus with convergence rate in directed networks with multiple non-differentiable input delays; differentiated consensuses in a stochastic network with priorities; exponential synchronization for a new class of complex dynamical network with PIPC and hybrid TVD; output synchronization of uncertain nonlinear multi-agent systems with relative degree one; optimality of consensus protocols for multi-agent systems with interaction; robust synchronization of directed Lur'e networks with incremental nonlinearities; robust adaptive dynamic programming for continuous-time linear stochastic systems; simple adaptive output control of linear systems; passivity based stabilization of nonlinear 2D systems with application to iterative learning control; dual adaptive control of bimanual manipulation with online fuzzy parameter tuning; and on the intrinsic coordinatability of network control systems.

Reinforcement Learning Based Controller Synthesis for Flexible Aircraft Wings

IEEE/CAA Journal of Automatica Sinica, 2014, Vol. 1, No. 4, pp. 435-448

Authors: Manoj Kumar, Karthikeyan Rajagopal, Sivasubramanya Nadar Balakrishnan, Nhan T. Nguyen (Missouri University of Science and Technology; NASA Ames Research Center, Moffett Field)

Aeroelastic study of flight vehicles has been a subject of great interest and research in the last several years. Aileron reversal and flutter-related problems are due in part to the elasticity of a typical airplane. The structural dynamics of an aircraft wing, owing to its aeroelastic nature, are characterized by partial differential equations. Controller design for these systems is much more complex than for lumped-parameter systems described by ordinary differential equations. In this paper, a stabilizing state-feedback controller design approach is presented for the heave dynamics of a wing-fuselage model. A continuous actuator in the spatial domain is assumed. A control methodology is developed by combining the technique of “proper orthogonal decomposition” and approximate dynamic programming. The proper orthogonal decomposition technique is used to obtain a low-order nonlinear lumped-parameter model of the infinite-dimensional system. A near-optimal controller is then designed using the single-network-adaptive-critic technique. Furthermore, to add robustness to the nominal single-network-adaptive-critic controller against matched uncertainties, an identifier-based adaptive controller is proposed. Simulation results demonstrate the effectiveness of the single-network-adaptive-critic controller augmented with the adaptive controller for infinite-dimensional systems.

Keywords: Adaptation models; Adaptive control; Aircraft; Atmospheric modeling; Cost function; Learning (artificial intelligence); Method of moments; Optimal control; Single-network-adaptive-critic; Adaptive critic; Flexible wing; Partial differential equation; Proper orthogonal decomposition; Uncertainty; Flexible airfoils; Wings; Adaptive controller; Learning; Atmospheric models; Partial differential equations; Differential equations; Control methods; Flexible air vehicles; Cost functions
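
The proper orthogonal decomposition step described in this abstract can be sketched with a singular value decomposition of a snapshot matrix. This is a minimal sketch under stated assumptions, not the authors' code: the snapshot data is a synthetic, hypothetical two-shape spanwise deflection field rather than a wing-fuselage heave model, the temporal mean is subtracted before the SVD (one common convention), and the 99.9% energy threshold and the name `pod_modes` are illustrative choices.

```python
import numpy as np

def pod_modes(snapshots, energy=0.999):
    """POD of a snapshot matrix (n_points x n_snapshots) via the thin SVD.

    Returns the temporal mean, the leading spatial modes that capture the
    requested fraction of fluctuation energy, and all singular values.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    X = snapshots - mean                        # fluctuations about the mean
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)       # cumulative energy fraction
    r = int(np.searchsorted(frac, energy)) + 1  # smallest r with frac[r-1] >= energy
    return mean, U[:, :r], s

# Hypothetical snapshots: two spanwise shapes with different time histories.
y = np.linspace(0.0, 1.0, 101)                         # normalized span stations
t = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)   # snapshot times
W = (np.outer(np.sin(np.pi * y), np.cos(t))
     + 0.1 * np.outer(np.sin(2 * np.pi * y), np.sin(3 * t)))

mean, Phi, s = pod_modes(W, energy=0.999)
a = Phi.T @ (W - mean)        # reduced-order coordinates, shape (r, n_snapshots)
W_hat = mean + Phi @ a        # reconstruction from r modes
print(Phi.shape[1], np.max(np.abs(W - W_hat)))
```

The reduced coordinates `a` play the role of the low-order lumped-parameter states; in a full design, the governing PDE would be projected onto the columns of `Phi` (Galerkin projection) to obtain ODEs in `a` for the controller to act on.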

A Novel Fuzzy Reinforcement Learning Approach in Two-Level Intelligent Control of 3-DOF Robot Manipulators

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Nasser Sadati, Mohammad Mollaie Emamzadeh (Electrical Engineering Department, Sharif University of Technology, Tehran, Iran)

In this paper, a fuzzy coordination method based on the interaction prediction principle (IPP) and reinforcement learning is presented for the optimal control of robot manipulators with three degrees of freedom. For this purpose, the robot manipulator is treated as a two-level large-scale system: at the first level, the manipulator is decomposed into several subsystems; at the second level, a fuzzy interaction prediction system is introduced to coordinate the overall system, and a critic vector is used to evaluate its performance. Simulation results for the optimal control of robot manipulators show the effectiveness of the proposed approach and its superiority over centralized optimization methods.

Keywords: Fuzzy control; Learning; Intelligent control; Intelligent robots; Manipulators; Robot kinematics; Optimal control; Large-scale systems; Fuzzy systems; Optimization methods

Strategy Generation with Cognitive Distance in Two-Player Games

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kosuke Sekiyama, Ricardo Carnieri, Toshio Fukuda (Department of Micro-Nano Systems Engineering, University of Nagoya, Nagoya, Japan)

In game-theoretic approaches to multi-agent systems, a payoff matrix is often given a priori and used by agents in action selection. By contrast, in this paper we approach the problem of decision making by using the concept of cognitive distance, a notion of the difficulty of an action as perceived subjectively by the agent. Unlike ordinary physical distance, cognitive distance depends on the situation and skills of the agent, ultimately representing the perceived difficulty of performing an action in the current state. The concept of cognitive distance is applied to a two-player game scenario, and it is shown how an agent can learn a model of its skills by estimating and observing the outcomes of its actions. This skill model is then used during play in a minimax search for the best actions.

Keywords: Game theory; Uncertainty; Decision making; Dynamic programming; Learning; Systems engineering and theory; Multiagent systems; Minimax techniques; Stochastic processes

Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Hado van Hasselt (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)

This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.

Keywords: Learning automata; Neural networks; Dynamic programming; Intelligent systems; State estimation; Probability distribution; Stochastic systems; Optimal control
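
A tabular sketch of the QV(λ) idea as this abstract states it: the state values V are learned with TD(λ), and the Q-values are updated toward a target built from the learned V. The sketch follows the abstract's description only; the update order, accumulating traces, step sizes and the 5-state chain task (`chain_step`, `GOAL`, `qv_lambda_episode`) are hypothetical choices for illustration, not the paper's experimental setup.

```python
import numpy as np

GOAL = 4  # hypothetical chain with states 0..4; state 4 is terminal

def chain_step(s, a):
    """Hypothetical task: action 1 moves right, action 0 moves left (state 0 is a wall)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def qv_lambda_episode(Q, V, rng, alpha=0.05, beta=0.05, gamma=0.9,
                      lam=0.8, eps=0.1, max_steps=200):
    """One episode of tabular QV(lambda)-learning; updates Q and V in place."""
    e = np.zeros_like(V)                    # accumulating eligibility traces for V
    s = 0
    for _ in range(max_steps):
        # epsilon-greedy on Q with random tie-breaking
        if rng.random() < eps:
            a = int(rng.integers(Q.shape[1]))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2, r, done = chain_step(s, a)
        target = r + (0.0 if done else gamma * V[s2])   # r + gamma * V(s')
        # V: TD(lambda) update with eligibility traces
        delta = target - V[s]
        e[s] += 1.0
        V += alpha * delta * e
        e *= gamma * lam
        # Q: one-step update toward the same V-based target
        Q[s, a] += beta * (target - Q[s, a])
        if done:
            break
        s = s2

rng = np.random.default_rng(0)
Q, V = np.zeros((GOAL + 1, 2)), np.zeros(GOAL + 1)
for _ in range(1000):
    qv_lambda_episode(Q, V, rng)

print(np.argmax(Q[:GOAL], axis=1), np.round(V, 3))
```

Compared with Sarsa, which bootstraps from Q(s', a') of the next sampled action, the Q-target here bootstraps from V(s'), which is updated on every visit to a state regardless of the action taken there.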

Dynamic optimization of the strength ratio during a terrestrial conflict

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Alexandre Sztykgold, Gilles Coppin, Olivier Hudry (GET/ENST-Bretagne, LUSSI Department, France; GET/ENST, Computer Science Department, France)

The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. To this end, we model the conflict as a game, in which we seek strategies guaranteeing the achievement of given goals defined simultaneously in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed refer to temporal-difference methods but also use a heuristic qualification of system states to address algorithmic complexity issues.

Keywords: Game theory; Dynamic programming; Learning; Decision making; Computer science; Military computing; Stochastic processes; Shortest path problem; Qualifications; Graph theory
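
The abstract reduces the tactical problem to a stochastic shortest path (SSP) problem. As a hedged illustration of what that problem looks like, the sketch below solves a tiny, hypothetical SSP by value iteration. Note that this swaps in exact dynamic programming for the temporal-difference methods and heuristic state qualification the authors use, and the two-action model ("advance" versus "rush") and the name `ssp_value_iteration` are invented for the example.

```python
import numpy as np

def ssp_value_iteration(P, C, goal, tol=1e-12, max_iter=10_000):
    """Value iteration for a stochastic shortest path problem.

    P[a, s, s2]: transition probabilities; C[a, s]: expected one-step cost.
    The goal state is absorbing and cost-free, so its cost-to-go is pinned to 0.
    """
    J = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Qsa = C + P @ J                  # Qsa[a, s] = C[a, s] + sum_s2 P[a, s, s2] * J[s2]
        J_new = Qsa.min(axis=0)
        J_new[goal] = 0.0
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    return J, Qsa.argmin(axis=0)

# Hypothetical model: states 0..2 are transient, state 3 is the goal.
S, GOAL = 4, 3
P = np.zeros((2, S, S))
C = np.zeros((2, S))
for s in range(GOAL):
    P[0, s, s + 1], P[0, s, s], C[0, s] = 0.8, 0.2, 1.0   # "advance": slow but safe
    P[1, s, GOAL], P[1, s, 0], C[1, s] = 0.5, 0.5, 2.0    # "rush": may be pushed back to 0
P[:, GOAL, GOAL] = 1.0

J, policy = ssp_value_iteration(P, C, GOAL)
print(np.round(J, 4), policy[:GOAL])
```

Pinning `J[goal]` to 0 encodes the SSP convention that the goal is absorbing and cost-free; value iteration converges here because every policy in this toy model reaches the goal with probability one.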

Editorial Special Issue on Adaptive Dynamic Programming and Reinforcement Learning

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, Vol. 50, No. 11, pp. 3944-3947

Authors: Liu, Derong; Lewis, Frank L.; Wei, Qinglai (School of Automation, Guangdong University of Technology, Guangzhou 510006, China; UTA Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)

The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)–5) in the Appendix] and survey papers [items 6)–10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article [item 11) in the Appendix], Barto et al. introduced the so-called “adaptive real-time dynamic programming,” which was designed specifically to apply ADP to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems. On the RL side, the best-known algorithms are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].

Keywords: Special issues and sections; Reinforcement learning; Learning systems; Control systems; Dynamic programming; Real-time systems; Optimal control

Sparse Temporal Difference Learning Using LASSO

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Manuel Loth, Manuel Davy, Philippe Preux (SequeL, INRIA-Futurs, LIFL, CNRS, University of Lille (USTL), France; SequeL, INRIA-Futurs, LAGIS, CNRS, Ecole Centrale de Lille, France)

We consider the problem of on-line value function estimation in reinforcement learning, concentrating on the choice of function approximator. To try to break the curse of dimensionality, we focus on non-parametric function approximators. We propose to integrate kernels into temporal difference algorithms by using regression via the LASSO. We introduce the equi-gradient descent algorithm (EGD), which is a direct adaptation of the one recently introduced in the LARS algorithm family for solving the LASSO. We argue that EGD is a judicious choice for these tasks. We present the EGD algorithm in detail, along with some experimental results, and emphasize its qualities for reinforcement learning.

Keywords: Learning; Kernel; Convergence; Computational efficiency; Dynamic programming; Costs; Minimization methods; Input variables; Approximation algorithms; Linear approximation
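
The regression step underneath this approach is the LASSO objective, written here in one common scaling as (1/2n)‖y − Xw‖² + λ‖w‖₁. The paper solves it with EGD, a LARS-style method; the sketch below instead uses plain cyclic coordinate descent with soft-thresholding, a different and simpler solver for the same objective, to show how the ℓ1 penalty drives most weights to exactly zero. In a TD setting, the columns of X might be kernel features of visited states and y bootstrapped value targets; here both are synthetic, and the names `lasso_cd` and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/(2n))||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)            # residual y - Xw for w = 0
    for _ in range(n_sweeps):
        for j in range(d):
            r += X[:, j] * w[j]             # add back feature j's contribution
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]             # remove the updated contribution
    return w

# Synthetic, hypothetical data: only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[0], w_true[3] = 3.0, -2.0
y = X @ w_true + 0.01 * rng.standard_normal(200)

w = lasso_cd(X, y, lam=0.1)
print(np.round(w, 3))
```

Because the ℓ1 penalty sets inactive coefficients exactly to zero, only a few features need to be stored and evaluated, which is the kind of sparsity that motivates using the LASSO here.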

ADHDP(λ) strategies based coordinated ramps metering with queuing consideration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xuerui Bai, Dongbin Zhao, Jianqiang Yi (Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China)

Ramp metering has been developed as a traffic management strategy to alleviate congestion on freeways. Most ramp metering control algorithms do not take queuing into consideration, because coordinating the metering of multiple ramps while accounting for queues is still a difficult problem. In this paper, building on our previous studies, we use action-dependent heuristic dynamic programming based on eligibility traces (ADHDP(λ)) to solve local ramp metering and multiple-ramp metering problems with queuing consideration. First, for the local ramp metering problem, we establish a comprehensive performance index that considers both traffic density and on-ramp queue length. Second, for the multiple-ramp metering problem, ADHDP(λ) achieves coordinated ramp metering and queue-length regulation at the same time. Simulation studies on a hypothetical freeway are reported, showing that the proposed control scheme is efficient.

Keywords: Traffic control; Telecommunication traffic; Dynamic programming; Communication system traffic control; Performance analysis; Laboratories; Intelligent systems; Automation; Automatic control; Feedback control
