Search results - Inner Mongolia University Library

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Anderson, Stuart O.; Srinivasa, Siddhartha S. (Carnegie Mellon Univ Inst Robot 5000 Forbes Ave Pittsburgh PA 15213 USA; Intel Res Pittsburgh Pittsburgh PA 15213 USA)

ISBN: (print) 9781424407064

Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task based on observing the behavior of a preexisting controller. Unlike previous work, these decompositions define reparameterizations of the state space that can permit simplified control of the system.

Keywords: dynamic programming

Development of reinforcement learning methods in control and decision making in the large scale dynamic game environments

IEEE International Symposium on Intelligent Control

Authors: Orafa, S.; Yazdanpanah, M. J.; Lucas, C.; Rahimikian, A.; Ahmadabadi, M. Nili (Univ Tehran Control & Intelligent Proc Ctr Excellence Fac Elect & Comp Engn Tehran Iran)

ISBN: (print) 9780780397989

In this paper, an analytical comparison is made between dynamic programming and reinforcement learning methods in dynamic two-player games. The emphasis is on the large number of states and actions available to each player and the conflicting optimization objectives of these games, which make them complicated to model and analyze. Optimization and decision making are carried out by means of a modified Q-learning algorithm. With this method, it is shown that information processing in large-scale, long-stage games takes less time and results in lower decision costs, whereas dynamic programming methods cannot handle such games across long time horizons.

Keywords: reinforcement learning
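The abstract above contrasts dynamic programming with Q-learning but does not spell out the update rule. As a minimal sketch, and not the paper's modified algorithm, the following shows standard single-agent tabular Q-learning on a hypothetical five-state chain. The environment, the function name `q_learning`, and all hyperparameters are assumptions made for illustration; the two-player game setting of the paper is not modeled.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Standard tabular Q-learning on a hypothetical 1-D chain.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection; ties are broken toward "right".
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Bootstrapped target: no lookahead past a terminal state.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


q_table = q_learning()
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
```

Each update touches only the state-action pair that was actually visited, whereas a dynamic programming sweep backs up every state; this is one common reason sampling-based methods are considered for large state spaces.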

Incremental Dual Heuristic Dynamic Programming Based Hybrid Approach for Multi-Channel Control of Unstable Tailless Aircraft

IEEE Access, 2022, vol. 10, pp. 31677-31691

Authors: Li, Hangxu; Sun, Liguo; Tan, Wenqian; Liu, Xiaoyu; Dang, Weigao (Beihang Univ Sch Aeronaut Sci & Engn Beijing 100191 Peoples R China)

Actor-critic based online reinforcement learning control has proved to be a promising method for the control of aerial vehicles. However, for unstable multi-input multi-output (MIMO) aircraft it is difficult to guarantee a high success rate in initial training and to tune the large number of parameters for the actors and critics. To facilitate and simplify the training of the actor and critic for unstable aircraft, a classic stability augmentation system (SAS) is first designed for the open-loop aircraft. Then the recently proposed online incremental model based dual heuristic dynamic programming (IDHP) method is extended to design a multi-channel robust adaptive controller, and MIMO network structures are designed for the actors and critics to account for the three-channel coupling. Together, the classic SAS and the IDHP controller make up a novel hybrid control framework. In this framework, the SAS counteracts the unstable eigenvalues of the open-loop aircraft system, and the IDHP guarantees robust and adaptive performance for the high-performance tailless aircraft equipped with the SAS. Specifically, introducing the classic control method reduces the difficulty of the initial training for the multi-channel IDHP controller, and the tuning of the initial parameters of the actor and critic neural networks in multiple channels is greatly facilitated. Without the SAS, the initial training of multi-channel IDHP controllers for unstable plants almost never succeeds. Finally, the novel hybrid control architecture and method are validated using the Innovative Control Effectors (ICE) model, which has unstable modes in its longitudinal dynamics. Typical aerodynamic model uncertainties are numerically simulated to demonstrate the effectiveness of the proposed control method.

Keywords: Aircraft; Aerospace control; Atmospheric modeling; MIMO communication; Adaptation models; Training; Synthetic aperture sonar; Incremental DHP; actor and critic; unstable aircraft; reinforcement learning; MIMO control

Dynamic optimization of the strength ratio during a terrestrial conflict

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Sztykgold, Alexandre; Coppin, Gilles; Hudry, Olivier (GET ENST Bretagne LUSSI Dept CNRS TAMCIC UMR 2872 Bretagne Germany; GET ENST Bretagne Dept Comp Sci CNRS LTCI UMR 5141 Bretagne Germany)

ISBN: (print) 9781424407064

The aim of this study is to assist a military decision maker during the decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies that guarantee given goals, defined simultaneously in terms of attrition and tracking, are achieved. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed are based on Temporal Differences methods but also use a heuristic qualification of system states to cope with algorithmic complexity.

Keywords: decision aid; game theory; graph theory; viability theory; Temporal Differences methods; approximate dynamic programming
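The abstract reduces the decision problem to a stochastic shortest path (SSP) problem. As a rough illustration of what solving an SSP means, the sketch below runs plain value iteration on a hypothetical three-node graph. It is not the paper's Temporal Differences approach or its heuristic state qualification; the graph, costs, probabilities and the name `ssp_value_iteration` are all invented for the example.

```python
def ssp_value_iteration(transitions, goal, sweeps=200):
    """Expected cost-to-go for a small stochastic shortest path problem.

    transitions[state][action] = (cost, [(next_state, probability), ...])
    The goal state is absorbing with zero cost.
    """
    cost_to_go = {state: 0.0 for state in transitions}
    cost_to_go[goal] = 0.0
    for _ in range(sweeps):
        for state, actions in transitions.items():
            # Bellman backup: cheapest expected cost over the available actions.
            cost_to_go[state] = min(
                cost + sum(p * cost_to_go[nxt] for nxt, p in outcomes)
                for cost, outcomes in actions.values()
            )
    return cost_to_go


# Hypothetical graph: from A, go straight to the goal G (cost 4) or via B (cost 1).
# From B, each attempt costs 1 and reaches G only half the time.
graph = {
    "A": {"direct": (4.0, [("G", 1.0)]), "via_b": (1.0, [("B", 1.0)])},
    "B": {"try": (1.0, [("G", 0.5), ("B", 0.5)])},
}
costs = ssp_value_iteration(graph, goal="G")
```

With these numbers the expected cost from B is 2 (it solves J = 1 + 0.5 J), so routing through B costs 3 in expectation and beats the direct edge of cost 4.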

Approximate Nash Solutions for Multiplayer Mixed-Zero-Sum Game With Reinforcement Learning

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2019, vol. 49, no. 12, pp. 2739-2750

Authors: Lv, Yongfeng; Ren, Xuemei (Beijing Inst Technol Sch Automat Beijing 100081 Peoples R China)

Inspired by Nash game theory, this paper proposes a multiplayer mixed-zero-sum (MZS) nonlinear game that covers both zero-sum and nonzero-sum (NZS) Nash games. A synchronous reinforcement learning (RL) scheme based on the identifier-critic structure is developed to learn the Nash equilibrium solution of the proposed MZS game. First, the MZS game formulation is presented: performance indexes are defined for the NZS Nash game among players 1 to N - 1 and N, and another performance index is defined for the zero-sum game between players N and N + 1, so that player N cooperates with players 1 to N - 1 while competing with player N + 1, which leads to a Nash equilibrium of all players. A single-layer neural network (NN) is then used to approximate the unknown dynamics of the nonlinear game system. Finally, an RL scheme based on NNs is developed to learn the optimal performance indexes, which can be used to produce the optimal control policy of every player such that the Nash equilibrium can be obtained. Thus, the actor NN widely used in the RL literature is not needed. To this end, a recently proposed adaptive law is used to estimate the unknown identifier coefficient vectors, and an improved adaptive law with the error performance index is further developed to update the critic coefficient vectors. Both linear and nonlinear simulations are presented to demonstrate the existence of the Nash equilibrium for the MZS game and the performance of the proposed algorithm.

Keywords: Approximate dynamic programming (ADP); Nash games; neural networks (NNs); reinforcement learning (RL); system identification

Adaptive Safe Reinforcement Learning With Full-State Constraints and Constrained Adaptation for Autonomous Vehicles

IEEE TRANSACTIONS ON CYBERNETICS, 2024, vol. 54, no. 3, pp. 1907-1920

Authors: Zhang, Yuxiang; Liang, Xiaoling; Li, Dongyu; Ge, Shuzhi Sam; Gao, Bingzhao; Chen, Hong; Lee, Tong Heng (Natl Univ Singapore Dept Elect & Comp Engn Singapore 117583 Singapore; Natl Univ Singapore Inst Funct Intelligent Mat Singapore 117583 Singapore; Natl Univ Singapore Dept Elect & Comp Engn Singapore 117576 Singapore; Beihang Univ Sch Cyber Sci & Technol Beijing 100191 Peoples R China; Tongji Univ Clean Energy Automot Engn Ctr Shanghai 201804 Peoples R China; Tongji Univ Coll Elect & Informat Engn Shanghai 201804 Peoples R China)

High-performance learning-based control for typical safety-critical autonomous vehicles invariably requires that the full-state variables are constrained within the safety region even during the learning process. To solve this technically critical and challenging problem, this work proposes an adaptive safe reinforcement learning (RL) algorithm that invokes innovative safety-related RL methods with the consideration of constraining the full-state variables within the safety region with adaptation. These are developed toward assuring the attainment of the specified requirements on the full-state variables with two notable aspects. First, an appropriately optimized backstepping technique and the asymmetric barrier Lyapunov function (BLF) methodology are used to establish the safe learning framework and ensure that the system's full-state constraint requirements are met. More specifically, each subsystem's control and partial derivative of the value function are decomposed with asymmetric BLF-related items and an independent learning part. Then, the independent learning part is updated to solve the Hamilton-Jacobi-Bellman equation through an adaptive learning implementation to attain the desired performance in system control. Second, with further Lyapunov-based analysis, it is demonstrated that safety performance is effectively doubly assured via a constrained adaptation algorithm during optimization (which incorporates the projection operator and can deal with the conflict between safety and optimization). Therefore, this algorithm optimizes system control and ensures that the full set of state variables involved is always constrained within the safety region during the whole learning process. Comparison simulations and ablation studies are carried out on motion control problems for autonomous vehicles, which have verified superior performance with smaller variance and better convergence performance under uncertain circumstances. The effectiveness of the safe

Keywords: adaptive dynamic programming (ADP); autonomous vehicles; barrier Lyapunov function (BLF); safe reinforcement learning (RL)

Evolutionary computation on multitask reinforcement learning problems

IEEE International Conference on Networking, Sensing and Control

Authors: Handa, Hisashi (Okayama Univ Grad Sch Nat Sci & Technol Okayama 7008530 Japan)

ISBN: (print) 9781424410750

Recently, multitask learning, which can cope with several tasks, has attracted much attention. Multitask reinforcement learning, introduced by Tanaka et al., is a problem class in which a number of problem instances of Markov Decision Processes sampled from the same probability distributions are given sequentially to reinforcement learning agents. The purpose of solving this problem is to realize agents that adapt to newly given environments by using knowledge acquired from past experience. Evolutionary Algorithms are often used to solve reinforcement learning problems when the problem class differs substantially from Markov Decision Processes or the state-action space is very large. From the viewpoint of Evolutionary Algorithms studies, Multitask reinforcement learning problems are regarded as dynamic problems whose fitness landscape changes over time. In this paper, a memory-based Evolutionary Programming method suited to Multitask reinforcement learning problems is proposed.

Keywords: multitask reinforcement learning problems; evolutionary algorithms; dynamic environments

Value-iteration based fitted policy iteration: Learning with a single trajectory

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Antos, Andras; Szepesvari, Csaba; Munos, Remi (Hungarian Acad Sci Comp & Automat Res Inst Kendu U 13-17 H-1111 Budapest Hungary; Univ Alberta Dept Comput Sci Edmonton AB Canada)

ISBN: (print) 9781424407064

We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems in which the training data consists of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration where, in successive iterations, the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, thereby significantly extending the scope of previous results.

Keywords: reinforcement learning

Pattern Driven Dynamic Scheduling Approach using Reinforcement Learning

IEEE International Conference on Automation and Logistics

Authors: Wei Yingzi; Jiang Xinli; Hao Pingbo; Gu Kanfeng (Shenyang Ligong Univ Shenyang 110168 Peoples R China; Chinese Acad Sci Shenyang Inst Automat Shenyang 110016 Peoples R China)

ISBN: (print) 9781424447947

Production scheduling is critical for manufacturing systems. Dispatching rules are usually applied dynamically to schedule jobs in a dynamic job shop. The paper presents an adaptive iterative scheduling algorithm that operates dynamically to schedule jobs in the dynamic job shop. To obtain adaptive behavior, the reinforcement learning system is implemented with phased Q-learning by defining intermediate state patterns. We convert the scheduling problem into a reinforcement learning problem by constructing a multi-phase dynamic programming process, including the definition of the state representation, the actions and the reward function. We use five heuristic rules, CNP-CR, CNP-FCFS, CNP-EFT, CNP-EDD and CNP-SPT, as actions, and the scheduling objective is to minimize the maximum completion time. A complex dynamic scheduling problem can thus be divided into a sequence of sub-problems that are easier to solve. We also analyze the time and the solution and present some experimental results.

Keywords: reinforcement learning; Contract Net Protocol (CNP); State Pattern; Dynamic Scheduling
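In the formulation described above, each action of the learner is a dispatching rule rather than a specific job. The sketch below illustrates that mapping with three textbook rules (FCFS, SPT, EDD). It leaves out the Contract Net Protocol layer and the CR and EFT rules, whose exact definitions the abstract does not give; the `Job` fields, the `RULES` table and the `dispatch` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int          # time the job entered the queue
    processing_time: int  # time needed on the machine
    due_date: int         # deadline for completion


# Each rule is a sort key; the job with the smallest key is dispatched next.
RULES = {
    "FCFS": lambda job: job.arrival,          # first come, first served
    "SPT": lambda job: job.processing_time,   # shortest processing time
    "EDD": lambda job: job.due_date,          # earliest due date
}


def dispatch(queue, rule):
    """Return the job that the named dispatching rule would start next."""
    return min(queue, key=RULES[rule])


queue = [
    Job("j1", arrival=0, processing_time=5, due_date=20),
    Job("j2", arrival=1, processing_time=2, due_date=30),
    Job("j3", arrival=2, processing_time=4, due_date=10),
]
chosen = {rule: dispatch(queue, rule).name for rule in RULES}
```

Under this framing, a Q-learning agent would pick a rule name as its action in each state, and the rule in turn decides which job runs next, which keeps the action set small regardless of how many jobs are waiting.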

Adaptive Dynamic Programming as a Theory of Sensorimotor Control

IEEE Signal Processing in Medicine and Biology Symposium (SPMB)

Authors: Jiang, Yu; Jiang, Zhong-Ping (NYU Control & Networks Lab Dept Elect & Comp Engn Polytech Inst Brooklyn NY 11201 USA)

ISBN: (print) 9781467356664; 9781467356657

This paper studies the control mechanism in human arm movements from a perspective of approximate/adaptive dynamic programming (ADP). The control scheme is developed by incorporating Ito calculus with the ADP method for continuous-time stochastic linear systems. An online learning technique is presented to find a robust optimal control policy without knowing the system dynamics. Finally, the proposed method is applied to a single-joint movement control problem and is validated by computer simulations.

Keywords: approximation theory; biomechanics; calculus; continuous time systems; dynamic programming; learning systems; medical control systems; motion control; optimal control; stochastic systems
