Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z. Affiliations: SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794, USA; Natl Tech Univ Ukraine, Kyiv Polytech Inst, Inst Appl Syst Anal, UA-03056 Kiev, Ukraine; Natl Tech Univ Ukraine, Kyiv Polytech Inst, UA-03056 Kiev, Ukraine

ISBN: (print) 9781479945528

This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probabilities of the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.

Keywords: Markov processes; convergence of numerical methods; decision making; dynamic programming; iterative methods; Borel state; COMDPs; Markov decision processes; POMDPs; action sets; completely observable MDPs; dynamic programming algorithm; general state; infinite state; partially observable MDPs; sufficient condition; total-cost MDPs; unbounded one-step cost functions; value iterations convergence; Convergence; Cost function; Equations; Extraterrestrial measurements; Kernel; Markov chain; Converge; Cost functions; SETTING; Sufficient conditions
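
As a rough illustration of the value iteration mentioned in the abstract above, the following sketch runs undiscounted (total-cost) value iteration on a tiny finite MDP. The finite state set, the toy costs and transitions, and all names (`value_iteration`, `cost`, `trans`) are assumptions made for this example; the paper itself treats Borel state and action sets, which this sketch does not cover.

```python
def value_iteration(states, actions, cost, trans, tol=1e-9, max_iter=10_000):
    """Iterate v(s) <- min_a [c(s, a) + sum_t p(t | s, a) * v(t)], starting from v = 0."""
    v = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_v = {
            s: min(
                cost[s][a] + sum(p * v[t] for t, p in trans[s][a].items())
                for a in actions
            )
            for s in states
        }
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical toy problem: state 0 is a cost-free absorbing goal.
states = [0, 1, 2]
actions = ["step", "jump"]
cost = {
    0: {"step": 0.0, "jump": 0.0},
    1: {"step": 1.0, "jump": 3.0},
    2: {"step": 1.0, "jump": 1.5},
}
trans = {
    0: {"step": {0: 1.0}, "jump": {0: 1.0}},
    1: {"step": {0: 1.0}, "jump": {0: 1.0}},
    2: {"step": {1: 1.0}, "jump": {0: 0.5, 2: 0.5}},
}

values = value_iteration(states, actions, cost, trans)
print(values)
```

Starting from the zero function is a common choice for nonnegative costs, because the iterates then increase monotonically toward their limit; whether that limit is the optimal value is exactly the kind of question the abstract addresses.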

Full-range adaptive cruise control based on supervised adaptive dynamic programming

NEUROCOMPUTING, 2014, vol. 125, pp. 57-67

Authors: Zhao, Dongbin; Hu, Zhaohui; Xia, Zhongpu; Alippi, Cesare; Zhu, Yuanheng; Wang, Ding. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Guangdong Power Grid Corp, Elect Power Res Inst, Guangzhou 510080, Guangdong, Peoples R China; Politecn Milan, Dipartimento Elettron & Informaz, I-20133 Milan, Italy

The paper proposes a supervised adaptive dynamic programming (SADP) algorithm for a full-range adaptive cruise control (ACC) system, which can be formulated as a dynamic programming problem with stochastic demands. The suggested ACC system has been designed to allow the host vehicle to drive both on highways and in Stop and Go (SG) urban scenarios. The ACC system can autonomously drive the host vehicle to a desired speed and/or a given distance from the target vehicle in both operational cases. Traditional adaptive dynamic programming (ADP) is a suitable tool to address the problem, but training usually suffers from low convergence rates and hardly achieves an effective controller. An SADP algorithm, which introduces the concept of an inducing region, is introduced here to overcome such training drawbacks. The SADP algorithm performs very well in all simulation scenarios and always better than more traditional controllers. The conclusion is that the proposed SADP algorithm is an effective control methodology able to address the full-range ACC problem. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: adaptive dynamic programming; Supervised reinforcement learning; Neural networks; adaptive cruise control; Stop and go

Beyond Exponential Utility Functions: A Variance-Adjusted Approach for Risk-Averse Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Gosavi, Abhijit A.; Das, Sajal K.; Murray, Susan L. Affiliations: Missouri Univ Sci & Technol, Dept Engn Management & Syst Engn, Rolla, MO 65409, USA; Missouri Univ Sci & Technol, Dept Comp Sci, Rolla, MO 65409, USA

ISBN: (print) 9781479945528

Utility theory has served as a bedrock for modeling risk in economics. Where risk is involved in decision-making, for solving Markov decision processes (MDPs) via utility theory, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior. The EU function framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large and typically the computer overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU function objective for reasonable values of the RAC, can be used in the RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies on a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.

Keywords: Markov processes; decision making; economics; learning (artificial intelligence); mathematical analysis; risk analysis; utility theory; EU function; MDP; Markov decision process; RAC; VA approach; exponential utility functions; mathematical proof; risk-averse reinforcement learning; risk-averseness coefficient; variance-adjusted approach; Computers; Equations; Linear programming; Mathematical model; Measurement; Markov chain; formal proof; AKT1 gene; Risk Management
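
The overflow problem and the variance-adjusted alternative described in the abstract above can be illustrated with a small numerical sketch. It assumes the textbook certainty-equivalent form of exponential utility, -(1/RAC) * log E[exp(-RAC * R)], and its second-order approximation E[R] - (RAC/2) * Var[R]. These formulas, the function names, and the sample returns are assumptions for illustration and are not taken from the paper.

```python
import math
from statistics import fmean, pvariance


def eu_certainty_equivalent(returns, rac):
    """Certainty equivalent under exponential utility (assumed textbook form)."""
    # math.exp raises OverflowError once -rac * r exceeds roughly 709.
    expectations = [math.exp(-rac * r) for r in returns]
    return -math.log(fmean(expectations)) / rac


def variance_adjusted(returns, rac):
    """Second-order approximation: mean minus (rac / 2) times variance."""
    return fmean(returns) - 0.5 * rac * pvariance(returns)


small_returns = [1.0, 2.0, 3.0]  # hypothetical sample returns
rac = 0.01                       # hypothetical risk-averseness coefficient

print(eu_certainty_equivalent(small_returns, rac))
print(variance_adjusted(small_returns, rac))

# Large-magnitude returns break the exponential form but not the adjusted one.
large_returns = [-1000.0, -2000.0]
try:
    eu_certainty_equivalent(large_returns, 1.0)
except OverflowError as exc:
    print("exponential utility overflowed:", exc)
print(variance_adjusted(large_returns, 1.0))
```

For a small RAC the two objectives nearly coincide, while the variance-adjusted form only needs a mean and a variance, so it stays finite where the exponential overflows.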

Subspace Identification for Predictive State Representation by Nuclear Norm Minimization

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Glaude, Hadrien; Pietquin, Olivier; Enderli, Cyrille. Affiliations: Univ Lille 1, F-59655 Villeneuve Dascq, France; CNRS, LIFL, UMR 8022, Lille 1, SequeL Team, F-75700 Paris, France; Thales Airborne Syst, Elancourt, France

ISBN: (print) 9781479945528

Predictive State Representations (PSRs) are dynamical systems models that keep track of the system's state using predictions of future observations. In contrast to other models of dynamical systems, such as partially observable Markov decision processes, PSRs produce more compact models and can be consistently learned using statistics of the execution trace and spectral decomposition. In this paper we make a connection between rank minimization problems and learning PSRs. This allows us to derive a new algorithm based on nuclear norm minimization. In addition to automatically estimating the dimension of the system, our algorithm compares favorably with the state of the art on randomly generated realistic problems of different sizes.

Keywords: learning (artificial intelligence); statistics; PSR; dynamical systems; execution trace; nuclear norm minimization; predictive state representation; rank minimization; spectral decomposition; subspace identification; Correlation; Hidden Markov models; History; Minimization; Noise; Trajectory; Vectors; spectral representation; PRIMARY SYSTEM RELIEF

Active Learning for Classification: An Optimistic Approach

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Collet, Timothe; Pietquin, Olivier. Affiliations: Supelec, MaLIS Res Grp, Gif Sur Yvette, France; GeorgiaTech CNRS, UMI 2958, Metz, France; Univ Lille 1, F-59655 Villeneuve Dascq, France; CNRS, LIFL, UMR 8022, Lille 1, SequeL Team, F-75700 Paris, France; Inst Univ France, Paris, France

ISBN: (print) 9781479945528

In this paper, we propose to reformulate the active learning problem occurring in classification as a sequential decision making problem. We particularly focus on the problem of dynamically allocating a fixed budget of samples. This raises the problem of the trade-off between exploration and exploitation, which is traditionally addressed in the framework of multi-armed bandit theory. Based on previous work on bandit theory applied to active learning for regression, we introduce four novel algorithms for solving the online allocation of the budget in a classification problem. Experiments on a generic classification problem demonstrate that these new algorithms compare favorably to state-of-the-art methods.

Keywords: decision making; learning (artificial intelligence); optimisation; pattern classification; regression analysis; active learning; classification; multiarmed bandits theory; optimistic approach; regression; sequential decision making problem; Algorithm design and analysis; Noise; Noise measurement; Partitioning algorithms; Resource management; Shape; Uncertainty; Experiential learning; Pattern recognition; management of resources
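
The exploration/exploitation trade-off in allocating a fixed sample budget, as described in the abstract above, can be sketched with the classic UCB1 bandit rule. This is a generic optimistic allocation scheme and not one of the four algorithms proposed in the paper. The arms (imagined here as regions of the input space whose payoff is an abstract "usefulness of sampling"), the function name `ucb1_allocate`, and the Bernoulli payoffs are all hypothetical.

```python
import math
import random


def ucb1_allocate(arms, budget, rng):
    """Spend `budget` pulls across `arms` with the UCB1 rule; return pulls per arm."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            # Initialisation: pull every arm once.
            i = t - 1
        else:
            # Optimism: empirical mean plus an exploration bonus.
            i = max(
                range(len(arms)),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        sums[i] += arms[i](rng)
        counts[i] += 1
    return counts


# Hypothetical arms: Bernoulli payoffs with success rates 0.2 and 0.8.
rng = random.Random(0)
arms = [
    lambda r: float(r.random() < 0.2),
    lambda r: float(r.random() < 0.8),
]
counts = ucb1_allocate(arms, budget=500, rng=rng)
print(counts)
```

The bonus term shrinks as an arm is sampled more often, so the budget drifts toward the arm that looks best while rarely sampled arms are still revisited occasionally.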

Event-based Optimal Regulator Design for Nonlinear Networked Control Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sahoo, Avimanyu; Xu, Hao; Jagannathan, S. Affiliations: Missouri Univ Sc & Tech, Dept Elect & Comp Engn, Rolla, MO 65409, USA; Texas A&M Univ, Coll Sci & Engn, Dept Elect Engn, Corpus Christi, TX, USA

ISBN: (print) 9781479945528

This paper presents a novel stochastic event-based near-optimal control strategy to regulate a networked control system (NCS) represented as an uncertain nonlinear continuous-time system. An online stochastic actor-critic neural network (NN) based approach is utilized to achieve near-optimal regulation in the presence of network constraints, such as network-induced time-varying delays and random packet losses, under event-based transmission of the feedback signals. The nonlinear NCS, transformed to discrete time after incorporation of the delays and packet losses, is utilized for the actor-critic NN-based controller design. To relax the knowledge of the control coefficient matrix, an NN-based identifier is used. The event-sampled state vector is utilized as the NN input, and the respective NN weights are updated non-periodically at the occurrence of events. Further, an event-trigger condition is designed by using the Lyapunov technique to ensure ultimate boundedness of all the closed-loop signals and to save network resources and computation. Moreover, policy and value iterations are not utilized for the stochastic optimal regulator design. Finally, the analytical design is verified on a numerical example by carrying out Monte-Carlo simulations.

Keywords: Event-triggered control; optimal control; adaptive dynamic programming; neural networks; networked control systems

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings: Welcome Message

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings, 2009, p. viii

Author: Liu, Derong

Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2014, vol. 11, no. 3, pp. 706-714

Authors: Li, Hongliang; Liu, Derong; Wang, Ding. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method solving the game algebraic Riccati equation forward in time. The developed algorithm updates the value function, control and disturbance policies simultaneously. The convergence of the algorithm is demonstrated to be equivalent to Newton's method. To implement this algorithm, one critic network and two action networks are used to approximate the game value function, control and disturbance policies, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in the simulation by designing an H-infinity state feedback controller for a power system. Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool to study multiplayer optimal decision and control problems. Existing approaches usually solve for the Nash equilibrium solution by means of offline iterative computation, and require exact knowledge of the system dynamics. However, it is difficult to obtain exact knowledge of the system dynamics for many real-world industrial systems. The algorithm developed in this paper is a fully model-free method which solves the zero-sum differential game problem forward in time by making use of online measured data. This method is not affected by errors between an identification model and a real system, and responds fast to changes of the system dynamics. Exploration signals are required to satisfy the persistence of excitation condition to update the value function and the policies, and these signals do not affect the convergence of the learning process. The least squares method is used to obtain the approximate solution for the zero-sum games with unknown dynamics. The developed a

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; reinforcement learning; policy iteration; zero-sum games

Multi-Objective Reinforcement Learning for AUV Thruster Failure Recovery

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ahmadzadeh, Seyed Reza; Kormushev, Petar; Caldwell, Darwin G. Affiliation: Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy

ISBN: (print) 9781479945528

This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUVs). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated, the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions, a multi-objective reinforcement learning approach is employed which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that is able to navigate the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in closed loop using the AUV's state feedback. Unlike most existing methods, which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable when the AUV either becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.

Keywords: autonomous underwater vehicles; closed loop systems; control engineering computing; fault diagnosis; learning (artificial intelligence); mobile robots; optimal control; state feedback; AUV state feedback; AUV thruster failure recovery; Girona500 AUV; closed-loop; conflicting objective; fault detection; fault-tolerant control policy; faulty thruster; model-based direct policy search; multiobjective reinforcement learning approach; on-board simulated model; optimal solution; Optimization; Sociology; Statistics; Trajectory; Vectors; Vehicle dynamics; Vehicles; vehicle; Defect detection; CLOSED LOOP

Closed-Loop Control of Anesthesia and Mean Arterial Pressure Using Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Padmanabhan, Regina; Meskin, Nader; Haddad, Wassim M. Affiliations: Qatar Univ, Dept Elect Engn, Doha, Qatar; Georgia Inst Technol, Sch Aerosp Engn, Atlanta, GA 30332, USA

ISBN: (print) 9781479945528

General anesthesia is required for patients undergoing surgery as well as for some patients in the intensive care units with acute respiratory distress syndrome. However, most anesthetics affect cardiac and respiratory functions. Hence, it is important to monitor and control the infusion of anesthetics to meet sedation requirements while keeping patient vital parameters within safe limits. The critical task of anesthesia administration also necessitates that drug dosing be optimal, patient specific, and robust. In this paper, the concept of reinforcement learning (RL) is used to develop a closed-loop anesthesia controller using the bispectral index (BIS) as a control variable while concurrently accounting for mean arterial pressure (MAP). In particular, the proposed framework uses these two parameters to control propofol infusion rates to regulate the BIS and MAP within a desired range. Specifically, a weighted combination of the error of the BIS and MAP signals is considered in the proposed RL algorithm. This reduces the computational complexity of the RL algorithm and consequently the controller processing time.

Keywords: closed loop systems; computational complexity; learning (artificial intelligence); medical computing; medical control systems; surgery; BIS; MAP; RL; acute respiratory distress syndrome; anesthesia administration; anesthetics; bispectral index; closed-loop control; mean arterial pressure; patient surgery; reinforcement learning; Anesthesia; Biomedical monitoring; Blood pressure; Drugs; Indexes; Optimal control; complexity classes; Adult Respiratory Distress Syndrome; manufacturing automation protocol; MITIGATION ACTION PLANS; control ring
