ISBN (print): 9781424427611
Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Building on our recent work on feature discovery in the context of reinforcement learning, which seeks a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference with the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of approximating the value function are two faces of the same coin, and that a nonparametric approach provides an elegant solution to both problems at once.
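As a rough, hedged illustration of the nonparametric idea (not the paper's actual method), the snippet below represents a value function with a Nadaraya-Watson kernel regressor fitted to sampled state/value pairs, so the samples themselves act as the state representation; the one-dimensional toy target, the bandwidth, and the class names are assumptions made only for this sketch.

```python
import numpy as np

# Minimal sketch of a nonparametric value-function approximator:
# a Nadaraya-Watson kernel regressor over sampled states. The 1-D toy
# target and the bandwidth are illustrative, not taken from the paper.

def gaussian_kernel(x, centers, bandwidth):
    """Gaussian kernel weights between a query state and stored states."""
    d = x - centers
    return np.exp(-0.5 * (d / bandwidth) ** 2)

class KernelValueFunction:
    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth
        self.states = np.empty(0)
        self.values = np.empty(0)

    def fit(self, states, values):
        # The samples themselves are the representation: no parametric
        # feature design is needed, which is the point made in the abstract.
        self.states = np.asarray(states, dtype=float)
        self.values = np.asarray(values, dtype=float)

    def __call__(self, x):
        w = gaussian_kernel(x, self.states, self.bandwidth)
        return float(w @ self.values / (w.sum() + 1e-12))

# Toy usage: approximate V(s) = s^2 on [0, 1] from noisy samples.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=200)
v = s ** 2 + 0.01 * rng.standard_normal(200)
V = KernelValueFunction(bandwidth=0.05)
V.fit(s, v)
print(V(0.5))   # close to 0.25
```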
ISBN (print): 9781424407064
We consider a multi-agent system where the overall performance is affected by the joint actions or policies of the agents. However, each agent only observes a partial view of the global state. This model is known as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP), which arises in real-world applications such as communication networks. It is known that solving a DEC-POMDP exactly is NEXP-complete, and that memory requirements grow exponentially even for finite-horizon problems. In this paper, we propose to address these issues by using an online model-free technique and by exploiting the locality of interaction among agents in order to approximate the joint optimal policy. Simulation results show the effectiveness and convergence of the proposed algorithm in the context of resource allocation for multi-agent wireless multihop networks.
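The abstract does not spell out the algorithm; as a loose sketch of online, model-free learning from local observations only, the snippet below runs independent tabular Q-learning per agent, so memory grows with each agent's own observation/action space rather than with the joint one. The two-agent channel-access toy problem and all constants are invented for illustration.

```python
import random
from collections import defaultdict

# Loose sketch: each agent learns from its own partial observation only,
# keeping memory linear in the number of agents instead of exponential in
# the joint observation space. The toy environment below is invented.

class LocalQAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)        # (observation, action) -> value
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

    def act(self, obs):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])

# Toy usage: two agents share a channel; reward 1 only if exactly one transmits.
agents = [LocalQAgent(actions=[0, 1]) for _ in range(2)]
obs = [0, 0]                               # each agent sees only its own backlog flag
for step in range(5000):
    actions = [ag.act(o) for ag, o in zip(agents, obs)]
    reward = 1.0 if sum(actions) == 1 else 0.0
    next_obs = [random.randint(0, 1) for _ in agents]
    for ag, o, a, o2 in zip(agents, obs, actions, next_obs):
        ag.update(o, a, reward, o2)
    obs = next_obs
```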
ISBN (print): 9781424407064
The aim of this study is to assist a military decision maker during the decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies guaranteeing the achievement of given goals, defined simultaneously in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed draw on Temporal Difference methods and also use a heuristic qualification of system states to cope with algorithmic complexity issues.
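As a hedged sketch of the Temporal Difference machinery referred to above, the snippet below runs tabular Q-learning on a tiny stochastic shortest path problem; the graph, slip probability, and costs are invented and do not reflect the paper's multi-valued graph model or its heuristic state qualification.

```python
import random

# Hedged sketch: tabular Q-learning on a tiny stochastic shortest path
# problem. The graph, transition noise, and costs are invented for
# illustration; the paper's multi-valued graphs are far richer.

# edges[state][action] = (intended next state, cost); state 3 is the goal.
edges = {
    0: {0: (1, 2.0), 1: (2, 4.0)},
    1: {0: (3, 3.0)},
    2: {0: (3, 1.0)},
    3: {},
}

def step(state, action, slip=0.1):
    """With probability `slip` the move fails and the agent stays put."""
    nxt, cost = edges[state][action]
    if random.random() < slip:
        return state, cost
    return nxt, cost

Q = {(s, a): 0.0 for s, acts in edges.items() for a in acts}
alpha = 0.1
for episode in range(3000):
    s = 0
    while edges[s]:                        # until the goal is reached
        a = random.choice(list(edges[s])) if random.random() < 0.2 \
            else min(edges[s], key=lambda a: Q[(s, a)])
        s2, cost = step(s, a)
        best_next = 0.0 if not edges[s2] else min(Q[(s2, b)] for b in edges[s2])
        Q[(s, a)] += alpha * (cost + best_next - Q[(s, a)])   # undiscounted SSP target
        s = s2

print(min(Q[(0, a)] for a in edges[0]))    # estimated minimal expected cost from state 0
```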
ISBN (print): 9781424407064
In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.
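A hedged sketch of an HDP-style greedy iteration is given below for the scalar system x' = 0.8x + u with stage cost x^2 + u^2. For brevity, quadratic least-squares models stand in for the paper's two neural networks (one for the value function, one for the control), and the minimisation over actions is a grid search; all numbers are illustrative.

```python
import numpy as np

# Hedged sketch of an HDP-style greedy iteration for the scalar system
# x' = 0.8*x + u with stage cost x^2 + u^2. Quadratic least-squares models
# replace the paper's two neural networks, and a grid search replaces
# gradient-based action optimisation. All constants are illustrative.

def features(x):
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

def fit(targets, xs, lam=1e-6):
    """Least-squares fit of a quadratic model to (state, target) pairs."""
    F = features(xs)
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ targets)

xs = np.linspace(-2.0, 2.0, 101)           # training states
us = np.linspace(-2.0, 2.0, 201)           # candidate controls
w_value = np.zeros(3)                      # value "network" weights, V_0 = 0

for iteration in range(50):
    # Greedy step: for each state, minimise cost-to-go under the current V.
    X, U = np.meshgrid(xs, us, indexing="ij")
    X_next = 0.8 * X + U
    total = X ** 2 + U ** 2 + features(X_next) @ w_value
    best = np.argmin(total, axis=1)
    v_targets = total[np.arange(len(xs)), best]
    u_targets = us[best]
    # Fit the value and action approximators to the greedy targets.
    w_value = fit(v_targets, xs)
    w_action = fit(u_targets, xs)

print(w_value, w_action)   # approximates the quadratic optimal cost and linear control
```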
ISBN (print): 9781424407064
It was shown recently that SVMs are particularly well suited to defining action policies that keep a dynamical system inside a given constraint set (in the framework of viability theory). However, the training set of the SVM faces the curse of dimensionality, because it is based on a regular grid of the state space. In this paper, we propose an active learning approach aimed at dramatically decreasing the training set size, keeping it as close as possible to the final number of support vectors. We use a virtual multi-resolution grid, together with some particularities of the problem, to choose very informative examples to add to the training set. To illustrate the performance of the algorithm, we solve a six-dimensional problem, controlling a bike on a track, a problem usually solved with reinforcement learning techniques.
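The snippet below illustrates only the generic active-learning ingredient: growing the SVM training set with the candidate points closest to the current decision boundary instead of labelling a full regular grid. The two-dimensional disc used as a stand-in "viable set", the candidate sampling, and the SVM hyperparameters are invented; the paper's virtual multi-resolution grid is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

# Generic active-learning sketch: label only the candidate points that lie
# closest to the current SVM decision boundary, instead of a full regular
# grid. The 2-D disc standing in for the viable set and all parameters are
# invented; the paper's virtual multi-resolution grid is not reproduced.

rng = np.random.default_rng(0)

def viable(points):                        # stand-in for the expensive oracle
    return (np.linalg.norm(points, axis=1) < 0.7).astype(int)

candidates = rng.uniform(-1.0, 1.0, size=(5000, 2))     # stand-in for the virtual grid
X = np.vstack([[[0.0, 0.0], [0.9, 0.9]], candidates[:10]])   # seed set with both classes
y = viable(X)

for _ in range(15):
    clf = SVC(kernel="rbf", C=10.0, gamma=2.0).fit(X, y)
    # Query the candidates the classifier is least certain about.
    margins = np.abs(clf.decision_function(candidates))
    query = candidates[np.argsort(margins)[:20]]
    X = np.vstack([X, query])
    y = np.concatenate([y, viable(query)])

print(len(X), "labelled points,", clf.n_support_.sum(), "support vectors")
```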
ISBN (print): 9781424407064
This paper presents an original technique to compute the optimal policy of a Markov Decision Problem with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm introduced in 1989 by Watkins for discrete Markov Decision Problems. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We state, under mild assumptions, a convergence theorem for this algorithm. Finally, we illustrate the algorithm by solving two classical problems: the Mountain Car task and the Puddle World task.
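As a hedged sketch of the flavour of kernel-based Q-learning on a continuous state space with discrete actions, the snippet below keeps action values at fixed kernel centres and spreads each temporal-difference update over nearby centres in proportion to their kernel weights; the random-walk environment, kernel, bandwidth, and step sizes are guesses for illustration, not the paper's algorithm.

```python
import numpy as np

# Hedged sketch of kernel-smoothed Q-learning on a continuous 1-D state
# space with two discrete actions. Each action keeps values at fixed kernel
# centres; a sample updates nearby centres in proportion to their kernel
# weight (local stochastic approximation). The random-walk environment and
# all constants are illustrative.

rng = np.random.default_rng(1)
centres = np.linspace(0.0, 1.0, 21)        # kernel centres over the state space
Q = np.zeros((2, len(centres)))            # one row of values per action
h, alpha, gamma, eps = 0.08, 0.2, 0.95, 0.1

def weights(s):
    w = np.exp(-0.5 * ((s - centres) / h) ** 2)
    return w / w.sum()

def q_value(s, a):
    return weights(s) @ Q[a]

s = 0.5
for step in range(20000):
    a = rng.integers(2) if rng.random() < eps else int(np.argmax([q_value(s, 0), q_value(s, 1)]))
    s_next = np.clip(s + (0.05 if a == 1 else -0.05) + 0.01 * rng.standard_normal(), 0.0, 1.0)
    reward = 1.0 if s_next > 0.95 else 0.0
    td_error = reward + gamma * max(q_value(s_next, 0), q_value(s_next, 1)) - q_value(s, a)
    Q[a] += alpha * weights(s) * td_error  # local update around the visited state
    s = 0.5 if reward > 0 else s_next      # restart after reaching the right end

print(q_value(0.9, 1), q_value(0.9, 0))    # moving right should look better near the goal
```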
This study presents an adaptive railway traffic controller for real-time operations based on approximate dynamic programming (ADP). By assessing requirements and opportunities, the controller aims to limit consecutive delays resulting from trains that entered a control area behind schedule by sequencing them at a critical location in a timely manner, thus representing the practical requirements of railway operations. The approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using reinforcement learning techniques. By using this approximation, the ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this investigation, we explore formulations of the approximation function and variants of the learning techniques used to estimate it. Evaluation of the ADP methods in a stochastic simulation environment shows considerable improvements in consecutive delays compared with the current industry practice of First-Come-First-Served sequencing. We also found that the estimated parameters of the approximate value function are similar across a range of test scenarios with different mean train entry delays.
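To make the ADP ingredient concrete, the sketch below estimates a linear approximation of the value function from simulated experience with a TD(0) update; the delay features, the toy delay dynamics, and the per-step cost are invented placeholders rather than the paper's railway model.

```python
import numpy as np

# Hedged sketch of the ADP ingredient above: a linear value-function
# approximation whose weights are estimated from simulated experience by a
# TD(0) update. The normalised delay features, the toy delay dynamics, and
# the per-step cost are invented placeholders, not the paper's railway model.

rng = np.random.default_rng(0)

def features(delays):
    """Normalised features of the consecutive delays (minutes) of two trains."""
    return np.array([1.0, delays.sum() / 10.0, delays.max() / 10.0])

w = np.zeros(3)                            # value-function weights
alpha, gamma = 0.05, 0.9

for episode in range(2000):
    delays = rng.uniform(0.0, 10.0, size=2)             # entry delays of two trains
    for _ in range(20):
        cost = delays.sum()                             # consecutive delay incurred now
        # Toy dynamics: delays partly recover, with a little noise.
        next_delays = np.maximum(0.0, 0.8 * delays + rng.normal(0.0, 0.5, size=2))
        td_error = cost + gamma * features(next_delays) @ w - features(delays) @ w
        w += alpha * td_error * features(delays)
        delays = next_delays

print(w)   # weights of the learned approximate value function
```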
ISBN (print): 9781424407064
We consider batch reinforcement learning problems in continuous space: expected total discounted-reward Markovian Decision Problems where the training data consist of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration, where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is the introduction of new smoothness constraints, thereby significantly extending the scope of previous results.
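To make the structure of the studied algorithm concrete, the sketch below runs policy iteration on a fixed batch of trajectory data, evaluating each policy's action-value function by approximate (fitted) value iteration; the one-dimensional toy MDP, the random behaviour policy, and the piecewise-constant regressor standing in for the function set are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: policy iteration on batch data, with each policy evaluated
# by approximate (fitted) value iteration on its action-value function.
# The 1-D toy MDP, the random behaviour policy, and the piecewise-constant
# regressor standing in for the function set are illustrative only.

rng = np.random.default_rng(0)
gamma, n_bins, n_actions = 0.9, 20, 2

def bin_of(s):
    return np.clip((np.asarray(s) * n_bins).astype(int), 0, n_bins - 1)

def simulate(n_steps=5000):
    """Single trajectory of a fixed random behaviour policy on a 1-D toy MDP."""
    S, A, R, S2 = [], [], [], []
    s = 0.5
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))
        s2 = float(np.clip(s + (0.05 if a == 1 else -0.05)
                           + 0.02 * rng.standard_normal(), 0.0, 1.0))
        S.append(s); A.append(a); R.append(1.0 if s2 > 0.95 else 0.0); S2.append(s2)
        s = s2
    return np.array(S), np.array(A), np.array(R), np.array(S2)

S, A, R, S2 = simulate()
SB, S2B = bin_of(S), bin_of(S2)
policy = np.zeros(n_bins, dtype=int)               # greedy policy, one action per bin

for _ in range(5):                                 # policy iteration
    Q = np.zeros((n_bins, n_actions))
    for _ in range(30):                            # fitted value iteration for Q^pi
        targets = R + gamma * Q[S2B, policy[S2B]]
        newQ = np.zeros_like(Q)
        for a in range(n_actions):
            for b in range(n_bins):
                mask = (A == a) & (SB == b)
                if mask.any():
                    newQ[b, a] = targets[mask].mean()   # "regression" = bin averaging
        Q = newQ
    policy = Q.argmax(axis=1)                      # policy improvement step

print(policy)   # near the right end the improved policy should pick action 1
```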
ISBN (print): 9781424407064
Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task based on observing the behavior of a preexisting controller. Unlike previous work, these decompositions define reparameterizations of the state space that can permit simplified control of the system.
ISBN (print): 9781424407064
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are classified into independent and correlated cases. It is pointed out that the optimality criterion of uniform minimization of the maximum expected total discounted cost over all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
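As a hedged illustration of robust dynamic programming with uncertain transition matrices, the snippet below runs robust value iteration (not the paper's robust policy iteration or its quadratic criterion for the correlated case) on an invented three-state MDP where, at each backup, nature picks the worst candidate transition distribution for the chosen action.

```python
import numpy as np

# Hedged illustration of robust dynamic programming with uncertain
# transition matrices, in the simple "independent uncertainty" setting:
# each (state, action) has a small finite set of candidate transition
# distributions and nature picks the worst one at every backup. This is
# robust value iteration on an invented 3-state MDP, not the paper's
# quadratic criterion for correlated uncertainty.

n_states, n_actions, gamma = 3, 2, 0.9
cost = np.array([[1.0, 2.0],
                 [0.5, 1.5],
                 [0.0, 0.0]])              # cost(s, a); state 2 is absorbing and free

# P[s][a] is a list of candidate next-state distributions for (s, a).
P = [[[np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.4, 0.1])],
      [np.array([0.1, 0.8, 0.1]), np.array([0.2, 0.6, 0.2])]],
     [[np.array([0.3, 0.3, 0.4]), np.array([0.5, 0.3, 0.2])],
      [np.array([0.1, 0.1, 0.8]), np.array([0.3, 0.3, 0.4])]],
     [[np.array([0.0, 0.0, 1.0])],
      [np.array([0.0, 0.0, 1.0])]]]

V = np.zeros(n_states)
for _ in range(200):                       # robust value iteration
    newV = np.zeros(n_states)
    for s in range(n_states):
        q = []
        for a in range(n_actions):
            worst = max(np.dot(p, V) for p in P[s][a])   # adversarial transition choice
            q.append(cost[s, a] + gamma * worst)
        newV[s] = min(q)                   # agent minimises the worst-case cost
    V = newV

policy = [int(np.argmin([cost[s, a] + gamma * max(np.dot(p, V) for p in P[s][a])
                         for a in range(n_actions)])) for s in range(n_states)]
print(V, policy)
```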