Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Daniel R. Jiang, Thuy V. Pham, Warren B. Powell, Daniel F. Salas, Warren R. Scott. Affiliations: Department of Electrical & Electronics Engineering, Dehradun, India; Graphic Era University, Dehradun, India; School of Electronics, Dehradun, India; Graphic Era Hill University, Bhimtal, India

As more renewable, yet volatile, forms of energy like solar and wind are being incorporated into the grid, the problem of finding optimal control policies for energy storage is becoming increasingly important. These sequential decision problems are often modeled as stochastic dynamic programs, but when the state space becomes large, traditional (exact) techniques such as backward induction, policy iteration, or value iteration quickly become computationally intractable. Approximate dynamic programming (ADP) thus becomes a natural solution technique for solving these problems to near-optimality using significantly fewer computational resources. In this paper, we compare the performance of the following: various approximation architectures with approximate policy iteration (API), approximate value iteration (AVI) with structured lookup table, and direct policy search on a benchmarked energy storage problem (i.e., the optimal solution is computable).

Keywords: Function approximation; Energy storage; Benchmark testing; Mathematical model; Approximation algorithms; Equations
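
As an illustration of the exact techniques the abstract mentions, the following is a minimal sketch of backward induction on a toy storage problem. It is not the benchmark from the paper: the deterministic price list, the unit-sized buy/sell actions, and the function name `backward_induction` are all assumptions made for this example.

```python
def backward_induction(prices, capacity):
    """Exact finite-horizon DP for a toy storage problem (illustrative only).

    prices[t] is an assumed, known price at time t. At each step the agent may
    sell one unit (-1), hold (0), or buy one unit (+1), subject to the storage
    level staying within [0, capacity].
    Returns (values at t=0 per storage level, policy[t][level]).
    """
    values = [0.0] * (capacity + 1)  # terminal value: leftover energy is worth 0
    policy = []
    for t in reversed(range(len(prices))):
        new_values, actions = [], []
        for level in range(capacity + 1):
            best_value, best_action = None, 0
            for action in (-1, 0, 1):
                next_level = level + action
                if 0 <= next_level <= capacity:
                    # Buying costs prices[t]; selling earns prices[t].
                    value = -action * prices[t] + values[next_level]
                    if best_value is None or value > best_value:
                        best_value, best_action = value, action
            new_values.append(best_value)
            actions.append(best_action)
        values = new_values
        policy.insert(0, actions)
    return values, policy
```

The table `values` has one entry per storage level, which is why this exact approach stops scaling once the state also has to track prices, wind, demand, and similar quantities.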

Multi-objective reinforcement learning for AUV thruster failure recovery

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Seyed Reza Ahmadzadeh, Petar Kormushev, Darwin G. Caldwell. Affiliation: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genova

This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUV). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions a multi-objective reinforcement learning approach is employed which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that is able to navigate the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in a closed-loop using AUV's state feedback. Unlike most existing methods which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable when the AUV either becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.

Keywords: Vectors; Optimization; Sociology; Statistics; Vehicle dynamics; Trajectory; Vehicles

Pseudo-MDPs and factored linear action models

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hengshuai Yao, Csaba Szepesvári, Bernardo Ávila Pires, Xinhua Zhang. Affiliations: Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Machine Learning Research Group, National ICT Australia, Sydney and Canberra, Australia

In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions. We also introduce the concept of factored linear action models, a special case. Again, the relation between factored linear action models and existing works is discussed. We use the general framework to develop a theory for bounding the suboptimality of policies derived from pseudo-MDPs. Specializing the framework, we recover existing results. We give a least-squares approach and a constrained optimization approach for learning the factored linear model, as well as efficient computation methods. We demonstrate that the constrained optimization approach gives better performance than the least-squares approach with normalization.

Keywords: Kernel; Approximation methods; Computational modeling; Mathematical model; Equations; Feature extraction; Optimization

Accelerated gradient temporal difference learning algorithms

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Dominik Meyer, Rémy Degenne, Ahmed Omrane, Hao Shen. Affiliation: Institute for Data Processing, Technische Universität München, Germany

In this paper we study Temporal Difference (TD) learning with linear value function approximation. The classic TD algorithm is known to be unstable with linear function approximation and off-policy learning. Recently developed Gradient TD (GTD) algorithms have addressed this problem successfully. Despite their prominent properties of good scalability and convergence to correct solutions, they inherit the potential weakness of slow convergence, as they are stochastic gradient descent algorithms. Accelerated stochastic gradient descent algorithms have been developed to speed up convergence while still keeping computational complexity low. In this work, we develop an accelerated stochastic gradient descent method for minimizing the Mean Squared Projected Bellman Error (MSPBE), and derive a bound for the Lipschitz constant of the gradient of the MSPBE, which plays a critical role in our proposed accelerated GTD algorithms. Our comprehensive numerical experiments demonstrate promising performance in solving the policy evaluation problem, in comparison to the GTD algorithm family. In particular, accelerated TDC surpasses state-of-the-art algorithms.

Keywords: Acceleration; Convergence; Function approximation; Approximation algorithms; Vectors; Radio access networks
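
For orientation, here is a minimal sketch of one step of plain (non-accelerated) TDC with linear features, in the standard form of the GTD family. It is not the accelerated method the paper proposes. The function name `tdc_step`, the step sizes `alpha` and `beta`, and the on-policy setting (no importance weights) are assumptions of this sketch.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, reward, phi_next, gamma, alpha, beta):
    """One plain TDC update with linear value estimate V(s) = theta . phi(s).

    theta: value-function weights; w: auxiliary weights used for the
    gradient-correction term. Both updates use the old theta and w.
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # phi . w estimates the expected TD error given the current features.
    correction = dot(phi, w)
    new_theta = [
        t + alpha * (delta * p - gamma * correction * pn)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (delta - correction) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

Each step costs time linear in the number of features, which is the low per-step complexity the abstract refers to. The price is that both updates are ordinary stochastic gradient steps and so can converge slowly.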

A two stage learning technique for dual learning in the pursuit-evasion differential game

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ahmad A. Al-Talabi, Howard M. Schwartz. Affiliations: Mechatronics Engineering Department, Baghdad University, Baghdad, Iraq; Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada

This paper addresses the case of dual learning in the pursuit-evasion (PE) differential game and examines how fast the players can learn their default control strategies. The players should learn their default control strategies simultaneously by interacting with each other. Each player's learning process depends on the rewards received from its environment. The learning process is implemented using a two-stage learning algorithm that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the Q-learning fuzzy inference system (QFIS) algorithm. The PSO algorithm is used as a global optimizer to autonomously tune the parameters of a fuzzy logic controller, whereas the QFIS algorithm is used as a local optimizer. The two-stage learning algorithm is compared through simulation with the default control strategy, the PSO-based FLC algorithm, and the QFIS algorithm. Simulation results show that the players are able to learn their default control strategies. They also show that the two-stage learning algorithm outperforms the PSO-based FLC algorithm and the QFIS algorithm with respect to learning time.

Keywords: Games; Fuzzy logic; Inference algorithms; Approximation algorithms; Sociology; Statistics; Tuning

Clipping in Neurocontrol by adaptive dynamic programming

IEEE Transactions on Neural Networks and Learning Systems, 2014, vol. 25, no. 10, pp. 1909-1920

Authors: Fairbank, Michael; Prokhorov, Danil; Alonso, Eduardo. Affiliations: City Univ London, Sch Informat, Dept Comp Sci, London EC1V 0HB, England; Toyota Res Inst NA, Ann Arbor, MI 48105, USA

In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to do clipping on the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal differences learning, or policy-gradient learning algorithms.

Keywords: Backpropagation through time (BPTT); clipping; dual heuristic programming (DHP); neurocontrol; value-gradient learning
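
To make the idea concrete, here is a small sketch of clipping the final time step for an agent moving along a line at constant speed. The 1-D setup, the function name `rollout`, and the use of elapsed time as the cost are assumptions for illustration, not the paper's experiments.

```python
def rollout(x0, velocity, target, dt=1.0, clip=True):
    """Move from x0 toward target at constant positive velocity.

    Returns (final position, elapsed time). With clip=True the last step is
    truncated so the agent stops exactly at the target, and only the used
    fraction of that step is counted in the elapsed time.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive in this sketch")
    x, elapsed = x0, 0.0
    while x < target:
        step = velocity * dt
        if clip and x + step > target:
            fraction = (target - x) / step  # part of the step actually needed
            x = target
            elapsed += fraction * dt
        else:
            x += step
            elapsed += dt
    return x, elapsed
```

In this toy setting the clipped elapsed time equals `(target - x0) / velocity`, so it changes smoothly with the velocity. The unclipped time moves only in whole multiples of `dt`, which gives a derivative-based learner no useful slope.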

2014 IEEE International Symposium on Intelligent Control, ISIC 2014

2014 IEEE International Symposium on Intelligent Control, ISIC 2014

ISBN: (print) 9781479974061

The proceedings contain 56 papers. The topics discussed include: consensus with convergence rate in directed networks with multiple non-differentiable input delays; differentiated consensuses in a stochastic network with priorities; exponential synchronization for a new class of complex dynamical network with PIPC and hybrid TVD; output synchronization of uncertain nonlinear multi-agent systems with relative degree one; optimality of consensus protocols for multi-agent systems with interaction; robust synchronization of directed Lur'e networks with incremental nonlinearities; robust adaptive dynamic programming for continuous-time linear stochastic systems; simple adaptive output control of linear systems; passivity based stabilization of nonlinear 2D systems with application to iterative learning control; dual adaptive control of bimanual manipulation with online fuzzy parameter tuning; and on the intrinsic coordinatability of network control systems.

Beyond exponential utility functions: A variance-adjusted approach for risk-averse reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Abhijit A. Gosavi, Sajal K. Das, Susan L. Murray. Affiliations: Department of Engineering Management and Systems Engineering, Missouri University of Science and Technology, Rolla, MO; Department of Computer Science, Missouri University of Science and Technology, Rolla, MO

Utility theory has served as a bedrock for modeling risk in economics. Where risk is involved in decision-making, for solving Markov decision processes (MDPs) via utility theory, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior. The EU function framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large and typically the computer overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU function objective for reasonable values of the RAC, can be used in the RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies on a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.

Keywords: Equations; Linear programming; Mathematical model; Measurement; Markov processes; Learning (artificial intelligence); Computers
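
The overflow issue can be reproduced in a few lines. The sketch below compares an exponential-utility certainty equivalent with a variance-adjusted score on sampled returns. The function names are hypothetical, and the weight of one half on the variance comes from the second-order expansion of the exponential utility; the exact weighting used in the paper may differ.

```python
import math
from statistics import fmean, pvariance


def eu_certainty_equivalent(returns, rac):
    """Exponential-utility certainty equivalent: -(1/rac) * log E[exp(-rac * R)].

    rac is the risk-averseness coefficient. math.exp raises OverflowError
    once -rac * R exceeds roughly 709, e.g. for large accumulated costs.
    """
    utilities = [math.exp(-rac * r) for r in returns]
    return -math.log(fmean(utilities)) / rac


def variance_adjusted(returns, rac):
    """Second-order approximation: mean minus (rac / 2) times the variance."""
    return fmean(returns) - 0.5 * rac * pvariance(returns)


# Small returns and a small coefficient: the two scores nearly agree.
small = [1.0, 1.2]
gap = abs(eu_certainty_equivalent(small, 0.1) - variance_adjusted(small, 0.1))

# Large negative returns: the exponential overflows, the adjusted score does not.
large = [-800.0, -900.0]
try:
    eu_certainty_equivalent(large, 1.0)
    overflowed = False
except OverflowError:
    overflowed = True
```

The variance-adjusted score only needs running estimates of the mean and variance, so its magnitude stays comparable to the returns themselves.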

Design and real-time implementation of optimal power system wide area system-centric controller based on temporal difference learning

2014 IEEE Industry Application Society Annual Meeting, IAS 2014

Authors: Yousefian, Reza; Kamalasadan, Sukumar. Affiliation: Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC, United States

ISBN: (print) 9781479922888

In this paper, a new method for designing and implementing a coordinated wide area controller architecture is presented and tested using real-time digital simulation on a benchmark two-area power system model for improved power system dynamic stability. The algorithm is an optimal Wide Area System-Centric Controller and Observer (WASCCO) based on reinforcement and temporal difference learning, which allows the system to learn from interaction and predict future states. The controller design uses a powerful technique of the adaptive critic design (ACD) family called dual heuristic programming (DHP). The DHP controller's training and testing are implemented on the Innovative Integration Picolo card consisting of the TMS320C28335 processor. The main advantage of this design is its ability to learn from the past using eligibility traces and predict the optimal trajectory through temporal difference learning in the format of Receding Horizon Control (RHC). Results on a two-area system show a better response than conventional schemes. © 2014 IEEE.

Keywords: Controllers

Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL 2011)

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

ADPRL 2011 is the third IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. The area of approximate dynamic programming and reinforcement learning is a fusion of a number of research areas in engineering, mathematics, artificial intelligence, operations research, and systems and control theory. This symposium brings together researchers from different disciplines and will provide a remarkable opportunity for the academic and industrial community to address new challenges, share innovative yet practical solutions, and define promising future research directions.
