We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. (C) 2009 Elsevier Ltd. All rights reserved.
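For concreteness, here is a minimal incremental natural actor-critic update in the spirit of the algorithms described above: a TD(0) critic, compatible features for the advantage, and an actor step along the natural gradient (which, for a compatible parameterization, is the advantage-weight vector itself). The softmax parameterization, step sizes, and function names are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def softmax_probs(theta, act_feats):
    """Action probabilities for a softmax (Gibbs) policy with
    per-action feature vectors act_feats (n_actions x k)."""
    prefs = act_feats @ theta
    prefs -= prefs.max()                       # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def nac_update(theta, v, w, s_feat, s2_feat, a, r, act_feats,
               alpha=0.01, beta=0.1, gamma=0.99):
    """One incremental natural actor-critic step (sketch).

    theta -- policy parameters       v -- critic (value) weights
    w     -- compatible-feature advantage weights; for compatible
             features the natural policy gradient is w itself.
    """
    probs = softmax_probs(theta, act_feats)
    delta = r + gamma * (v @ s2_feat) - v @ s_feat   # TD(0) error
    v = v + beta * delta * s_feat                    # critic update
    psi = act_feats[a] - probs @ act_feats           # grad log pi(a|s)
    w = w + beta * (delta - psi @ w) * psi           # advantage fit
    w_new = w
    theta = theta + alpha * w_new                    # natural-gradient actor
    return theta, v, w
```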
This paper first presents a convergence analysis of the particle swarm optimization (PSO) system by treating it as a discrete-time linear time-variant system. Then, based on the resulting convergence conditions, dynamic optimal control of a deterministic PSO system for parameter optimization is studied using dynamic programming, and an approximate dynamic programming algorithm, swarm-based approximate dynamic programming (swarm-ADP), is proposed. Finally, numerical simulations validate the proposed dynamic optimization method.
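As a sketch of the system-theoretic view, the PSO recursion can be written, per particle, as a discrete-time linear time-variant system in the state (x_t, v_t) with the random acceleration coefficients as time-varying gains; freezing those gains yields the deterministic recursion whose stability is checked below. The coefficient values and the stability test are standard illustrations, not necessarily the paper's exact conditions.

```python
import numpy as np

def pso_ltv_step(x, v, pbest, gbest, w=0.729, c1=1.49, c2=1.49, rng=None):
    """One PSO update, viewable as a discrete-time linear time-variant
    system in (x, v) with time-varying random gains r1, r2."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = x + v
    return x, v

def deterministic_stable(w, phi):
    """Stability test for the frozen-gain (deterministic) recursion
    x_{t+1} = (1 + w - phi) x_t - w x_{t-1}, attractor at the origin:
    stable iff both eigenvalues lie inside the unit circle."""
    A = np.array([[1 + w - phi, -w],
                  [1.0, 0.0]])
    return np.all(np.abs(np.linalg.eigvals(A)) < 1)
```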
The purpose of this paper is to survey techniques for constructing effective policies for controlling complex networks, and to extend these techniques to capture special features of wireless communication networks under different networking scenarios. Among the key questions addressed are the relationship between static network equilibria and dynamic network control; the effect of coding on control and delay through rate regions; and routing, scheduling, and admission control. The approximation techniques surveyed are the basis of a specific formulation of an h-MaxWeight policy for network routing. Simulations show a 50% improvement in average delay performance compared to methods used in current practice.
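To make the policy class concrete: an h-MaxWeight policy chooses, in each state, the allocation that maximizes the expected one-step decrease of a surrogate value function h; with quadratic h this reduces to the classical MaxWeight rule. The interface below (queue vector, drift model) is an illustrative assumption.

```python
import numpy as np

def h_maxweight_action(q, actions, drift, grad_h):
    """Pick the action maximizing the expected decrease of h(q):
    argmax_a  -grad_h(q) . drift(q, a).

    q       -- vector of queue lengths
    actions -- iterable of feasible allocations
    drift   -- drift(q, a): expected one-step change in q under a
    grad_h  -- gradient of the surrogate value function h
    """
    g = grad_h(q)
    return max(actions, key=lambda a: -(g @ drift(q, a)))

# With h(q) = 0.5 * ||q||^2, so grad_h(q) = q, this reduces to the
# classical MaxWeight policy: serve to maximize q . (-drift).
```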
This paper addresses the problem of finding a control policy that drives a generic discrete event stochastic system from an initial state to a set of goal states with a specified probability. The control policy is iteratively constructed via an approximate dynamic programming (ADP) technique over a small subset of the state space that is evolved via Monte Carlo simulations. The effect of certain user-chosen parameters on the performance of the algorithm is investigated. The method is evaluated on several stochastic shortest path (SSP) examples and on a manufacturing job shop problem. We solve SSP problems that contain up to one million states to illustrate the scaling of the computational and memory benefits with respect to problem size. In the case of the manufacturing job shop example, the proposed ADP approach outperforms a traditional rolling-horizon mathematical programming approach. (C) 2009 Elsevier Ltd. All rights reserved.
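A one-sample-backup caricature of the loop the abstract describes: simulate to grow a working subset of the state space, then run Bellman backups restricted to that subset. The rollout policy, the optimistic boundary value, and all interface names are assumptions for illustration.

```python
import random

def adp_on_subset(start, goals, actions, sample_next, cost,
                  n_rollouts=200, horizon=50, sweeps=25):
    """ADP over a Monte-Carlo-grown subset of the state space (sketch)."""
    subset = {start}
    # 1) grow the working subset by simulating a (here: random) policy
    for _ in range(n_rollouts):
        s = start
        for _ in range(horizon):
            s = sample_next(s, random.choice(actions))
            subset.add(s)
            if s in goals:
                break
    # 2) one-sample Bellman backups restricted to the subset; states
    #    outside it get an optimistic cost-to-go of 0 (crude boundary)
    V = {s: 0.0 for s in subset}
    for _ in range(sweeps):
        for s in subset:
            if s in goals:
                continue
            V[s] = min(cost(s, a) + V.get(sample_next(s, a), 0.0)
                       for a in actions)
    return V
```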
In this paper, the near-optimal control problem for a class of nonlinear discrete-time systems with control constraints is solved by an iterative adaptive dynamic programming algorithm. First, a novel nonquadratic performance functional is introduced to handle the control constraints, and then an iterative adaptive dynamic programming algorithm is developed to solve the optimal feedback control problem of the original constrained system, with convergence analysis. In the present control scheme, three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm. Two examples are given to demonstrate the convergence and feasibility of the proposed optimal control scheme.
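The "nonquadratic performance functional" used to enforce actuator bounds in this literature is typically of the integral-of-inverse-hyperbolic-tangent form introduced by Lyshevski; a representative choice (an assumption about this paper's exact form, with $\bar{U}$ the saturation bound and $R \succ 0$ a weight matrix) is

$$ W(u) = 2 \int_0^{u} \left( \bar{U} \tanh^{-1}(s/\bar{U}) \right)^{\top} R \, \mathrm{d}s . $$

Because $\tanh^{-1}(s/\bar{U})$ diverges as $|s| \to \bar{U}$, minimizing a cost built from $W$ keeps the control strictly inside the bound, and the greedy control derived from the optimality condition takes the saturated form $u = -\bar{U} \tanh\!\big( \tfrac{1}{2} \bar{U}^{-1} R^{-1} g^{\top} \nabla V \big)$, where $g$ is the input gain and $V$ the value function.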
This paper presents a theory of how general-purpose learning-based intelligence is achieved in the mammal brain, and how we can replicate it. It reviews four generations of ever more powerful general-purpose learning designs in adaptive, approximate dynamic programming (ADP), which includes reinforcement learning as a special case. It reviews empirical results which fit the theory, and suggests important new directions for research within the scope of NSF's recent initiative on Cognitive Optimization and Prediction. The appendices suggest possible connections to the realms of human subjective experience, comparative cognitive neuroscience, and new challenges in electric power. The major challenge before us today in mathematical neural networks is to replicate the "mouse level", but the paper does contain a few thoughts about building, understanding and nourishing levels of general intelligence beyond the mouse. Published by Elsevier Ltd.
An approximate dynamic programming (ADP) strategy for a dual adaptive control problem is presented. An optimal control policy for a dual adaptive control problem can be derived by solving a stochastic dynamic programming problem, which is computationally intractable with conventional solution methods that involve sampling the complete hyperstate space. To solve the problem in a computationally amenable manner, we perform closed-loop simulations with different control policies to generate a data set that defines a subset of the hyperstate space within which the Bellman equation is iterated. A local approximator with a penalty function is designed for estimating cost-to-go values over the continuous hyperstate space. An integrating process with an unknown gain is used for illustration.
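A minimal sketch of the two ingredients named above: a local approximator for cost-to-go values over the continuous hyperstate space, with a distance penalty that discourages relying on estimates far from the simulated data, and a value-iteration sweep restricted to the sampled hyperstates. The k-NN form of the local model, the penalty shape, and all names are assumptions.

```python
import numpy as np

def local_cost_to_go(h_query, H, J, k=5, penalty=10.0):
    """Estimate J(h_query) from sampled hyperstates H (rows) with
    cost-to-go values J, by k-NN averaging plus a distance penalty
    that inflates estimates far from the simulated data (sketch)."""
    d = np.linalg.norm(H - h_query, axis=1)
    idx = np.argsort(d)[:k]
    return J[idx].mean() + penalty * d[idx].mean()

def bellman_sweep(H, J, actions, simulate, stage_cost, gamma=0.99):
    """One value-iteration sweep over the sampled hyperstate subset."""
    J_new = np.empty_like(J)
    for i, h in enumerate(H):
        J_new[i] = min(stage_cost(h, a)
                       + gamma * local_cost_to_go(simulate(h, a), H, J)
                       for a in actions)
    return J_new
```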
We develop a network revenue management model to jointly make capacity control and overbooking decisions. Our approach is based on the observation that if the penalty cost of denying boarding to the reservations at departure time were given by a separable function, then the dynamic programming formulation of the network revenue management problem would decompose by itinerary and could be solved by focusing on one itinerary at a time. Motivated by this observation, we use an iterative, simulation-based method to build separable approximations to the penalty cost incurred at departure time. Computational experiments compare our model with two benchmark strategies based on a deterministic linear programming formulation. The profits obtained by our model improve over those obtained by the benchmark strategies by about 3 per cent on average, which is a significant figure in the network revenue management setting. For test problems with tight leg capacities, the profit improvements can be as high as 13 per cent.
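To make the decomposition explicit (the notation is illustrative): if the terminal denied-boarding penalty separates over itineraries as $\Phi(x) \approx \sum_j \Phi_j(x_j)$, where $x_j$ is the number of reservations on hand for itinerary $j$, then the value functions of the joint dynamic program inherit the same structure, $V_t(x) \approx \sum_j V_{t,j}(x_j)$, and each $V_{t,j}$ can be computed by a one-dimensional dynamic program over that itinerary's reservation count alone. The paper constructs the separable approximations $\Phi_j$ iteratively by simulation.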
ISBN (print): 9783642015120
Machine learning for mobile robots has attracted much research interest in recent years. However, many challenges remain in applying learning techniques to real mobile robots, e.g., generalization in continuous spaces, learning efficiency, and convergence. In this paper, a reinforcement learning path-following control strategy based on approximate policy iteration (API) is developed for a real mobile robot. Among its advantages, optimized control policies can be obtained without much a priori knowledge of the dynamic model of the mobile robot. Two kinds of API-based control methods, i.e., API with linear approximation and API with kernel machines, are implemented in the path-following control task, and the efficiency of the proposed control strategy is illustrated in experimental studies on a real mobile robot based on the Pioneer3-AT platform. Experimental results verify that the API-based learning controller has better convergence and path-following accuracy than conventional PD control methods. Finally, the learning control performance of the two API methods is also evaluated and compared.
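As a sketch of the linear-approximation variant of API mentioned above: policy evaluation by LSTD-Q from a fixed batch of transition samples, alternated with greedy improvement (the kernel-machine variant would replace the feature map with kernel evaluations). The sample format, feature map, and function names are assumptions.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.95, reg=1e-6):
    """LSTD-Q: fit w so that Q(s, a) ~= w . phi(s, a) from transition
    samples (s, a, r, s2), evaluating the given policy (sketch)."""
    k = phi(*samples[0][:2]).size
    A = reg * np.eye(k)                      # ridge term for invertibility
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        f2 = phi(s2, policy(s2))
        A += np.outer(f, f - gamma * f2)
        b += r * f
    return np.linalg.solve(A, b)

def api(samples, phi, actions, n_iters=10, gamma=0.95):
    """Approximate policy iteration: alternate LSTD-Q policy
    evaluation with greedy policy improvement."""
    w = np.zeros(phi(*samples[0][:2]).size)
    greedy = lambda s: max(actions, key=lambda a: w @ phi(s, a))
    for _ in range(n_iters):
        w = lstdq(samples, phi, greedy, gamma)   # greedy sees updated w
    return greedy, w
```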
ISBN (print): 9780769536064
Portfolio management deals with the allocation of wealth among different investment opportunities, considering the investor's preferences on risk. In this paper we consider a multiperiod model where the investor rebalances a portfolio at the beginning of each period, facing uncertainty in the prices of the assets at future dates. Models of this decision problem tend to become very large because of the dynamic structure and the uncertainty. We present a multiple-period portfolio model over a finite horizon with transaction costs, a risk-averse utility function, and uncertainty modeled using the scenario approach. We propose a new method for efficiently solving real problems; the procedure combines stochastic programming with decomposition and approximation techniques, and solving the resulting optimization problem relies on approximate dynamic programming. The effectiveness of the technique is demonstrated by the experimental results.
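A minimal sketch of the backward scenario recursion underlying such approaches: each period the investor picks a rebalancing decision on a grid, pays proportional transaction costs, and the value is propagated backward over sampled return scenarios. The two-asset setup, grid policy space, time-additive utility, and the neglect of post-return weight drift are all simplifying assumptions for illustration, not the paper's model.

```python
import numpy as np

def portfolio_adp(returns, utility, n_grid=11, tc=0.002):
    """Backward recursion for a multiperiod rebalancing problem on
    sampled return scenarios (sketch). returns[t] is a list of
    (risky, safe) gross-return pairs for period t; the decision is
    the risky-asset weight on a grid; tc is a proportional
    transaction cost."""
    grid = np.linspace(0.0, 1.0, n_grid)      # candidate risky weights
    V = np.zeros(n_grid)                      # terminal value: 0
    for scen in reversed(returns):            # backward in time
        V_new = np.empty(n_grid)
        for i, w_prev in enumerate(grid):
            best = -np.inf
            for j, w in enumerate(grid):
                trade = tc * abs(w - w_prev)  # proportional cost
                exp_val = np.mean([utility(w * rr + (1 - w) * rs - trade)
                                   for rr, rs in scen]) + V[j]
                best = max(best, exp_val)
            V_new[i] = best
        V = V_new
    return grid, V   # V[i]: value entering period 0 holding weight grid[i]

# Example usage (with log utility as the risk-averse choice):
# grid, V = portfolio_adp([[(1.08, 1.02), (0.95, 1.02)]] * 4, np.log)
```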