The two-player zero-sum (ZS) game problem provides the solution to the bounded L2-gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton-Jacobi-Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous-time two-player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous-time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm synchronous ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system. Copyright (c) 2011 John Wiley & Sons, Ltd.
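The saddle-point structure the abstract describes can be seen in closed form in the scalar linear-quadratic special case, where the HJI equation collapses to a game algebraic Riccati equation. The sketch below uses our own toy coefficients (it is not the paper's online neural-network algorithm) to solve that equation and check the saddle-point gains:

```python
import numpy as np

# Scalar LQ special case of the ZS game (illustrative numbers, not from
# the paper): dx/dt = a*x + b*u + d*w, cost integrand
# q*x^2 + r*u^2 - gamma^2*w^2. For V(x) = P*x^2 the HJI equation reduces
# to the game algebraic Riccati equation:
#   2*a*P + q - (b^2/r)*P^2 + (d^2/gamma^2)*P^2 = 0
a, b, d, q, r, gamma = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0

coef = d**2 / gamma**2 - b**2 / r            # quadratic coefficient
roots = np.roots([coef, 2 * a, q])
# keep the positive root that makes the closed loop stable
P = next(p.real for p in roots
         if p.real > 0 and a - (b**2 / r - d**2 / gamma**2) * p.real < 0)

k_u = (b / r) * P                            # saddle control:     u* = -k_u*x
k_w = (d / gamma**2) * P                     # worst disturbance:  w* = +k_w*x
a_cl = a - b * k_u + d * k_w                 # closed-loop drift
residual = 2 * a * P + q - (b**2 / r) * P**2 + (d**2 / gamma**2) * P**2
```

The residual being zero confirms the quadratic value function solves the (scalar) HJI equation, and the negative closed-loop drift confirms stability under the saddle-point policies.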
In this paper, a rate control scheme for downlink packet transmission in CDMA networks is proposed, based on both the queue lengths and the channel states of mobile users. We are interested in optimal rate allocation policies for throughput maximisation over time, and thus we formulate the problem as a discrete stochastic dynamic program. This dynamic program is exponentially complex in the number of users, which renders it impractical; we therefore use an approximate dynamic programming (DP) algorithm to obtain suboptimal rate allocation policies in real time. The numerical results reveal that the proposed algorithm significantly outperforms a number of different baseline greedy heuristics. (c) 2012 Elsevier B.V. All rights reserved.
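The queue-and-channel-dependent rate control described above can be made concrete on a deliberately tiny single-user model (every parameter below is an illustrative invention, not from the paper), where the dynamic program is small enough to solve exactly by tabular value iteration:

```python
import numpy as np

# Toy single-user model: state = (queue length q, channel h),
# action = transmit rate r, reward = packets served minus a power cost.
Q_MAX, RATES, GAMMA = 5, (0, 1, 2), 0.9
P_ARRIVAL, POWER_COST = 0.5, 0.3
CAP = (1, 2)                                 # channel 0 = bad, 1 = good
P_CH = np.array([[0.7, 0.3],
                 [0.4, 0.6]])                # channel transition matrix

V = np.zeros((Q_MAX + 1, 2))
for _ in range(1000):                        # value iteration to convergence
    V_new = np.empty_like(V)
    for q_len in range(Q_MAX + 1):
        for h in range(2):
            vals = []
            for r in RATES:
                served = min(q_len, r, CAP[h])
                ev = sum(p * (P_CH[h] @ V[min(Q_MAX, q_len - served + a)])
                         for a, p in ((0, 1 - P_ARRIVAL), (1, P_ARRIVAL)))
                vals.append(served - POWER_COST * r + GAMMA * ev)
            V_new[q_len, h] = max(vals)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

def best_rate(q_len, h):
    """Greedy rate with respect to the converged value function."""
    def score(r):
        served = min(q_len, r, CAP[h])
        ev = sum(p * (P_CH[h] @ V[min(Q_MAX, q_len - served + a)])
                 for a, p in ((0, 1 - P_ARRIVAL), (1, P_ARRIVAL)))
        return served - POWER_COST * r + GAMMA * ev
    return max(RATES, key=score)
```

With many users the joint state space is the product of such per-user spaces, which is the exponential blow-up that motivates the paper's approximate DP approach.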
An intelligent-optimal control scheme for unknown nonaffine nonlinear discrete-time systems with a discount factor in the cost function is developed in this paper. The iterative adaptive dynamic programming algorithm is introduced to solve the optimal control problem with convergence analysis. Then, the implementation of the iterative algorithm via the globalized dual heuristic programming technique is presented using three neural networks, which approximate at each iteration the cost function, the control law, and the unknown nonlinear system, respectively. In addition, two simulation examples are provided to verify the effectiveness of the developed optimal control approach. (C) 2012 Elsevier Ltd. All rights reserved.
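Globalized dual heuristic programming trains the critic against the cost function and its derivatives jointly. A minimal offline illustration of that dual-target fit, with a polynomial basis standing in for the paper's neural networks and a made-up target cost:

```python
import numpy as np

# GDHP-flavored critic fit (toy, offline): match value targets and their
# derivatives with one linear-in-parameters model. The target
# J(x) = x^2 + 0.5*x^4 plays the role of the cost function; the basis
# [x^2, x^4] stands in for a critic neural network.
xs = np.linspace(-2.0, 2.0, 21)
J = xs**2 + 0.5 * xs**4                      # value targets
dJ = 2 * xs + 2.0 * xs**3                    # derivative targets

phi = np.stack([xs**2, xs**4], axis=1)       # critic basis
dphi = np.stack([2 * xs, 4 * xs**3], axis=1) # its analytic derivative

# Stack both regression problems (GDHP blends the two losses; here they
# are weighted equally) and solve by least squares.
A = np.vstack([phi, dphi])
y = np.concatenate([J, dJ])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the target lies exactly in the span of the basis, the recovered weights are [1.0, 0.5]; with a real system the derivative targets would come from the model network rather than a known formula.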
In this paper, a finite horizon iterative adaptive dynamic programming (ADP) algorithm is proposed to solve the optimal control problem for a class of discrete-time nonlinear systems with unfixed initial state. A new ε-optimal control algorithm based on the iterative ADP approach is proposed that makes the performance index function iteratively converge to the greatest lower bound of all performance indices within an error ε in finite time. The convergence analysis of the proposed ADP algorithm in terms of performance index function and control policy is conducted. The optimal number of control steps can also be obtained by the proposed ε-optimal control algorithm for the unfixed initial state. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, facilitating the implementation of the ε-optimal control algorithm. Finally, a simulation example is given to show the effectiveness of the proposed method. (C) 2012 Elsevier Ltd. All rights reserved.
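The ε-stopping idea, iterating the value update until successive performance indices are close enough that the current index is within ε of the greatest lower bound, can be sketched on a small finite MDP (our toy; the paper treats continuous-state nonlinear systems with neural approximators):

```python
import numpy as np

# Epsilon-stopped value iteration on a random finite MDP (synthetic data).
# Contraction bound used for stopping: if the successive difference is at
# most EPS*(1-GAMMA)/GAMMA, then ||V - V*||_inf <= EPS.
GAMMA, EPS = 0.9, 1e-3
rng = np.random.default_rng(0)
n_s, n_a = 6, 3
cost = rng.uniform(0.0, 1.0, (n_s, n_a))
nxt = rng.integers(0, n_s, (n_s, n_a))       # deterministic transitions

V = np.zeros(n_s)
steps = 0
while True:
    V_new = np.min(cost + GAMMA * V[nxt], axis=1)
    steps += 1
    if np.max(np.abs(V_new - V)) <= EPS * (1 - GAMMA) / GAMMA:
        V = V_new
        break
    V = V_new
```

The stopping rule guarantees termination in finite time (the differences shrink geometrically at rate GAMMA), mirroring the finite-time ε-optimality the abstract claims for its setting.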
In this paper, a neuro-optimal control scheme for a class of unknown discrete-time nonlinear systems with a discount factor in the cost function is developed. The iterative adaptive dynamic programming algorithm using the globalized dual heuristic programming technique is introduced to obtain the optimal controller, with convergence analysis in terms of cost function and control law. In order to carry out the iterative algorithm, a neural network is first constructed to identify the unknown controlled system. Then, based on the learned system model, two other neural networks are employed as parametric structures to facilitate the implementation of the iterative algorithm, approximating at each iteration the cost function and its derivatives and the control law, respectively. Finally, a simulation example is provided to verify the effectiveness of the proposed optimal control approach.
The considerable cost of maintaining large fleets has generated interest in cost minimization strategies. With many related decisions, numerous constraints, and significant sources of uncertainty (e.g. vehicle breakdowns), fleet managers face complex dynamic optimization problems. Existing methodologies frequently make simplifying assumptions or fail to converge quickly for large problems. This paper presents an approximate dynamic programming approach for making vehicle purchase, resale, and retrofit decisions in a fleet setting with stochastic vehicle breakdowns. Value iteration is informed by dual variables from linear programs, as well as other bounds on vehicle shadow prices. Sample problems are based on a government fleet seeking to comply with emissions regulation. The model predicts the expected cost of compliance, the rules the fleet manager will use in deciding how to comply, and the regulation's impact on the value of vehicles in the fleet. Stricter regulation lowers the value of some vehicle categories while raising the value of others. Such insights can help guide regulators, as well as the fleet managers they oversee. The methodologies developed could be applied more broadly to general multi-asset replacement problems, many of which have similar structures. (C) 2012 Elsevier Ltd. All rights reserved.
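The abstract's use of LP dual variables as shadow prices on vehicles can be illustrated with SciPy's HiGHS interface on a made-up two-vehicle-type capacity LP (the model and numbers are ours, not the paper's formulation; `ineqlin.marginals` is documented as the sensitivity of the optimal objective to the inequality right-hand sides):

```python
import numpy as np
from scipy.optimize import linprog

# Toy fleet LP: assign service to two vehicle types to meet demand at
# minimum operating cost, then read the demand constraint's shadow price.
op_cost = np.array([3.0, 5.0])               # cost per unit of service
capacity = np.array([10.0, 10.0])            # capacity per vehicle type
demand = 12.0

# minimize op_cost @ x  s.t.  -(x0 + x1) <= -demand,  0 <= x <= capacity
res = linprog(op_cost,
              A_ub=np.array([[-1.0, -1.0]]), b_ub=np.array([-demand]),
              bounds=[(0.0, cap) for cap in capacity], method="highs")

# dObjective/d(b_ub): how much relaxing demand by one unit saves.
shadow_price = res.ineqlin.marginals[0]
```

Here the cheap type is used to capacity and the marginal unit of demand falls on the expensive type, so the shadow price reflects the expensive type's cost; such marginal values are the kind of bound on vehicle worth that can inform value iteration.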
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD(λ), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation. We establish for the discounted cost criterion that LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other properties of the iterates involved in the algorithm, including convergence in mean and boundedness. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on a topological space. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of a TD(λ) algorithm and extensions to MDP with compact state and action spaces, as well as a convergence proof of a new LSTD algorithm with state-dependent λ-parameters.
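For readers who want the mechanics, here is a minimal tabular LSTD(λ) estimator on a three-state chain of our own making (on-policy and discounted, so it does not exercise the paper's exploration-enhanced setting; with full-rank tabular features the fixed point is the true value function):

```python
import numpy as np

# Minimal LSTD(lambda) on a toy 3-state Markov chain (synthetic example).
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
c = np.array([1.0, 0.0, 2.0])                # per-state cost
gamma, lam = 0.9, 0.7
phi = np.eye(3)                              # tabular features

A = np.zeros((3, 3))
b = np.zeros(3)
z = np.zeros(3)                              # eligibility trace
s = 0
for _ in range(150_000):
    s_next = rng.choice(3, p=P[s])
    z = gamma * lam * z + phi[s]             # trace update
    A += np.outer(z, phi[s] - gamma * phi[s_next])
    b += z * c[s]
    s = s_next

theta = np.linalg.solve(A, b)                # LSTD estimate of V
V_true = np.linalg.solve(np.eye(3) - gamma * P, c)
```

With a long enough trajectory, `theta` approaches `V_true` for any λ in [0, 1]; the choice of λ trades bias against variance when the features are not full rank.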
Seeking to reduce the potential impact of delays on radiation therapy cancer patients such as psychological distress, deterioration in quality of life and decreased cancer control and survival, and motivated by inefficiencies in the use of expensive resources, we undertook a study of scheduling practices at the British Columbia Cancer Agency (BCCA). As a result, we formulated and solved a discounted infinite-horizon Markov decision process for scheduling cancer treatments in radiation therapy units. The main purpose of this model is to identify good policies for allocating available treatment capacity to incoming demand, while reducing wait times in a cost-effective manner. We use an affine architecture to approximate the value function in our formulation and solve an equivalent linear programming model through column generation to obtain an approximate optimal policy for this problem. The benefits from the proposed method are evaluated by simulating its performance for a practical example based on data provided by the BCCA. (C) 2012 Elsevier B.V. All rights reserved.
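The approximate-LP route the abstract takes (a parametric value function with the Bellman equation replaced by an LP) can be sketched without column generation on a random small MDP. Any feasible point of the constraints V ≤ TV is a pointwise lower bound on the optimal value, which is what the check below relies on. All model data are synthetic:

```python
import numpy as np
from scipy.optimize import linprog

# Approximate linear programming with an affine value architecture
# V(s) = w0 + w1*s on a random small cost-minimization MDP.
rng = np.random.default_rng(2)
nS, nA, gamma = 5, 2, 0.9
c = rng.uniform(0.0, 1.0, (nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = dist over s'

phi = np.stack([np.ones(nS), np.arange(nS, dtype=float)], axis=1)

# ALP: maximize nu' Phi w  s.t.  (Phi w)(s) <= c(s,a) + gamma*E[(Phi w)(s')]
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        A_ub.append(phi[s] - gamma * (P[s, a] @ phi))
        b_ub.append(c[s, a])
nu = np.ones(nS) / nS                            # state-relevance weights
res = linprog(-(nu @ phi), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2, method="highs")
V_alp = phi @ res.x
```

In the paper's scheduling problem the constraint set is far too large to enumerate as above, which is where column generation comes in: constraints (columns of the dual) are added only as they become violated.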
We provide a practical methodology for solving the generalized joint replenishment (GJR) problem, based on a mathematical programming approach to approximate dynamic programming. We show how to automatically generate a value function approximation basis built upon piecewise-linear ridge functions by developing and exploiting a theoretical connection with the problem of finding optimal cyclic schedules. We provide a variant of the algorithm that is effective in practice, and we exploit the special structure of the GJR problem to provide a coherent, implementable framework.
Convergence is proven of the value-iteration-based algorithm to find the optimal controller in the case of general nonlinear systems that are non-affine in the input. That is, it is shown that the algorithm converges to the optimal control and the optimal value function. It is assumed that at each iteration the value and action update equations can be exactly solved. Then two standard neural networks (NN) are used: a critic NN is used to approximate the value function, while an action network is used to approximate the optimal control policy. (C) 2012 Jordan Journal of Mechanical and Industrial Engineering. All rights reserved.
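A minimal offline analogue of that scheme, with a least-squares polynomial critic in place of the critic NN, an explicit minimization over a control grid in place of the action network, and a toy non-affine-in-input scalar system of our own invention:

```python
import numpy as np

# Fitted value iteration with a parametric critic (toy sketch). Note:
# fitted VI with a projection step is not guaranteed to converge in
# general, so the loop is capped; this particular setup behaves well.
GAMMA = 0.9
ACTIONS = np.linspace(-1.0, 1.0, 5)

def f(x, u):                                 # toy dynamics, non-affine in u
    return 0.8 * np.sin(x) + 0.5 * np.tanh(u)

def cost(x, u):
    return x**2 + 0.1 * u**2

xs = np.linspace(-2.0, 2.0, 41)              # training grid

def feats(x):                                # critic basis: 1, x, x^2, x^4
    return np.stack([np.ones_like(x), x, x**2, x**4], axis=-1)

w = np.zeros(4)
for _ in range(200):
    # value update: V(x) <- min_u [cost(x,u) + GAMMA * V(f(x,u))]
    targets = np.min([cost(xs, u) + GAMMA * feats(f(xs, u)) @ w
                      for u in ACTIONS], axis=0)
    w_new, *_ = np.linalg.lstsq(feats(xs), targets, rcond=None)
    if np.max(np.abs(w_new - w)) < 1e-10:
        w = w_new
        break
    w = w_new
```

The exact-solvability assumption in the abstract corresponds to the inner `min` and the regression both being solved to optimality at every iteration.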