In this paper, the optimal strategies for discrete-time linear quadratic zero-sum games, which arise in the H-infinity optimal control problem, are solved forward in time without knowledge of the system dynamics matrices. The idea is to solve for an action-dependent value function Q(x, u, w) of the zero-sum game instead of the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used and adaptively tuned forward in time using adaptive critic methods. The result is a Q-learning approximate dynamic programming (ADP) model-free approach that solves the zero-sum game forward in time. It is proven that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game, and that the algorithm is, in effect, a model-free iterative method for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of the method is demonstrated by designing an H-infinity autopilot for an F-16 aircraft. (C) 2007 Elsevier Ltd. All rights reserved.
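Since the abstract identifies the algorithm with an iterative solver for the GARE, the fixed point it converges to can be sketched with a plain model-based value iteration in numpy. This is an illustration of the underlying recursion only, not the paper's model-free Q-learning implementation; the example matrices and the attenuation level gamma in the usage note are invented for the sketch.

```python
import numpy as np

def gare_iteration(A, B, E, Q, R, gamma, iters=300):
    """Value iteration on the game algebraic Riccati equation (GARE) of
    the discrete-time zero-sum LQ game
        x_{k+1} = A x_k + B u_k + E w_k,
        cost    = sum_k x'Qx + u'Ru - gamma^2 w'w,
    where u minimizes and the disturbance w maximizes.  Returns the game
    value matrix P and feedback gains (K, L) for u = -Kx, w = -Lx."""
    n, m, q = A.shape[0], B.shape[1], E.shape[1]

    def blocks(P):
        # Action-action block of the game Q-function kernel, and the
        # action-state block; the minimax solution inverts G.
        G = np.block([[R + B.T @ P @ B, B.T @ P @ E],
                      [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(q)]])
        S = np.vstack([B.T @ P @ A, E.T @ P @ A])
        return G, S

    P = np.zeros((n, n))
    for _ in range(iters):
        G, S = blocks(P)
        P = Q + A.T @ P @ A - S.T @ np.linalg.solve(G, S)
    G, S = blocks(P)
    KL = np.linalg.solve(G, S)
    return P, KL[:m], KL[m:]
```

At convergence, u = -Kx and w = -Lx form the Nash equilibrium feedback policies, provided gamma exceeds the attenuation bound of the system so that the disturbance block of G stays negative definite.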
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To preserve sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing approaches. One is improved convergence with a (near-)optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection via the ALD-based kernel sparsification. The KLSPI algorithm therefore provides a general RL method with good generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating a...
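The ALD sparsification step that gives KLSPI its automatic feature selection can be sketched in isolation: a sample joins the kernel dictionary only if its feature-space image cannot be approximated by the dictionary built so far. A minimal sketch, assuming a Gaussian kernel and an invented threshold `nu`; the paper applies this test inside KLSTD-Q rather than as a standalone pass.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) Mercer kernel."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y))**2) / (2 * sigma**2))

def ald_dictionary(samples, kernel=rbf, nu=1e-3):
    """Approximate linear dependency (ALD) sparsification: keep a sample
    only if the squared residual of projecting its kernel feature onto
    the span of the current dictionary's features exceeds nu."""
    dic = []
    for x in samples:
        if not dic:
            dic.append(x)
            continue
        K = np.array([[kernel(a, b) for b in dic] for a in dic])
        k = np.array([kernel(a, x) for a in dic])
        # Least-squares coefficients of the projection (tiny ridge for safety).
        c = np.linalg.solve(K + 1e-10 * np.eye(len(dic)), k)
        delta = kernel(x, x) - k @ c   # squared projection residual
        if delta > nu:
            dic.append(x)              # approximately linearly independent
    return dic
```

Duplicates and near-duplicates are rejected, so the dictionary size, and with it the number of basis functions in policy evaluation, stays bounded.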
This paper presents a simulation-based approach for designing a non-linear override control scheme to improve the performance of a local linear controller. The higher-level non-linear controller monitors the dynamic state of the system and calculates an override control action whenever the system is predicted to move outside an acceptable operating regime under the local controller. The design of the non-linear override controller is based on a cost-to-go function, which is constructed by using simulation or operation data. The cost-to-go function delineates the admissible region of state space within which the local controller is effective, thereby yielding a switching rule.
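The switching rule that the cost-to-go function yields can be sketched directly; all function names below are hypothetical placeholders for the components the abstract describes.

```python
def make_override_controller(local_control, override_control, cost_to_go, threshold):
    """Switching rule sketched in the abstract: apply the local linear
    controller while the estimated cost-to-go says the state is inside
    the admissible region; otherwise apply the non-linear override."""
    def control(x):
        if cost_to_go(x) <= threshold:
            return local_control(x)      # local controller is effective here
        return override_control(x)       # predicted to leave the regime
    return control
```

The cost-to-go function itself would be fitted offline from simulation or operating data; the threshold delineates the region of state space within which the local controller is trusted.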
ISBN (Print): 9781424407064
The aim of this study is to assist a military decision maker during the decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies that guarantee the simultaneous achievement of given goals defined in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed draw on temporal-difference methods and also use a heuristic qualification of system states to cope with algorithmic complexity.
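The stochastic-shortest-path core of the approach can be illustrated with tabular temporal-difference (Q-)learning on a toy chain; the chain, transition probabilities, and parameters below are invented and stand in for the paper's multi-valued graph model.

```python
import random

def q_learn_ssp(n, episodes=4000, alpha=0.1, eps=0.2, seed=0):
    """Q-learning for a toy stochastic shortest path on a chain of n
    states: actions 0/1 attempt to move left/right (succeeding with
    probability 0.9, else staying put), each step costs 1, and state
    n-1 is the absorbing goal.  Returns the learned cost-to-go V."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n)]   # Q[s][a]: expected cost-to-go
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if rng.random() < eps:
                a = rng.randrange(2)                        # explore
            else:
                a = min((0, 1), key=lambda b: Q[s][b])      # greedy: least cost
            step = (1 if a == 1 else -1) if rng.random() < 0.9 else 0
            s2 = max(0, min(n - 1, s + step))
            target = 1.0 + (0.0 if s2 == n - 1 else min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return [min(q) for q in Q]
```

The optimal cost-to-go here is (n-1-s)/0.9, since moving right succeeds nine times out of ten; the learned V approaches that, and a heuristic qualification of states, as in the paper, would prune this table for larger graphs.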
ISBN (Print): 9781424407064
Approximate dynamic programming (ADP) has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on ADP, to solve the infinite-horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from the fields of computational intelligence and control theory. Only partial knowledge of the system model is used, as knowledge of the plant internal dynamics is not needed. The method is thus useful for determining the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a quasi-Newton method for solving the underlying algebraic Riccati equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control-theoretic terms, this paper develops a direct adaptive control algorithm that obtains the optimal control solution without knowing the system A matrix.
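The underlying iteration can be sketched in its classical, model-based form: Kleinman's policy iteration on the ARE, which is a Newton-type method. Unlike the paper's scheme, the sketch below uses the full model including A, and it assumes a stable plant so that the zero initial gain is stabilizing; the paper performs the same iteration online from measured data and removes both requirements. The example system is invented.

```python
import numpy as np

def kleinman_care(A, B, Q, R, K0, iters=25):
    """Policy iteration (Kleinman's method) for the continuous-time ARE
        A'P + PA + Q - P B R^{-1} B' P = 0.
    Each step evaluates the current gain K by solving a Lyapunov
    equation (via Kronecker-product vectorization), then improves the
    gain.  K0 must be stabilizing in this model-based form."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Solve Ac'P + P Ac = -(Q + K'RK) for P.
        M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
        rhs = -(Q + K.T @ R @ K).reshape(n * n)
        P = np.linalg.solve(M, rhs).reshape(n, n)
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K
```

Each policy-evaluation step only needs trajectory data in the paper's online version, which is what eliminates the dependence on the internal dynamics A.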
In this paper, we focus on a particular version of the dynamic service network design (DSND) problem, namely the case of a single terminal that dispatches services to a number of customers and other terminals. We present a time-dependent, stochastic formulation that aims to optimize the problem over a given planning horizon, and propose a solution approach based on dynamic programming principles. We also present a static, single-period formulation of the single-node problem that appears as a subproblem when addressing the time-dependent version and general service network design cases. Despite its apparent simplicity, it is still a network design problem, and exact solution methods are not sufficiently fast. We therefore propose two tabu search meta-heuristics based on the ejection-chain concept. We also introduce a learning mechanism that takes advantage of experience gathered in repeated executions. Experiments with problem instances derived from real cases indicate that the proposed solution methods are efficient and yield good solutions. (c) 2004 Elsevier B.V. All rights reserved.
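The tabu-search mechanics, a recency-based tabu list plus an aspiration criterion, can be sketched on a toy fixed-charge design problem. The ejection-chain neighborhoods of the paper are problem-specific and are replaced here by simple single-flip moves over binary open/close decisions; the skeleton and its parameters are illustrative only.

```python
import random

def tabu_search(cost, n, tenure=2, iters=100, seed=0):
    """Generic tabu search over binary design vectors with a single-flip
    neighborhood.  Recently flipped indices are tabu for `tenure`
    iterations unless the move improves on the best solution found
    (aspiration criterion); the best-admissible move is always taken,
    even if it worsens the current solution, to escape local optima."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    best, best_c = x[:], cost(x)
    tabu = {}                      # flipped index -> last iteration it is tabu
    for it in range(iters):
        cand = None
        for i in range(n):
            y = x[:]
            y[i] ^= 1
            c = cost(y)
            if tabu.get(i, -1) >= it and c >= best_c:
                continue           # tabu, and aspiration does not apply
            if cand is None or c < cand[1]:
                cand = (i, c, y)
        if cand is None:
            continue               # every move tabu this iteration
        i, c, x = cand
        tabu[i] = it + tenure
        if c < best_c:
            best, best_c = x[:], c
    return best, best_c
```

In the paper's setting a move is an ejection chain reassigning demands between services rather than a single bit flip, and the learning mechanism would bias the search using statistics from earlier runs.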
Even though dynamic programming offers an optimal control solution in state feedback form, the method is overwhelmed by computational and storage requirements. Approximate dynamic programming implemented with an Adaptive Critic (AC) neural network structure has evolved as a powerful alternative that obviates the excessive computation and storage requirements in solving optimal control problems. In this paper, an improvement to the AC architecture, called the Single Network Adaptive Critic (SNAC), is presented. This approach is applicable to a wide class of nonlinear systems for which the optimal control (stationarity) equation can be explicitly expressed in terms of the state and costate variables. The name reflects the fact that the approach eliminates one of the neural networks (namely the action network) of a typical dual-network AC setup. As a consequence, the SNAC architecture offers three potential advantages: a simpler architecture, a lower computational load, and elimination of the approximation error associated with the eliminated network. To demonstrate these benefits and the control synthesis technique using SNAC, two problems are solved with both the AC and SNAC approaches and their computational performances are compared. One of them is a real-life micro-electro-mechanical system (MEMS) problem, which demonstrates that the SNAC technique is applicable to complex engineering systems. (c) 2006 Elsevier Ltd. All rights reserved.
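For a linear-quadratic problem the SNAC idea admits a very small sketch: a single linear critic maps x_k to the costate lambda_{k+1} = W x_k, and the control is recovered from the stationarity condition u_k = -R^{-1} B' lambda_{k+1}, so no action network is needed. The relaxation factor below is an invented stand-in for the critic's training rate, and the whole sketch is an LQ specialization, not the paper's neural implementation.

```python
import numpy as np

def snac_lq(A, B, Q, R, eta=0.5, iters=300):
    """Single-network adaptive critic specialized to discrete-time LQ:
    the critic lambda_{k+1} = W x_k is updated toward the costate
    equation lambda_{k+1} = Q x_{k+1} + A' lambda_{k+2}, with the
    control u_k = -R^{-1} B' W x_k eliminating the action network."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    W = np.zeros((n, n))
    for _ in range(iters):
        Acl = A - B @ Rinv @ B.T @ W      # closed loop under current critic
        target = (Q + A.T @ W) @ Acl      # costate-equation target for W
        W = W + eta * (target - W)        # relaxed update (critic training rate)
    return W
```

At the fixed point W equals P A_cl, where P solves the discrete-time Riccati equation, so the recovered feedback u = -R^{-1} B' W x matches the optimal LQ controller; the relaxation is needed because the undamped substitution can oscillate.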
In this paper, we present a stochastic model for the dynamic fleet management problem with random travel times. Our approach decomposes the problem into time-staged subproblems by formulating it as a dynamic program and uses approximations of the value function. In order to deal with random travel times, the state variable of our dynamic program includes all individual decisions over a relevant portion of the history. We show how to approximate the value function in a tractable manner under this new high-dimensional state variable. Under our approximation scheme, the subproblem for each time period decomposes with respect to locations, making our model very appealing for large-scale applications. Numerical work shows that the proposed approach provides high-quality solutions and performs significantly better than standard benchmark methods. (c) 2005 Elsevier B.V. All rights reserved.
ISBN (Print): 9781424404926
The increasing complexity of the modern power grid highlights the need for advanced modeling and control techniques for effective control of excitation and turbine systems. The crucial factors affecting modern power systems today are voltage control and system stabilization during small and large disturbances. Simulation studies and real-time laboratory experiments are described, and the results show successful control of the power system excitation and turbine systems with adaptive and optimal neurocontrol approaches. The performance of the neurocontrollers is compared with conventional PI controllers for damping under different operating conditions, for both small and large disturbances.
In the present paper, a call admission control scheme that can learn from the network environment and user behavior is developed for code division multiple access (CDMA) cellular networks that handle both voice and data services. The idea is built upon a novel learning control architecture with only a single module, instead of the two or three modules of typical adaptive critic designs (ACDs). The use of the adaptive critic approach for call admission control in wireless cellular networks is new. The call admission controller can learn in real time as well as in offline environments, and it improves its performance as it gains experience. Another important contribution of the present work is the choice of utility function, which makes the learning process much more efficient than existing learning control methods. The performance of the algorithm is demonstrated through computer simulation and compared with existing algorithms.
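The single-module learning loop can be illustrated with a tabular toy: the state is the number of ongoing calls, the action accepts or rejects a new arrival. The reward values, traffic probabilities, and capacity below are invented, and the paper's critic operates on a much richer CDMA state description with voice and data classes.

```python
import random

def learn_cac(capacity=5, steps=20000, alpha=0.1, gamma=0.9,
              p_arrive=0.6, p_depart=0.3, seed=1):
    """Toy self-learning call admission control.  Admitting a call earns
    revenue (+1); admitting into a full cell causes an outage penalty
    (-5); rejecting earns nothing.  Q-learning over the load should
    therefore learn to accept when there is headroom and reject at
    capacity."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(capacity + 1)]   # Q[load][reject, accept]
    load = 0
    for _ in range(steps):
        if rng.random() < p_arrive:                  # a new call arrives
            if rng.random() < 0.1:
                a = rng.randrange(2)                 # explore
            else:
                a = 1 if Q[load][1] >= Q[load][0] else 0
            if a == 1 and load < capacity:
                r, nxt = 1.0, load + 1               # admitted: revenue
            elif a == 1:
                r, nxt = -5.0, load                  # full cell: outage penalty
            else:
                r, nxt = 0.0, load                   # rejected
            Q[load][a] += alpha * (r + gamma * max(Q[nxt]) - Q[load][a])
            load = nxt
        if load > 0 and rng.random() < p_depart:     # a call completes
            load -= 1
    return Q
```

The decision epochs are the arrivals, as in admission control generally; the paper replaces this table with a single critic network and a utility function chosen to speed up exactly this kind of learning.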