Search Results - Inner Mongolia University Library

Robust Approximate Bilinear Programming for Value Function Approximation

JOURNAL OF MACHINE LEARNING RESEARCH, 2011, Vol. 12, No. 10, pp. 3027-3063

Authors: Petrik, Marek; Zilberstein, Shlomo. Affiliations: IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA; Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA

Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.

Keywords: value function approximation; approximate dynamic programming; Markov decision processes
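
The formulation above is built around the Bellman residual of an approximate value function. The following is a minimal sketch and not the authors' bilinear program: it computes the L-infinity Bellman residual ||v - Lv|| of a linear approximation v = Phi w on a tiny two-state, two-action MDP. The rewards, transitions, discount, features and weights are all made-up illustrative values.

```python
# Minimal sketch: L-infinity Bellman residual of a linear value approximation.
# The MDP below (rewards, transitions, gamma) and the features/weights are
# hypothetical illustrative values, not data from the paper.

def linear_value(features, weights):
    """v(s) = phi(s) . w for every state s."""
    return [sum(f * w for f, w in zip(phi, weights)) for phi in features]

def bellman_backup(v, rewards, transitions, gamma):
    """(Lv)(s) = max_a [ r(s, a) + gamma * sum_s' P(s' | s, a) v(s') ]."""
    return [
        max(
            rewards[s][a] + gamma * sum(p * v[t] for t, p in enumerate(transitions[s][a]))
            for a in range(len(rewards[s]))
        )
        for s in range(len(v))
    ]

def bellman_residual_inf(v, rewards, transitions, gamma):
    """||v - Lv||_inf, the residual norm whose minimization the abstract discusses."""
    lv = bellman_backup(v, rewards, transitions, gamma)
    return max(abs(x - y) for x, y in zip(v, lv))

# Two states, two actions: action 0 moves to state 0, action 1 moves to state 1.
rewards = [[1.0, 0.0], [0.0, 2.0]]
transitions = [[[1.0, 0.0], [0.0, 1.0]],
               [[1.0, 0.0], [0.0, 1.0]]]
features = [[1.0, 0.0], [1.0, 1.0]]   # phi(s) for s = 0, 1
weights = [10.0, 10.0]

v = linear_value(features, weights)   # [10.0, 20.0]
residual = bellman_residual_inf(v, rewards, transitions, 0.9)
print(residual)  # 8.0
```

Here Lv = [18, 20], so the approximation is exact in state 1 but underestimates state 0 by 8; an optimizer over w would try to shrink that worst-case gap.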

Near-optimal Tracking Control of a Nonholonomic Mobile Robot with Uncertainties

INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2012, Vol. 9, No. 3

Author: Wang, Kai. Affiliation: Beihang Univ (BUAA), Dept Syst & Control, Beihang, Peoples R China

A combined kinematic/torque control law is developed by using a backstepping design approach for a nonholonomic mobile robot with two driving wheels mounted on the same axis to track a reference trajectory. The auxiliary velocity control inputs are designed for the kinematic steering system to make the posture error asymptotically stable. Next, a computed-torque controller is designed such that the mobile robot's velocities converge to the given velocity inputs in an optimal manner, by converting the tracking control problem into a regulation problem in which the uncertainties in the dynamics of mobile robots are considered. The proposed online, forward-in-time policy iteration (PI) algorithm based on approximate dynamic programming (ADP) is used to solve the optimal control problem with unknown internal dynamics by using a single neural network (NN) to approximate the cost function. The near-optimal control policy can then be computed directly from the cost function, which removes the action network that appears in the ordinary ADP method. The stability of the dynamical extension system is demonstrated using Lyapunov methods. Simulation results are provided to demonstrate the effectiveness of the proposed approach.

Keywords: Backstepping; Nonholonomic; Approximate Dynamic Programming; Policy Iteration; Lyapunov Methods
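
The kinematic stage described in the abstract designs auxiliary velocity inputs that drive the posture error to zero. The sketch below covers only that stage and assumes the widely used Kanayama-type backstepping law, because the abstract does not state the paper's exact law. The gains k1, k2, k3, the straight-line reference and the initial offset are illustrative choices. The NN-based torque loop is not reproduced.

```python
import math

# Sketch of the kinematic (outer-loop) stage only, assuming a standard
# Kanayama-type backstepping tracking law for a unicycle robot. Gains, the
# reference trajectory and the initial pose are illustrative assumptions.

def posture_error(pose, ref):
    """Tracking error (e1, e2, e3) expressed in the robot's body frame."""
    x, y, th = pose
    xr, yr, thr = ref
    e1 = math.cos(th) * (xr - x) + math.sin(th) * (yr - y)
    e2 = -math.sin(th) * (xr - x) + math.cos(th) * (yr - y)
    e3 = math.atan2(math.sin(thr - th), math.cos(thr - th))  # wrap to (-pi, pi]
    return e1, e2, e3

def auxiliary_velocity(err, v_r, w_r, k1, k2, k3):
    """Auxiliary velocity inputs (v, w) intended to drive the posture error to zero."""
    e1, e2, e3 = err
    v = v_r * math.cos(e3) + k1 * e1
    w = w_r + k2 * v_r * e2 + k3 * v_r * math.sin(e3)
    return v, w

def unicycle_step(pose, v, w, dt):
    """Forward-Euler step of x' = v cos(th), y' = v sin(th), th' = w."""
    x, y, th = pose
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)

def simulate(T=20.0, dt=0.01, gains=(1.0, 5.0, 2.0)):
    v_r, w_r = 1.0, 0.0            # reference drives straight along the x-axis
    ref = (0.0, 0.0, 0.0)
    pose = (0.0, -0.5, 0.0)        # robot starts 0.5 m off the reference path
    for _ in range(int(round(T / dt))):
        v, w = auxiliary_velocity(posture_error(pose, ref), v_r, w_r, *gains)
        pose = unicycle_step(pose, v, w, dt)
        ref = unicycle_step(ref, v_r, w_r, dt)
    return posture_error(pose, ref)

print(simulate())  # all three error components end up close to zero
```

With positive gains and a forward reference speed, V = (e1^2 + e2^2)/2 + (1 - cos e3)/k2 is non-increasing along the continuous-time error dynamics. This is the kind of Lyapunov argument the abstract refers to for the kinematic loop.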

Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Al-Tamimi, Asma; Lewis, Frank. Affiliations: Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118 USA; Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118 USA

ISBN (print): 9781424407064

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. ADP is important because it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.

Keywords: adaptive critics; approximate dynamic programming; HJB; policy iterations
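
For the linear example mentioned above, the HDP iteration reduces to a Riccati recursion, and its limit can be checked against the ARE. The following is a minimal sketch under stated assumptions. It uses a scalar system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2, and replaces the paper's neural-network critic with a quadratic value function V_i(x) = p_i x^2. The values of a, b, q and r are illustrative.

```python
import math

# Sketch: HDP / value iteration for a scalar discrete-time LQR problem.
# Assumptions (illustrative, not from the paper): x_{k+1} = a x_k + b u_k,
# stage cost q x^2 + r u^2, and a quadratic value function V_i(x) = p_i x^2,
# so the "critic" is the single scalar p_i instead of a neural network.

def hdp_scalar(a, b, q, r, p0=0.0, iters=500):
    p = p0  # V_0(x) = p0 * x^2; HDP is started here from V_0 = 0
    for _ in range(iters):
        # Greedy action: u = -K x with K = a b p / (r + b^2 p).
        # Substituting it into q x^2 + r u^2 + p (a x + b u)^2 gives:
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

def dare_scalar(a, b, q, r):
    """Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0 (scalar ARE)."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)

a, b, q, r = 1.2, 1.0, 1.0, 1.0   # open-loop unstable example system
print(hdp_scalar(a, b, q, r), dare_scalar(a, b, q, r))  # both approximately 1.952
```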

Virtual Generators: Simplified Online Power System Representations for Wide-Area Damping Control

IEEE Power and Energy Society General Meeting

Authors: Diogenes Molina, Jiaqi Liang, Ronald G. Harley, Ganesh Kumar Venayagamoorthy. Affiliations: Intelligent Power Infrastructure Consortium, Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA; Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 USA

ISBN (print): 9781467327275

This paper introduces a new concept called a Virtual Generator (VG). VGs are simplified representations of groups of coherent synchronous generators in a power system. They resemble commonly used power system dynamic equivalents obtained via generator aggregation techniques. Traditionally, power system dynamic equivalents are developed offline, are fixed, and are used to replace large portions of the system that are considered external to the portion of the system being analyzed in detail. In contrast, VGs are calculated online, are not limited to representing external areas of the system being analyzed/controlled, and do not replace any portion of the power system. Instead, they allow wide-area damping controllers (WADCs) to exploit the fact that a group of coherent synchronous generators in a power system can be controlled as a single generating unit for achieving wide-area damping control objectives. The implementation of VGs is made possible by the availability of Wide-Area Measurements (WAMs) from Phasor Measurement Units (PMUs). To the authors' knowledge, this is the first time that the use of power system equivalencing techniques has been extended to real-time WADC. Simulation studies carried out on the 68-bus New England/New York power system demonstrate that intelligent controllers developed using VGs can significantly improve the stability of a power system by effectively damping low-frequency interarea oscillations.

Keywords: virtual generator; power system stabilizer; wide-area control; power system equivalents; intelligent control; approximate dynamic programming; adaptive critic designs; generator coherency; interarea oscillations; power systems; damped control; dynamos; intelligent controller; Phasor measurement units; Power system dynamics; Generating sets; Synchronous generators; representations; Power system stability
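
The abstract says a VG lumps a group of coherent generators into one equivalent unit, but it does not give the VG equations. As an illustrative assumption only, the sketch below uses the classical inertia-weighted centre-of-inertia (COI) aggregation, which is one standard way to form such an equivalent. The machine data are hypothetical per-unit values on a common base.

```python
# Sketch: lumping a coherent generator group into one equivalent machine using
# the classical inertia-weighted centre-of-inertia (COI) aggregation. This is an
# illustrative assumption; the abstract does not give the paper's VG equations.

def virtual_generator(inertias, angles, speeds):
    """Return (H_eq, delta_coi, omega_coi) for a coherent group.

    inertias: inertia constants H_i (s) on a common MVA base,
    angles: rotor angles (rad), speeds: rotor speeds (pu),
    one entry per generator in the group.
    """
    h_eq = sum(inertias)
    delta_coi = sum(h * d for h, d in zip(inertias, angles)) / h_eq
    omega_coi = sum(h * w for h, w in zip(inertias, speeds)) / h_eq
    return h_eq, delta_coi, omega_coi

# Hypothetical group of three coherent machines.
print(virtual_generator([3.5, 4.0, 2.5], [0.30, 0.32, 0.29], [1.001, 1.002, 1.000]))
```

Because the COI quantities can be recomputed from PMU angle and speed measurements at every sample, an equivalent of this kind can be maintained online rather than fixed offline.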

Satisficing vs exploring when learning a constrained environment

International Conference on Soft Computing and Intelligent Systems

Authors: Stephen Shervais, Thaddeus T. Shannon. Affiliations: College of Business and Public Administration, Eastern Washington University; Systems Science Program, Portland State University

ISBN (print): 9781467327428

Satisficing is an efficient strategy for applying existing knowledge in a complex, constrained environment. We present a set of agent-based simulations that demonstrate a higher payoff for satisficing strategies than for exploring strategies when using approximate dynamic programming methods for learning complex environments. In our constrained learning environment, satisficing agents outperformed exploring agents by approximately six percent in terms of the number of tasks completed.

Keywords: Component; Satisficing; Approximate dynamic programming; Q learning; Agent-based simulation
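
To make the contrast in the abstract concrete, here is a hedged sketch of two action-selection rules that could be paired with tabular Q-learning, which appears among the keywords. An exploring agent uses epsilon-greedy selection. A satisficing agent takes the first action whose estimate meets an aspiration level. The aspiration level, epsilon and table values are hypothetical, and the abstract does not specify the paper's own agent design.

```python
import random

# Sketch of two action-selection rules an agent could pair with tabular
# Q-learning. The aspiration level, epsilon and table values are hypothetical.

def q_update(q, s, a, reward, s_next, alpha, gamma):
    """One tabular Q-learning step toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def greedy(q_row):
    """Index of the best-known action (first one on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])

def exploring_choice(q_row, epsilon, rng):
    """Epsilon-greedy: with probability epsilon try a random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return greedy(q_row)

def satisficing_choice(q_row, aspiration):
    """Take the first action whose estimated value meets the aspiration level;
    if none does, fall back to the best-known action."""
    for a, value in enumerate(q_row):
        if value >= aspiration:
            return a
    return greedy(q_row)

rng = random.Random(0)
row = [0.2, 0.7, 0.9]
print(satisficing_choice(row, aspiration=0.5))      # 1: first "good enough" action
print(exploring_choice(row, epsilon=0.0, rng=rng))  # 2: greedy when epsilon = 0
```

The satisficing rule never spends steps on deliberately random actions once some option clears the aspiration level. That is one plausible reading of why it can pay off in a constrained task budget.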

Optimal Control of Unknown Discrete-Time Nonlinear Systems with Constrained Inputs Using GDHP Technique

31st Chinese Control Conference

Authors: LIU Derong, WANG Ding, LI Hongliang. Affiliation: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P.R. China

The adaptive dynamic programming (ADP) approach is employed to design an optimal controller for unknown discrete-time nonlinear systems with control ***, a neural network is constructed to identify the unknown dynamical system with stability ***, the iterative ADP algorithm is developed to solve the optimal control problem with convergence ***, two other neural networks are introduced to approximate the cost function and its derivative and the control law, under the framework of globalized dual heuristic programming ***, two simulation examples are included to verify the theoretical results.

Keywords: Adaptive dynamic programming; Approximate dynamic programming; Control constraints; Neural networks; Optimal control; System identification

Neural-Network-Based Optimal Control for Discrete-Time Nonlinear Systems Using General Value Iteration

31st Chinese Control Conference

Authors: LI Hongliang, LIU Derong, WANG Ding. Affiliation: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P.R. China

In this paper, we propose a novel adaptive dynamic programming (ADP) scheme based on general value iteration to obtain near-optimal control for discrete-time nonlinear systems with continuous state and control ***, the selection of the initial value function is different from that of traditional value iteration, and a new method is introduced to demonstrate the convergence property and convergence speed of the value ***, the control law obtained at each iteration can stabilize the system under some *** last, three neural networks with the Levenberg-Marquardt training algorithm are used to approximate the unknown nonlinear system, the value function and the optimal control *** simulation example is presented to demonstrate the effectiveness of the present scheme.

Keywords: Adaptive dynamic programming; approximate dynamic programming; optimal control; value iteration; neural networks; reinforcement learning
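
The abstract contrasts general value iteration, whose initial value function need not be zero, with traditional value iteration, and states that the iterative control laws can stabilize the system. The following is a minimal sketch in the scalar linear-quadratic case. This is an illustrative assumption and does not reproduce the paper's neural-network implementation. Starting from p_0 = 0, the first control law leaves an unstable plant unstable. Starting above the Riccati solution, every iterate's control law is stabilizing, and both runs reach the same limit.

```python
# Sketch for the scalar LQ case (illustrative assumption, not the paper's
# neural-network implementation): x_{k+1} = a x_k + b u_k, utility
# q x^2 + r u^2, V_i(x) = p_i x^2. Traditional value iteration starts from
# p_0 = 0; a "general" value iteration may start from any p_0 >= 0.

def riccati_step(p, a, b, q, r):
    """p_{i+1} from V_{i+1}(x) = min_u [q x^2 + r u^2 + V_i(a x + b u)]."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)

def closed_loop_pole(p, a, b, r):
    """Pole a - b K of x_{k+1} = (a - b K) x_k under u = -K x, K = a b p / (r + b^2 p)."""
    return a - b * (a * b * p / (r + b * b * p))

def value_iteration(p0, a, b, q, r, iters=200):
    ps = [p0]
    for _ in range(iters):
        ps.append(riccati_step(ps[-1], a, b, q, r))
    return ps

a, b, q, r = 1.2, 1.0, 1.0, 1.0          # open-loop unstable plant
from_zero = value_iteration(0.0, a, b, q, r)
from_large = value_iteration(10.0, a, b, q, r)

# From p_0 = 0 the first control law is u = 0, which leaves the pole at a = 1.2.
print(abs(closed_loop_pole(from_zero[0], a, b, r)) < 1)                  # False
# Starting above the Riccati solution, every iterate's control law is stabilizing.
print(all(abs(closed_loop_pole(p, a, b, r)) < 1 for p in from_large))   # True
# Both initializations converge to the same value function.
print(abs(from_zero[-1] - from_large[-1]) < 1e-9)                       # True
```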

Dynamic Asset Allocation Approaches for Counter-Piracy Operations

International Conference on Information Fusion

Authors: Woosun An, Diego Fernando Martinez Ayala, David Sidoti, Manisha Mishra, Xu Han, Krishna R. Pattipati, Eva D. Regnier, David L. Kleinman, James A. Hansen. Affiliations: Dept. of Electrical and Computer Engineering, University of Connecticut, Connecticut, United States; Naval Postgraduate School, California, United States; Naval Research Laboratory, California, United States

ISBN (print): 9780982443859

Piracy on the high seas is a problem of world-wide concern. In response to this threat, the US Navy has developed a visualization tool known as the Pirate Attack Risk Surface (PARS) that integrates intelligence data, commercial shipping routes, and meteorological and oceanographic (METOC) information to predict regions where pirates may be present and where they may strike next. This paper proposes an algorithmic augmentation or add-on to PARS that allocates interdiction and surveillance assets so as to minimize the likelihood of a successful pirate attack over a fixed planning horizon. This augmentation, viewed as a tool for human planners, can be mapped closely to the decision support layer of the Battlespace on Demand (BonD) framework [32]. Our solution approach decomposes this NP-hard optimization problem into two sequential phases. In Phase I, we solve the problem of allocating only the interdiction assets, such that regions with high cumulative probability of attack over the planning horizon are maximally covered. In Phase II, we solve the surveillance problem, where the area not covered by interdiction assets is partitioned into non-overlapping search regions (e.g., rectangular boxes) and assigned to a set of surveillance assets to maximize the cumulative detection probability over the planning horizon. To overcome the curse of dimensionality associated with dynamic programming (DP), we propose a Gauss-Seidel algorithm coupled with a rollout strategy for the interdiction problem. For the surveillance problem, we propose a partitioning algorithm coupled with an asymmetric assignment algorithm for allocating assets to the partitioned regions. Once the surveillance assets are assigned to search regions, the search path for each asset is determined based on a specific search strategy. The proposed algorithms are illustrated using a hypothetical scenario for conducting counter-piracy operations in a given Area of Responsibility (AOR).

Keywords: Component; Resource management problem; Search problem; Partitioning algorithm; Approximate dynamic programming; Allocation problem; Rollout; Gauss-Seidel iteration
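
The Phase I interdiction step described above is solved with a Gauss-Seidel algorithm. The sketch below shows only the Gauss-Seidel idea, as cyclic coordinate ascent. Each asset in turn moves to the candidate region that most increases the total covered attack probability, with the other assets held at their latest positions. The sweeps stop once a full pass changes nothing. The cell probabilities and regions are hypothetical, and the paper's rollout strategy and multi-period horizon are omitted.

```python
# Sketch of a Gauss-Seidel (cyclic coordinate-ascent) allocation. The cell
# probabilities and candidate regions are hypothetical, and the rollout step
# and multi-period horizon described in the abstract are omitted.

def covered_value(assignment, regions, cell_prob):
    """Total probability of the cells covered by at least one asset."""
    covered = set()
    for r in assignment:
        covered |= regions[r]
    return sum(cell_prob[c] for c in covered)

def gauss_seidel_allocate(cell_prob, regions, n_assets, max_sweeps=100):
    assignment = [0] * n_assets            # naive starting allocation
    for _ in range(max_sweeps):
        changed = False
        for k in range(n_assets):          # update one asset at a time ...
            best_r = assignment[k]
            best_v = covered_value(assignment, regions, cell_prob)
            for r in range(len(regions)):  # ... using the latest positions of the others
                trial = assignment[:k] + [r] + assignment[k + 1:]
                v = covered_value(trial, regions, cell_prob)
                if v > best_v + 1e-12:
                    best_r, best_v = r, v
            if best_r != assignment[k]:
                assignment[k] = best_r
                changed = True
        if not changed:                    # a full sweep with no move: local optimum
            break
    return assignment, covered_value(assignment, regions, cell_prob)

# Cumulative attack probability per grid cell, and the cells each region covers.
cell_prob = [0.5, 0.1, 0.3, 0.05]
regions = [{0, 1}, {1, 2}, {2, 3}, {3}]
print(gauss_seidel_allocate(cell_prob, regions, n_assets=2))  # regions 2 and 0, ~0.95
```

Like any coordinate-ascent scheme, this sketch can stop at a local optimum. A rollout layer is one way to look ahead past such myopic choices.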

Dynamic server allocation at parallel queues

IIE TRANSACTIONS, 2011, Vol. 43, No. 12, pp. 863-877

Author: Martonosi, Susan E. Affiliation: Harvey Mudd Coll, Claremont, CA 91711 USA

This article explores whether dynamically reassigning servers to parallel queues in response to queue imbalances can reduce average waiting time in those queues. Approximate dynamic programming methods are used to determine when servers should be switched, and the performance of such dynamic allocations is compared to that of a pre-scheduled deterministic allocation. The proposed method is tested on both synthetic data and data from airport security checkpoints at Boston Logan International Airport. It is found that in situations where the uncertainty in customer arrival rates is significant, dynamically reallocating servers can substantially reduce waiting time. Moreover, it is found that intuitive switching strategies that are optimal for queues with homogeneous entry rates are not optimal in this setting.

Keywords: Control of queues; fluid queues; approximate dynamic programming; dynamic server allocation; workforce management

ACCOUNTING RISK IN MULTISTAGE STOCHASTIC PROBLEMS USING APPROXIMATE DYNAMIC PROGRAMMING

IFAC Proceedings Volumes, 2007, Vol. 40, No. 5, pp. 153-158

Authors: Nikolaos E. Pratikakis, Matthew J. Realff, Jay H. Lee. Affiliation: Chemical and Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst Drive, Atlanta, GA 30332-0100 USA

This work proposes a methodology to generate risk-averse policies for Markov decision processes (MDPs). The methodology is based on modifying the one-stage reward or cost to weigh the trade-off between expected performance and downside risk, represented by the conditional value-at-risk (CVaR_α). The modified stage-wise utility function is used within dynamic programming to generate a set of policies representing different levels of the trade-off. The approach is demonstrated on a shortest-path optimal control problem and on a project management problem modeled as a constrained MDP. To address a more complex management problem, we use the Real-Time Approximate Dynamic Programming algorithm.

Keywords: Real-Time Dynamic Programming; Risk; Markov Decision Problems; Approximate Dynamic Programming
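
The methodology above trades expected performance against downside risk measured by CVaR. The sketch below computes CVaR_α of a discrete cost distribution with the Rockafellar-Uryasev formula. It then forms a convex combination of expected cost and CVaR as one assumed way to express that trade-off. The weight lam, and how such a quantity would enter the stage-wise utility, are illustrative assumptions, because the abstract does not give the paper's exact modification.

```python
# Sketch: CVaR of a discrete cost distribution via the Rockafellar-Uryasev
# formula CVaR_alpha(X) = min_eta { eta + E[(X - eta)^+] / (1 - alpha) },
# plus a convex combination of expected cost and CVaR. The weight lam is an
# illustrative assumption, not the paper's modified stage-wise utility.

def cvar(costs, probs, alpha):
    # The objective is piecewise linear and convex in eta with breakpoints at
    # the support points, so searching over the possible costs finds the minimum.
    best = float("inf")
    for eta in costs:
        tail = sum(p * max(c - eta, 0.0) for c, p in zip(costs, probs))
        best = min(best, eta + tail / (1.0 - alpha))
    return best

def risk_adjusted_cost(costs, probs, alpha, lam):
    """(1 - lam) * E[X] + lam * CVaR_alpha(X); lam = 0 is risk-neutral."""
    mean = sum(c * p for c, p in zip(costs, probs))
    return (1.0 - lam) * mean + lam * cvar(costs, probs, alpha)

costs = [float(k) for k in range(1, 11)]   # ten equally likely outcomes
probs = [0.1] * 10
print(cvar(costs, probs, alpha=0.8))               # ~9.5: mean of the worst 20%
print(risk_adjusted_cost(costs, probs, 0.8, 0.5))  # ~7.5
```

Sweeping lam from 0 to 1 produces a family of objectives ranging from risk-neutral to purely tail-focused. This mirrors the abstract's idea of generating policies for different levels of the trade-off.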
