ISBN (Print): 9783642132315
In this paper a discrete tracking control algorithm for a non-holonomic two-wheeled mobile robot (WMR) is presented. The basis of the control algorithm is an Adaptive Critic Design (ACD) in two model-based configurations: heuristic dynamic programming (HDP) and dual heuristic programming (DHP). In the proposed control algorithm, an actor-critic structure composed of two neural networks (NNs) is supplemented by a PD controller and a supervisory term derived from the Lyapunov stability theorem. The control algorithm works on-line and does not require preliminary learning. Verification of the proposed control algorithm was carried out on a Pioneer 2-DX WMR.
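As a concrete illustration of the HDP configuration described above, the sketch below runs an on-line actor-critic loop with a fixed PD term and an adaptive actor correction, learning from scratch with no preliminary training. The scalar plant, quadratic critic features, gains, and learning rates are illustrative assumptions; the paper's actual WMR model and Lyapunov-based supervisory term are not reproduced here.

```python
import numpy as np

gamma = 0.95               # discount factor
eta_c, eta_a = 0.1, 0.01   # critic / actor learning rates
wc = np.zeros(2)           # critic weights for J(x) ~ wc @ phi(x)
wa = 0.0                   # adaptive actor weight added to the PD law
kp, kd = 1.0, 0.5          # fixed PD gains

def phi(x):                # critic features: quadratic in error and error rate
    return np.array([x[0] ** 2, x[1] ** 2])

def plant(x, u):           # toy discrete plant standing in for the WMR model
    e, de = x
    return np.array([e + 0.1 * de, de + 0.1 * (u - e)])

x = np.array([1.0, 0.0])   # initial tracking error and error rate
for k in range(500):
    u = -(kp * x[0] + kd * x[1]) + wa * x[0]    # PD term plus adaptive NN term
    cost = x[0] ** 2 + 0.1 * u ** 2             # stage cost U(x, u)
    x_next = plant(x, u)
    # HDP critic: drive the temporal-difference error toward zero
    td = cost + gamma * wc @ phi(x_next) - wc @ phi(x)
    wc += eta_c * td * phi(x)
    # Actor: descend d(cost + gamma*J(x_next))/dwa via the chain rule
    dJdu = 0.2 * u + gamma * wc[1] * 0.2 * x_next[1]   # input channel gain 0.1
    wa -= eta_a * dJdu * x[0]                          # du/dwa = x[0]
    x = x_next
print("final tracking error:", x)
```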
ISBN (Print): 9781424483570
In this paper we propose a direct coupling of renewable generation with deferrable demand in order to mitigate the unpredictable and non-controllable fluctuation of renewable power supply. We cast our problem in the form of a stochastic dynamic program and we characterize the value function of the problem in order to develop efficient solution methods. We develop and compare two algorithms for optimally supplying renewable power to time-flexible electricity loads in the presence of a spot market: backward dynamic programming and approximate dynamic programming. We describe how our proposal compares to price-responsive demand in terms of capacity gains and energy market revenues for renewable generators, and we determine the optimal capacity of deferrable demand which can be reliably coupled to renewable generation.
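The backward dynamic programming variant can be sketched on a toy instance: a deferrable load of E energy units must be served by a deadline from random renewable output, with any surplus sold at a spot price and unmet demand penalized at the end. The horizon, supply distribution, prices, and penalty below are illustrative assumptions, not the paper's data.

```python
import numpy as np

T, E = 10, 5                               # horizon and deferrable energy target
supply_vals = np.array([0, 1, 2])          # possible renewable output per stage
supply_prob = np.array([0.3, 0.4, 0.3])
p_spot, penalty = 1.0, 10.0                # spot price and unserved-energy penalty

# V[t, e] = expected value-to-go with e units of demand still owed at stage t
V = np.zeros((T + 1, E + 1))
V[T, :] = -penalty * np.arange(E + 1)      # terminal penalty for unmet demand

for t in range(T - 1, -1, -1):
    for e in range(E + 1):
        exp_val = 0.0
        for s, ps in zip(supply_vals, supply_prob):
            # after observing supply s, route a units to the deferrable load
            # and sell the remaining s - a units on the spot market
            best = max(p_spot * (s - a) + V[t + 1, e - a]
                       for a in range(min(e, s) + 1))
            exp_val += ps * best
        V[t, e] = exp_val
print("expected value with full flexibility:", V[0, E])
```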
In this paper, an optimal control scheme for a class of nonlinear systems with time delays in both state and control variables, with respect to a quadratic performance index function, is proposed using a new iterative adaptive dynamic programming (ADP) algorithm. By introducing a delay matrix function, the explicit expression of the optimal control is obtained using dynamic programming theory, and the optimal control can be obtained iteratively using the adaptive critic technique. Convergence analysis is presented to prove that the performance index function can reach the optimum by the proposed method. Neural networks are used to approximate the performance index function, compute the optimal control policy, solve the delay matrix function, and model the nonlinear system, respectively, for facilitating the implementation of the iterative ADP algorithm. Two examples are given to demonstrate the validity of the proposed optimal control scheme.
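For orientation, the value-iteration recursion underlying such an iterative ADP scheme can be written generically as follows, where F denotes the delayed dynamics; all symbols are placeholders inferred from the abstract rather than the paper's exact notation:

```latex
% Generic iterative-ADP recursion suggested by the abstract, with F the
% delayed dynamics, Q, R the quadratic cost weights, and sigma, tau the
% state and control delays; all symbols are generic placeholders.
V_{i+1}(x_k) = \min_{u_k}\Big\{ x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k
  + V_i\big(F(x_k,\, x_{k-\sigma},\, u_k,\, u_{k-\tau})\big) \Big\},
\qquad V_0 \equiv 0 .
```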
We propose two approximate dynamic programming (ADP)-based strategies for control of nonlinear processes using input-output data. In the first strategy, which we term 'J-learning,' one builds an empirical nonlinear model using closed-loop test data and performs dynamic programming with it to derive an improved control policy. In the second strategy, called 'Q-learning,' one tries to learn an improved control policy in a model-free manner. Compared to the conventional model predictive control approach, the new approach offers some practical advantages in using nonlinear empirical models for process control. Besides the potential reduction in the on-line computational burden, it offers a convenient way to control the degree of model extrapolation in the calculation of optimal control moves. One major difficulty associated with using an empirical model within the multi-step predictive control setting is that the model can be excessively extrapolated into regions of the state space where identification data were scarce or nonexistent, leading to performance far worse than predicted by the model. Within the proposed ADP-based strategies, this problem is handled by imposing a penalty term designed on the basis of the local data distribution. A CSTR example is provided to illustrate the proposed approaches. (c) 2005 Elsevier Ltd. All rights reserved.
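The extrapolation-penalty idea can be illustrated with a small Q-learning loop in which the greedy action choice is discounted by a penalty that grows with the predicted state's distance from the identification data. The scalar plant, linear features, and constants below are illustrative stand-ins; the paper's CSTR model and exact penalty design are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(-2.0, 2.0, size=200)   # states seen in closed-loop tests

def penalty(x, beta=5.0):
    # grows as the predicted state leaves the identification-data region
    return beta * float(np.min(np.abs(data - x)))

def model(x, a):                          # empirical one-step predictor
    return 0.9 * x + 0.5 * a

actions = [-1.0, 0.0, 1.0]
Q = np.zeros((len(actions), 2))           # linear Q(x, a) = Q[a] @ [x, 1]
feat = lambda x: np.array([x, 1.0])
alpha, gamma, eps = 0.05, 0.9, 0.1

x = 1.5
for k in range(2000):
    # greedy move, discouraged from extrapolating beyond the data
    scores = [Q[i] @ feat(x) - penalty(model(x, a))
              for i, a in enumerate(actions)]
    i = int(np.argmax(scores)) if rng.random() > eps else int(rng.integers(3))
    r = -(x ** 2)                         # reward: regulate to the origin
    x2 = model(x, actions[i]) + rng.normal(0.0, 0.05)  # plant ~ model + noise
    Q[i] += alpha * (r + gamma * max(Q[j] @ feat(x2) for j in range(3))
                     - Q[i] @ feat(x)) * feat(x)
    x = x2
print("Q weights:", Q.round(2))
```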
ISBN (Print): 9781424477456
In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game's Hamilton-Jacobi-Isaacs (HJI) design equation. This method finds in real time suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor-critic structure which involves simultaneous continuous-time adaptation of critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence-of-excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
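A heavily simplified scalar analogue of such synchronous tuning laws is sketched below: the critic, actor, and disturbance "networks" collapse to single weights that adapt simultaneously while a probing signal maintains persistence of excitation. The dynamics, cost weights, gains, and update laws are illustrative simplifications, not the paper's actual tuning algorithms.

```python
import numpy as np

a, qc, r, gam2 = -1.0, 1.0, 1.0, 4.0   # plant pole, cost weights, gamma^2
pc, pa, pd = 1.0, 1.0, 0.1             # critic / actor / disturbance weights
ac, aa, ad = 5.0, 1.0, 1.0             # adaptation gains
dt, x = 1e-3, 1.0

for k in range(200_000):
    t = k * dt
    probe = 0.8 * (np.sin(t) + np.sin(3.3 * t))   # persistence of excitation
    u, w = -pa * x, pd * x                        # actor / disturbance policies
    xdot = a * x + u + w + probe
    # HJI (Hamiltonian) residual for the current weights, V(x) = pc * x^2
    e = qc * x**2 + r * u**2 - gam2 * w**2 + 2 * pc * x * xdot
    g = 2 * x * xdot                              # de/dpc
    pc -= ac * dt * e * g / (1 + g * g)           # normalised critic step
    pa += aa * dt * (pc / r - pa)                 # actor tracks the minimiser
    pd += ad * dt * (pc / gam2 - pd)              # disturbance tracks maximiser
    x += dt * xdot
# the analytic HJI solution for these constants is pc ~ 0.43
print("pc, pa, pd:", round(pc, 3), round(pa, 3), round(pd, 3))
```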
Load balancing is critical for the performance of large server clusters. Although many load balancers are available for improving performance in parallel applications, the load-balancing problem is not yet fully solved. Recent advances in security and architecture design advocate load balancing at the session level. However, due to the high dimensionality of session-level load balancing, little attention has been paid to this new problem. In this paper, we formulate the session-level load-balancing problem as a Markov decision problem. Then, we use approximate dynamic programming to obtain approximate load-balancing policies that scale with the problem instance. Extensive numerical experiments show that the policies have nearly optimal performance.
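A minimal version of such an ADP policy is one-step lookahead on an approximate value function over queue lengths; the quadratic approximation and the arrival/completion rates below are illustrative choices, not the paper's.

```python
import random

random.seed(0)
N_SERVERS = 4
queues = [0] * N_SERVERS

def v_approx(qs):
    # approximate cost-to-go: quadratic in queue lengths penalises imbalance
    return sum(q * q for q in qs)

def route(qs):
    # greedy one-step lookahead on the approximate value function
    best_i, best_v = 0, float("inf")
    for i in range(len(qs)):
        trial = qs.copy()
        trial[i] += 1                      # try assigning the session to i
        if v_approx(trial) < best_v:
            best_i, best_v = i, v_approx(trial)
    return best_i

for t in range(1000):
    if random.random() < 0.8:              # session arrival
        queues[route(queues)] += 1
    for i in range(N_SERVERS):             # random session completions
        if queues[i] and random.random() < 0.2:
            queues[i] -= 1
print("final queue lengths:", queues)
```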
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. (C) 2009 Elsevier Ltd. All rights reserved.
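A compact tabular instance of a natural actor-critic of this kind is sketched below: a TD(0) critic on a fast timescale, an LMS fit of the advantage with compatible features (whose weights are exactly the natural-gradient estimate), and a slower actor step. The toy MDP and constant step sizes are illustrative, and the precise two-timescale schedules required by the convergence proofs are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 2, 2, 0.95
R = np.array([[1.0, 0.0],      # R[s, a]: reward for action a in state s
              [0.0, 2.0]])
P = np.array([[0, 1],          # P[s, a]: deterministic successor state
              [1, 0]])

theta = np.zeros((nS, nA))     # softmax policy parameters (actor, slow)
V = np.zeros(nS)               # state values (critic, fast, TD(0))
w = np.zeros((nS, nA))         # advantage weights; with compatible features
                               # these are the natural-gradient estimate
a_v, a_w, a_t = 0.1, 0.1, 0.01

def pi(s):
    e = np.exp(theta[s] - theta[s].max())
    return e / e.sum()

s = 0
for k in range(20_000):
    p = pi(s)
    a = rng.choice(nA, p=p)
    r, s2 = R[s, a], P[s, a]
    delta = r + gamma * V[s2] - V[s]          # TD error
    V[s] += a_v * delta                       # critic step
    psi = -p.copy()                           # compatible features:
    psi[a] += 1.0                             # grad log pi(a|s) = e_a - pi(s)
    w[s] += a_w * (delta - w[s] @ psi) * psi  # LMS fit of the advantage
    theta[s] += a_t * w[s]                    # natural-gradient actor step
    s = s2
print("learned policy:", [pi(i).round(3) for i in range(nS)])
```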
This paper first presents a convergence analysis of the particle swarm optimization (PSO) system by treating it as a discrete-time linear time-variant system. Then, based on the resulting convergence conditions, dynamic optimal control of a deterministic PSO system for parameter optimization is studied using dynamic programming, and an approximate dynamic programming algorithm, swarm-based approximate dynamic programming (swarm-ADP), is proposed. Finally, numerical simulations validate the presented dynamic optimization method.
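The linear-system view of PSO can be sketched directly: with a fixed attractor and the random coefficients replaced by their means, the velocity/offset pair evolves under a constant matrix whose spectral radius determines convergence. The constants below are common illustrative choices, not the paper's swarm-ADP algorithm.

```python
import numpy as np

w_in, c1, c2 = 0.7, 1.4, 1.4   # inertia weight and acceleration constants
phi = (c1 + c2) / 2            # mean attraction coefficient (E[r1] = E[r2] = 1/2)

# With a fixed attractor p, the expected update of z_k = [v_k, x_k - p] is
# linear: v' = w*v - phi*(x - p), and (x - p)' = w*v + (1 - phi)*(x - p).
A = np.array([[w_in,      -phi],
              [w_in, 1.0 - phi]])
rho = max(abs(np.linalg.eigvals(A)))
print("spectral radius:", round(rho, 3))   # < 1  =>  the swarm contracts

z = np.array([1.0, 1.0])                   # simulate the deterministic recursion
for _ in range(100):
    z = A @ z
print("state after 100 steps:", z)         # decays toward the attractor
```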
The purpose of this paper is to survey techniques for constructing effective policies for controlling complex networks, and to extend these techniques to capture special features of wireless communication networks under different networking scenarios. Among the key questions addressed are: the relationship between static network equilibria and dynamic network control; the effect of coding on control and delay through rate regions; and routing, scheduling, and admission control. The resulting approximations are the basis of a specific formulation of an h-MaxWeight policy for network routing. Simulations show a 50% improvement in average delay performance compared to methods used in current practice.
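A small scheduling example conveys the h-MaxWeight idea: ordinary MaxWeight serves the queue maximizing q_i * mu_i (the gradient of h(q) = |q|^2 / 2 weighted by service rates), while h-MaxWeight substitutes the gradient of a perturbed h. The two-queue model and the particular smooth perturbation below are illustrative assumptions in the spirit of this literature, not the paper's exact construction.

```python
import math, random

random.seed(0)
theta = 5.0                                # perturbation scale (illustrative)

def grad_h(q):
    # h(q) = sum_i qtilde_i^2 / 2 with qtilde_i = q_i + theta*(exp(-q_i/theta) - 1);
    # the gradient vanishes smoothly at empty queues
    return [(qi + theta * (math.exp(-qi / theta) - 1.0))
            * (1.0 - math.exp(-qi / theta)) for qi in q]

lam, mu = [0.3, 0.3], [0.8, 0.8]           # arrival and service probabilities
q = [0, 0]
for t in range(10_000):
    for i in range(2):                     # Bernoulli arrivals
        if random.random() < lam[i]:
            q[i] += 1
    g = grad_h(q)
    i = max(range(2), key=lambda j: g[j] * mu[j])   # h-MaxWeight: serve the
    if q[i] and random.random() < mu[i]:            # queue with the largest
        q[i] -= 1                                   # weighted gradient
print("final queue lengths:", q)
```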
This paper addresses the problem of finding a control policy that drives a generic discrete-event stochastic system from an initial state to a set of goal states with a specified probability. The control policy is iteratively constructed via an approximate dynamic programming (ADP) technique over a small subset of the state space that is evolved via Monte Carlo simulations. The effect of certain user-chosen parameters on the performance of the algorithm is investigated. The method is evaluated on several stochastic shortest path (SSP) examples and on a manufacturing job shop problem. We solve SSP problems that contain up to one million states to illustrate the scaling of the computational and memory benefits with respect to the problem size. In the case of the manufacturing job shop example, the proposed ADP approach outperforms a traditional rolling-horizon mathematical programming approach. (C) 2009 Elsevier Ltd. All rights reserved.
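The broad recipe, growing a small subset of the state space by Monte Carlo simulation and applying Bellman updates only on visited states, can be sketched on a small grid-world SSP; the grid, noise level, default value, and sampling counts below are illustrative assumptions.

```python
import random

random.seed(0)
GOAL, SIZE = (4, 4), 5
actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(s, a):
    # noisy move: intended direction with prob. 0.8, stay put otherwise
    if random.random() < 0.8:
        return (min(max(s[0] + a[0], 0), SIZE - 1),
                min(max(s[1] + a[1], 0), SIZE - 1))
    return s

V = {}                         # cost-to-go estimates on visited states only
def value(s):
    return 0.0 if s == GOAL else V.get(s, 10.0)   # default for unvisited states

for episode in range(500):
    s = (0, 0)
    for t in range(50):
        if s == GOAL:
            break
        # asynchronous Bellman update with sampled successor states
        V[s] = min(1.0 + sum(value(step(s, a)) for _ in range(5)) / 5.0
                   for a in actions)
        s = step(s, min(actions, key=lambda a: value(step(s, a))))
print("states explored:", len(V), "| estimated cost from start:",
      round(value((0, 0)), 2))
```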