We consider the linear programming approach to approximate dynamic programming with an average cost objective and a finite state space. Using a Lagrangian form of the linear program (LP), the average cost error is shown to be a multiple of the best fit differential cost error. This result is analogous to previous error bounds for a discounted cost objective. Second, bounds are derived for average cost error and performance of the policy generated from the LP that involve the mixing time of the Markov decision process (MDP) under this policy or the optimal policy. These results improve on a previous performance bound involving mixing times.
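As a concrete illustration of the LP form discussed above, the following is a minimal sketch of an average-cost approximate linear program for a tabular MDP, where the differential cost is restricted to the span of a basis matrix Phi. The function name, the SciPy solver, and the exact constraint layout are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def average_cost_alp(P, g, Phi):
    """Average-cost approximate LP (illustrative sketch).

    P   : transition probabilities, shape (S, A, S)
    g   : one-stage costs, shape (S, A)
    Phi : basis matrix for the differential cost, shape (S, K)

    Decision variables are (eta, r); we maximize eta subject to
        eta + (Phi r)(x) - sum_y P(y | x, a) (Phi r)(y) <= g(x, a)  for all x, a.
    """
    S, A, _ = P.shape
    K = Phi.shape[1]
    c = np.zeros(1 + K)
    c[0] = -1.0                                   # linprog minimizes, so minimize -eta
    rows, rhs = [], []
    for x in range(S):
        for a in range(A):
            row = np.zeros(1 + K)
            row[0] = 1.0                          # coefficient of eta
            row[1:] = Phi[x] - P[x, a] @ Phi      # coefficients of r
            rows.append(row)
            rhs.append(g[x, a])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * (1 + K))
    return res.x[0], res.x[1:]                    # approximate average cost, basis weights
```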
ISBN (print): 9781509001644
In this paper, we propose an online learning method for adaptive traffic signal control in a multi-intersection system. The method uses approximate dynamic programming (ADP) to obtain a near-optimal solution to the signal optimization problem in a distributed network, which is modeled at the microscopic level. A traffic network loading model and a traffic signal control model are presented as the basis of the discrete-time control environment. The linear function approximation in the ADP approach learns tunable parameters over features of the traffic state, including vehicle queue lengths and the signal indication. ADP avoids the computational burden that typically arises when large-scale problems are solved by exact algorithms such as dynamic programming. Moreover, the proposed adaptive phase sequence (APS) mode improves performance compared with other control methods. Simulation results show that our method performs well on the adaptive traffic signal control problem.
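To illustrate the kind of linear function approximation the abstract refers to, here is a hedged sketch of a single temporal-difference style parameter update over traffic-state features such as vehicle queue lengths and the current signal indication. The update rule, step size, and discount factor are assumptions for illustration; the paper's actual ADP recursion may differ.

```python
import numpy as np

def adp_signal_update(theta, phi, cost, phi_next, alpha=0.01, gamma=0.95):
    """One TD-style update of linear value-function weights (illustrative sketch).

    theta    : tunable weight vector of the linear approximation
    phi      : feature vector of the current traffic state
               (e.g. vehicle queue lengths and signal indication)
    cost     : one-step cost observed after applying the chosen signal action
    phi_next : feature vector of the successor traffic state
    """
    td_error = cost + gamma * theta @ phi_next - theta @ phi   # temporal-difference error
    return theta + alpha * td_error * phi                      # gradient-style correction
```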
In this paper, a novel interference-based formulation and solution methodology for the problem of link scheduling in wireless mesh networks is proposed. Traditionally, this problem has been formulated as a deterministic integer program, which has been shown to be NP-hard. The proposed formulation is based on dynamic programming and allows greater flexibility, since dynamic and stochastic components of the problem can be embedded in the optimization framework. Temporal decomposition reduces the size of the integer program, and approximate dynamic programming (ADP) methods are used to tackle the curse of dimensionality. The numerical results reveal that the proposed algorithm outperforms well-known heuristics under different network topologies. Finally, the proposed ADP methodology can be used not only as an upper bound but also as a generic framework into which different heuristics can be integrated.
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend the well-known approximate DP algorithms fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation error. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
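For readers unfamiliar with MPI itself, the following is a small tabular sketch of the exact scheme that AMPI approximates: at each iteration the policy is made greedy with respect to the current value estimate, and its Bellman operator is then applied m times; m = 1 recovers value iteration and letting m grow recovers policy iteration. The tabular setting and variable names are illustrative assumptions, not the paper's large-scale approximate algorithms.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, iters=100):
    """Tabular modified policy iteration (sketch of the exact scheme).

    P     : transition probabilities, shape (S, A, S)
    r     : rewards, shape (S, A)
    gamma : discount factor
    m     : number of partial policy-evaluation backups per iteration
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                    # state-action values, shape (S, A)
        pi = Q.argmax(axis=1)                      # greedy policy w.r.t. current V
        for _ in range(m):                         # m applications of the Bellman operator of pi
            V = r[np.arange(S), pi] + gamma * P[np.arange(S), pi] @ V
    return V, pi
```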
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function as its initialization, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be accurately obtained in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
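As a schematic of the generalized value iteration idea (an arbitrary positive semi-definite initialization plus a tolerance-based stop standing in for the approximation-error-based convergence criterion), here is a hedged sketch on a discretized state grid. The nearest-neighbour discretization, function names, and tolerance eps are assumptions; the paper implements the approximations with neural networks.

```python
import numpy as np

def generalized_value_iteration(f, U, states, controls, Psi, eps=1e-3, max_iter=500):
    """Generalized value iteration on a discretized grid (illustrative sketch).

    f        : system dynamics, x_next = f(x, u)
    U        : one-step utility (cost) function U(x, u)
    states   : list of grid points covering the state space
    controls : finite set of admissible controls
    Psi      : arbitrary positive semi-definite initial function Psi(x)
    """
    states = [np.atleast_1d(s) for s in states]
    V = np.array([Psi(x) for x in states])          # arbitrary PSD initialization
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, x in enumerate(states):
            costs = []
            for u in controls:
                x_next = np.atleast_1d(f(x, u))
                j = int(np.argmin([np.linalg.norm(x_next - s) for s in states]))
                costs.append(U(x, u) + V[j])        # one-step cost plus cost-to-go
            V_new[i] = min(costs)
        if np.max(np.abs(V_new - V)) < eps:         # stop inside a finite neighborhood
            return V_new
        V = V_new
    return V
```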
In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN is used to approximate the strategic utility function, and the action NN is employed to minimize both the strategic utility function and the tracking error. A deterministic learning technique is employed to guarantee that the partial persistent excitation condition of the internal states is satisfied during tracking control to a periodic reference orbit. Uniform ultimate boundedness of the closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control.
When the state dimension is large, classical approximate dynamic programming techniques may become computationally infeasible, since the complexity of the algorithm grows exponentially with the size of the state space (curse of dimensionality). Policy search techniques are able to overcome this problem because, instead of estimating the value function over the entire state space, they search for the optimal control policy in a restricted parameterized policy space. This paper presents a new policy parametrization that exploits a single point (particle) to represent an entire region of the state space and can be tuned through a recently introduced policy gradient method with parameter-based exploration. Experiments demonstrate the superior performance of the proposed approach in high-dimensional environments.
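To make the "policy gradient with parameter-based exploration" (PGPE) ingredient concrete, here is a hedged sketch of a single hyper-parameter update: policy parameters are drawn once per episode from a Gaussian hyper-distribution, and the gradient is taken with respect to the mean and standard deviation of that distribution rather than the parameters themselves. The particle-based parametrization of the paper is not reproduced here; the names, step size, and baseline handling are illustrative assumptions.

```python
import numpy as np

def pgpe_update(mu, sigma, episode_return, rng, alpha=0.05, baseline=0.0):
    """One PGPE-style update of the Gaussian hyper-distribution (sketch).

    mu, sigma      : mean and standard deviation over policy parameters
    episode_return : callable mapping a sampled parameter vector to its return
    rng            : numpy random generator, e.g. np.random.default_rng()
    """
    theta = mu + sigma * rng.standard_normal(mu.shape)      # sample policy parameters
    R = episode_return(theta) - baseline                    # baseline-corrected return
    grad_mu = (theta - mu) / sigma**2                       # d log N(theta; mu, sigma) / d mu
    grad_sigma = ((theta - mu) ** 2 - sigma**2) / sigma**3  # d log N / d sigma
    return mu + alpha * R * grad_mu, sigma + alpha * R * grad_sigma
```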
In this paper, a new generalized value iteration algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. The idea is to use iterative adaptive dynamic programming (ADP) to obtain the iterative control law that makes the iterative performance index function reach the optimum. The generalized value iteration algorithm permits an arbitrary positive semi-definite function as its initialization, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be accurately obtained in each iteration, a new design method of the convergence criterion for the generalized value iteration algorithm with finite approximation errors is established to make the iterative performance index functions converge to a finite neighborhood of the greatest lower bound of all performance index functions. Simulation results are given to illustrate the performance of the developed algorithm.
The dynamic planning and development of a large collection of systems, or a ‘System of Systems’ (SoS), pose significant programmatic challenges due to the complex interactions that exist between its constituent systems. Decisions to add, remove, or reconstitute connections between systems can result in repercussive failures across the operational and developmental dimensions of an SoS. The work conducted in this research is part of a larger body of work funded by the DoD Systems Engineering Research Center (SERC) towards the development of an Analytic Workbench. In particular, this paper develops a tool that adopts an operations research-based perspective on SoS-level planning based on metrics of cost, performance, schedule, and risk. Specifically, our work employs an approximate dynamic programming approach that is well suited to addressing the computational tractability of the resulting dynamic planning optimization problem. This approach allows for the identification of near-optimal multi-stage decisions in evolving SoS architectures. A Naval Warfare Scenario SoS example problem illustrates the application of the method.
ISBN (print): 9781479932757
This paper proposes an online optimal tracking algorithm to provide the desired voltage magnitude and frequency at the load. It effectively works as a DC/AC inverter that, with appropriate switching of semiconductor devices, converts a low DC voltage into a high AC voltage. An L-C filter is used to reduce the effects caused by the switching of the semiconductor devices. The proposed control scheme ensures good tracking of an exosystem that provides the desired voltage magnitude and frequency. It builds upon the ideas of approximate dynamic programming (ADP) and uses only partial information about the system and exosystem. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, simulations show the effectiveness of the proposed approach.