This paper deals with the finite-horizon optimal tracking control for a class of discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) algorithm. First, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error dynamics. Then, with convergence analysis in terms of the cost function and control law, the iterative ADP algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which makes the cost function close to its optimal value within an ε-error bound. Furthermore, three neural networks are used to implement the algorithm, approximating the cost function, the control law, and the error dynamics, respectively. At last, an example is included to demonstrate the effectiveness of the proposed approach.
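As a rough illustration of the iteration this abstract describes, the sketch below runs an HDP-style value update for a tracking-error regulator, assuming hypothetical scalar error dynamics, a quadratic utility, and coarse grids in place of the paper's three neural networks; the stopping test stands in for the ε-error bound.

```python
import numpy as np

# Minimal HDP-style iteration for a finite-horizon tracking-error regulator.
# Assumptions (not from the paper): scalar error dynamics e' = f(e, u),
# quadratic utility, and crude grids in place of the three neural networks.
def f(e, u):                       # hypothetical tracking-error dynamics
    return 0.8 * np.sin(e) + u

def utility(e, u):                 # stage cost U(e, u)
    return e**2 + 0.5 * u**2

e_grid = np.linspace(-2.0, 2.0, 201)
u_grid = np.linspace(-1.0, 1.0, 101)

V = np.zeros_like(e_grid)          # V_0(e) = 0 (initial iterate)
for i in range(50):                # iterate until the cost sequence settles
    V_new = np.empty_like(V)
    policy = np.empty_like(V)
    for k, e in enumerate(e_grid):
        e_next = f(e, u_grid)                      # candidate successor errors
        q = utility(e, u_grid) + np.interp(e_next, e_grid, V)
        j = np.argmin(q)
        V_new[k], policy[k] = q[j], u_grid[j]      # greedy control law
    if np.max(np.abs(V_new - V)) < 1e-3:           # epsilon-style stopping test
        V = V_new
        break
    V = V_new
```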
This paper studies Merton's portfolio optimization problem with proportional transaction costs in a discrete-time finite horizon. Facing short-sale and borrowing constraints, investors have access to a risk-free asset and multiple risky assets whose returns follow a multivariate geometric Brownian motion. Lower and upper bounds for optimal solutions are computed for problems with up to 20 risky assets and 40 investment periods. Three lower bounds are proposed: the value function optimization (VF), the hyper-sphere and the hyper-cube policy parameterizations (HS and HC). VF addresses the difficulties of traditional value function iteration for high-dimensional dynamic programs with continuous decision and state spaces. HS and HC respectively approximate the geometry of the trading policy in the high-dimensional state space by two surfaces. To evaluate the lower bounds, two new upper bounds are provided via a duality method based on a new auxiliary problem (OMG and OMG2). Compared with existing methods across various parameter suites, the new methods show clear advantages: the three lower bound methods always achieve higher utilities, HS and HC cut run times by a factor of 100, and OMG and OMG2 mostly provide tighter upper bounds. In addition, the paper investigates how the no-trading region characterizing the optimal policy deforms when the short-sale and borrowing constraints bind.
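The hyper-cube (HC) parameterization suggests a box-shaped no-trading region. The snippet below is only a toy illustration of that geometric idea with hypothetical bounds, not the paper's bounding or duality machinery.

```python
import numpy as np

# Illustrative sketch (not the paper's HC method): a box-shaped no-trading
# region in the space of risky-asset weights. If the current weights fall
# inside the box, do nothing; otherwise trade each asset just back to the
# nearest face, which keeps transaction costs small. Bounds are hypothetical.
def hypercube_policy(weights, lower, upper):
    """Return post-trade weights under a per-asset no-trading interval."""
    weights = np.asarray(weights, dtype=float)
    return np.clip(weights, lower, upper)

# Example: 3 risky assets, target weight 0.2 each, half-width 0.05.
lower = np.full(3, 0.15)
upper = np.full(3, 0.25)
print(hypercube_policy([0.10, 0.22, 0.30], lower, upper))  # -> [0.15 0.22 0.25]
```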
Value function approximation has a central role in approximate dynamic programming (ADP) to overcome the so-called curse of dimensionality associated with real stochastic processes. In this regard, we propose a novel Least-Squares Temporal Difference (LSTD) based method: the "Multi-trajectory Greedy LSTD" (MG-LSTD). It is an exploration-enhanced recursive LSTD algorithm with the policy improvement embedded within the LSTD iterations. It makes use of multi-trajectory Monte Carlo simulations in order to enhance exploration of the system state space. This method is applied to solving resource allocation problems modeled via a constrained stochastic dynamic programming (SDP) based framework. In particular, such problems are formulated as a set of parallel Birth-Death Processes (BDPs). Some operational scenarios are defined and solved to show the effectiveness of the proposed approach. Finally, we provide some experimental evidence on the MG-LSTD algorithm's convergence properties as a function of its key parameters. (C) 2020 Elsevier Ltd. All rights reserved.
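For readers unfamiliar with recursive LSTD, the following sketch shows a Sherman-Morrison rank-one update of the LSTD quantities for one transition, with placeholder features and rewards; the multi-trajectory exploration and the embedded greedy improvement of MG-LSTD are only indicated in comments.

```python
import numpy as np

# Minimal sketch of a recursive LSTD(0) update with a greedy improvement step,
# in the spirit of the MG-LSTD idea described above. The features, rewards,
# and simulator below are hypothetical placeholders, not the paper's model.
def lstd_recursive_update(A_inv, b, phi_s, phi_next, reward, gamma=0.95):
    """Sherman-Morrison rank-1 update of A^{-1} and b for one transition."""
    u = phi_s
    v = phi_s - gamma * phi_next
    Au = A_inv @ u
    A_inv = A_inv - np.outer(Au, v @ A_inv) / (1.0 + v @ Au)
    b = b + reward * phi_s
    return A_inv, b

d = 8                                   # feature dimension (assumed)
A_inv = np.eye(d) / 1e-3                # (delta * I)^-1 initialization
b = np.zeros(d)

# One simulated trajectory with random placeholder features/rewards:
rng = np.random.default_rng(0)
for _ in range(100):
    phi_s, phi_next = rng.normal(size=d), rng.normal(size=d)
    A_inv, b = lstd_recursive_update(A_inv, b, phi_s, phi_next, rng.normal())

theta = A_inv @ b                       # value-function weights for the current
                                        # policy; MG-LSTD would apply a greedy
                                        # policy improvement at this point and
                                        # launch further exploratory trajectories.
```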
The accessibility and efficiency of outpatient clinic operations are largely affected by appointment schedules. Clinical scheduling is a process of assigning physician appointment times to sequentially calling patient...
Approximate dynamic programming (ADP) aims to obtain an approximate numerical solution to the discrete-time Hamilton-Jacobi-Bellman (HJB) equation. Heuristic dynamic programming (HDP) is a two-stage iterative scheme of ADP that separates the HJB equation into two equations, one for the value function and another for the policy function, which are referred to as the critic and the actor, respectively. Previous ADP implementations have been limited by the choice of function approximator, which requires significant prior domain knowledge or a large number of parameters to be fitted. However, recent advances in deep learning brought by the computer science community enable the use of deep neural networks (DNNs) to approximate high-dimensional nonlinear functions without prior domain knowledge. Motivated by this, we examine the potential of DNNs as function approximators of the critic and the actor. In contrast to the infinite-horizon optimal control problem, the critic and the actor of the finite-horizon optimal control (FHOC) problem are time-varying functions and have to satisfy a boundary condition. A DNN structure and a training algorithm suitable for FHOC are presented. Illustrative examples are provided to demonstrate the validity of the proposed method.
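A minimal sketch of the time-varying critic/actor idea is given below, assuming PyTorch, a hypothetical known model and quadratic stage cost, and the time index fed to both networks as an extra input; it shows a single critic update of the two-stage scheme, not the paper's full training algorithm or boundary-condition handling.

```python
import torch
import torch.nn as nn

# Sketch (assumptions, not the paper's exact architecture): time-varying
# critic V(x, t) and actor u(x, t) for a finite-horizon problem, implemented
# as small DNNs that take the time index as an extra input. The terminal
# condition V(x, T) = phi(x) would be imposed by training the critic on it.
class MLP(nn.Module):
    def __init__(self, n_in, n_out, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, n_out),
        )
    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

n_x, n_u, T = 2, 1, 10
critic = MLP(n_x + 1, 1)            # V(x, t)
actor = MLP(n_x + 1, n_u)           # u(x, t)

def stage_cost(x, u):               # hypothetical quadratic stage cost
    return (x**2).sum(-1, keepdim=True) + 0.1 * (u**2).sum(-1, keepdim=True)

def dynamics(x, u):                 # hypothetical known model x_{k+1} = f(x, u)
    return 0.9 * x + torch.cat([u, torch.zeros_like(u)], dim=-1)

# One critic-update step of the two-stage (HDP-style) scheme:
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
x = torch.randn(256, n_x)
t = torch.randint(0, T, (256, 1)).float()
u = actor(x, t)
target = stage_cost(x, u) + critic(dynamics(x, u), t + 1)   # Bellman target
loss = ((critic(x, t) - target.detach())**2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```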
This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm, and the properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.
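The following tabular sketch illustrates the flavour of GPI with approximation errors: each iteration performs a greedy improvement followed by a few evaluation sweeps perturbed by a bounded error term; the random MDP, the error bound, and the sweep count are assumptions for illustration only.

```python
import numpy as np

# Illustrative tabular sketch of generalized policy iteration (GPI) with an
# explicit per-step approximation error, loosely mirroring the setting above.
# The MDP (P, R), the error bound eps, and the sweep count N are assumptions.
rng = np.random.default_rng(1)
nS, nA, gamma, eps, N = 6, 3, 0.9, 0.01, 4
P = rng.dirichlet(np.ones(nS), size=(nS, nA))      # P[s, a] = next-state dist.
R = rng.normal(size=(nS, nA))

V = np.zeros(nS)
for _ in range(100):
    # Policy improvement (greedy w.r.t. the current value function).
    Q = R + gamma * P @ V
    pi = Q.argmax(axis=1)
    # N evaluation sweeps of the greedy policy, each corrupted by a bounded
    # error term standing in for function-approximation error.
    for _ in range(N):
        V = R[np.arange(nS), pi] + gamma * (P[np.arange(nS), pi] @ V)
        V += rng.uniform(-eps, eps, size=nS)
```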
ISBN (print): 9781509001644
In this paper, we propose an online learning method for adaptive traffic signal control in a multi-intersection system. The method uses approximate dynamic programming (ADP) to achieve a near-optimal solution of the signal optimization in a distributed network, which is modeled in a microscopic way. The traffic network loading model and the traffic signal control model are presented to serve as the basis of the discrete-time control environment. The learning process of the linear function approximation in the ADP approach adopts tunable parameters over the traffic states, including the vehicle queue length and the signal indication. ADP overcomes the computational complexity that usually appears in large-scale problems solved by exact algorithms such as dynamic programming. Moreover, the proposed adaptive phase sequence (APS) mode improves performance compared with other control methods. The simulation results show that our method performs well on the adaptive traffic signal control problem.
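As a sketch of the linear function approximation described above, the snippet below learns weights over hypothetical features (per-approach queue lengths plus a one-hot signal indication) with a temporal-difference update on a toy intersection model; it is not the paper's traffic network loading model or APS logic.

```python
import numpy as np

# Sketch of the linear value-function approximation idea: V(s) ~ theta . phi(s),
# where phi(s) stacks per-approach queue lengths and a one-hot signal indication.
# The intersection model and step cost (total queue length) are placeholders.
def features(queues, phase, n_phases):
    one_hot = np.zeros(n_phases)
    one_hot[phase] = 1.0
    return np.concatenate([queues, one_hot])

def td_update(theta, phi, phi_next, cost, alpha=0.01, gamma=0.95):
    """One temporal-difference step toward the one-step lookahead target."""
    td_error = cost + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * td_error * phi

n_approaches, n_phases = 4, 2
theta = np.zeros(n_approaches + n_phases)

rng = np.random.default_rng(2)
queues, phase = rng.integers(0, 10, n_approaches).astype(float), 0
for _ in range(200):
    # Choose the phase that looks best one step ahead (greedy w.r.t. theta).
    # A real controller would also respect minimum-green and phase-sequence rules.
    next_phase = min(range(n_phases),
                     key=lambda p: theta @ features(queues, p, n_phases))
    served = queues * (0.5 if next_phase == 0 else 0.3)        # toy service rates
    next_queues = np.maximum(queues - served + rng.poisson(1.0, n_approaches), 0)
    cost = next_queues.sum()                                   # total queue length
    theta = td_update(theta,
                      features(queues, phase, n_phases),
                      features(next_queues, next_phase, n_phases),
                      cost)
    queues, phase = next_queues, next_phase
```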
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those of approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
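The fitted-Q flavour of AMPI can be sketched as follows on a small random MDP: take the greedy policy with respect to the current Q, apply m Bellman backups under it, and refit a linear Q-model by least squares. The MDP, the features, and m are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch of a fitted-Q style approximate modified policy iteration: greedy
# improvement, m partial evaluation backups (m=1 -> value iteration,
# m -> infinity -> policy iteration), then a least-squares "fit" step.
rng = np.random.default_rng(3)
nS, nA, gamma, m = 20, 4, 0.9, 3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
Phi = rng.normal(size=(nS * nA, 8))              # features over (s, a) pairs

w = np.zeros(8)
for _ in range(50):
    Q = (Phi @ w).reshape(nS, nA)
    pi = Q.argmax(axis=1)                        # greedy policy (improvement)
    # m evaluation backups under pi, done exactly here for brevity;
    # AMPI would estimate these from sampled rollouts instead.
    for _ in range(m):
        V = Q[np.arange(nS), pi]
        Q = R + gamma * P @ V
    # "Fitted" step: project the backed-up Q onto the linear feature space.
    w, *_ = np.linalg.lstsq(Phi, Q.reshape(-1), rcond=None)
```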
In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in the policy update. This allows us to prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds in the presence of approximation/estimation error which depend on the average accumulated error, as opposed to the standard bounds, which are expressed in terms of the supremum of the errors. The dependency on the average error is important in problems with a limited number of samples per iteration, for which the average of the errors can be significantly smaller than their supremum. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we numerically illustrate the applicability of these results on some benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods.
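A tabular sketch of the incremental-preference idea behind DPP is given below, following one common presentation in which the policy is a softmax of action preferences and each sweep adds an advantage-like increment; the random MDP and the temperature η are assumptions, and the exact operator in the paper may differ in its details.

```python
import numpy as np

# Tabular sketch of a dynamic-policy-programming-style recursion on action
# preferences Psi: the policy is a softmax of Psi, and each sweep adds an
# advantage-like increment, so the policy changes gradually between sweeps.
rng = np.random.default_rng(4)
nS, nA, gamma, eta = 10, 3, 0.9, 5.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))

def softmax_avg(Psi, eta):
    """Boltzmann-weighted average of preferences per state."""
    w = np.exp(eta * (Psi - Psi.max(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    return (w * Psi).sum(axis=1)

Psi = np.zeros((nS, nA))
for _ in range(200):
    m = softmax_avg(Psi, eta)                       # one aggregated value per state
    Psi = Psi + R + gamma * (P @ m) - m[:, None]    # incremental preference update

policy = np.exp(eta * (Psi - Psi.max(axis=1, keepdims=True)))
policy /= policy.sum(axis=1, keepdims=True)         # softmax policy from Psi
```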