Search Results - Inner Mongolia University Library

Finite-Horizon Near-Optimal Output Feedback Neural Network Control of Quantized Nonlinear Discrete-Time Systems With Input Constraint

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, Vol. 26, No. 8, pp. 1776-1788

Authors: Xu, Hao; Zhao, Qiming; Jagannathan, Sarangapani. Affiliations: Texas A&M Univ Corpus Christi, Coll Sci & Engn, Corpus Christi, TX 78412, USA; DENSO Int Amer Inc, Southfield, MI 48033, USA; Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409, USA

The output feedback-based near-optimal regulation of uncertain and quantized nonlinear discrete-time systems in affine form with control constraint over a finite horizon is addressed in this paper. First, the effect of the input constraint is handled using a nonquadratic cost functional. Next, a neural network (NN)-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix, so that a separate identifier is not needed. Then, an approximate dynamic programming-based actor-critic framework is utilized to approximate the time-varying solution of the Hamilton-Jacobi-Bellman (HJB) equation using NNs with constant weights and time-dependent activation functions. A new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. Finally, a novel dynamic quantizer for the control inputs with adaptive step size is designed to eliminate the quantization error over time, thus overcoming the drawback of the traditional uniform quantizer. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate stability. Simulation results are given to show the effectiveness and feasibility of the proposed method.

Keywords: approximate dynamic programming; finite horizon; Hamilton-Jacobi-Bellman (HJB) equation; neural network (NN); optimal regulation; quantization
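The record above does not give the exact nonquadratic functional. For illustration only, assume the form often used for input-constrained ADP problems: for a scalar input bounded by |u| < lam, W(u) = 2 * integral from 0 to u of lam * artanh(v / lam) * r dv, which behaves like r * u^2 for small u and grows without bound as |u| approaches lam. The Python sketch below (the function names and the parameters `lam` and `r` are hypothetical) evaluates the closed-form antiderivative and checks it against a trapezoidal-rule integral; it is not the authors' implementation.

```python
import numpy as np


def nonquadratic_input_cost(u, lam, r):
    """Closed form of W(u) = 2 * int_0^u lam * artanh(v / lam) * r dv, scalar |u| < lam.

    Assumed functional form for illustration; the abstract does not state it.
    """
    u = np.asarray(u, dtype=float)
    if np.any(np.abs(u) >= lam):
        raise ValueError("input must satisfy |u| < lam")
    x = u / lam
    # Antiderivative of artanh(v / lam) is v * artanh(v / lam) + (lam / 2) * ln(1 - v^2 / lam^2)
    return 2.0 * lam * r * u * np.arctanh(x) + lam**2 * r * np.log1p(-x**2)


def nonquadratic_input_cost_numeric(u, lam, r, n=20001):
    """Trapezoidal-rule evaluation of the same integral, used as a cross-check."""
    v = np.linspace(0.0, u, n)
    f = np.arctanh(v / lam)
    return 2.0 * lam * r * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(v))
```

For example, with lam = 1 and r = 1 the penalty for u = 0.5 is about 0.26 (close to r * u^2 = 0.25), while inputs near the bound are penalized very heavily, which is how such a cost discourages saturating control.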

Optimal switching between controlled subsystems with free mode sequence

NEUROCOMPUTING, 2015, Vol. 149, Part C, pp. 1620-1630

Authors: Heydari, Ali; Balakrishnan, S. N. Affiliations: South Dakota Sch Mines & Technol, Dept Mech Engn, Rapid City, SD 57701, USA; Missouri Univ Sci & Technol, Mech & Aerosp Engn Dept, Rolla, MO, USA

The problem of optimal switching and control of nonlinear switching systems with controlled subsystems is investigated in this study, where the mode sequence and the switching times between the modes are unspecified. An approximate dynamic programming based method is developed which provides a feedback solution for unspecified initial conditions and different final times. The convergence of the proposed algorithm is proved. The versatility of the method and its performance are illustrated through different numerical examples. (C) 2014 Elsevier B.V. All rights reserved.

Keywords: Optimal switching; approximate dynamic programming; Switching system; Adaptive critics

Classification-Based Approximate Policy Iteration

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2015, Vol. 60, No. 11, pp. 2989-2993

Authors: Farahmand, Amir-massoud; Precup, Doina; Barreto, Andre M. S.; Ghavamzadeh, Mohammad. Affiliations: McGill Univ, Sch Comp Sci, Montreal, PQ H3A 0E9, Canada; Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213, USA; Mitsubishi Elect Res Labs, Cambridge, MA 02139, USA; Natl Lab Sci Comp LNCC, BR-25651075 Petropolis, Brazil

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results.

Keywords: approximate dynamic programming; approximate policy iteration; classification; finite-sample analysis; reinforcement learning

Approximate Modified Policy Iteration and its Application to the Game of Tetris

JOURNAL OF MACHINE LEARNING RESEARCH, 2015, Vol. 16, No. 1, pp. 1629-1676

Authors: Scherrer, Bruno; Ghavamzadeh, Mohammad; Gabillon, Victor; Lesner, Boris; Geist, Matthieu. Affiliations: INRIA Nancy Grand Est, Team Maia, 615 Rue Jardin Bot, F-54600 Vandoeuvre Ls Nancy, France; Adobe Res, San Jose, CA 95110, USA; INRIA Lille, San Jose, CA 95110, USA; INRIA Lille Nord Europe, Team SequeL, F-59650 Villeneuve Dascq, France; GeorgiaTech CNRS, IMS MaLIS Res Grp, Cent Supelec, F-57070 Metz, France; GeorgiaTech CNRS, UMI 2958, F-57070 Metz, France

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated methods, policy iteration and value iteration, as special cases. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy iteration and approximate value iteration. We develop finite-sample analyses of these algorithms that highlight the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.

Keywords: approximate dynamic programming; reinforcement learning; Markov decision processes; finite-sample analysis; performance bounds; game of tetris
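The relationship between MPI, value iteration, and policy iteration described in the abstract can be illustrated with a minimal tabular sketch. It assumes a small finite MDP with a known transition array `P[a, s, t]` and reward array `R[s, a]` (hypothetical inputs, not from the paper). Each iteration takes the greedy policy and then applies that policy's Bellman operator `m` times: m = 1 reduces to value iteration, and a large m approaches policy iteration. This is exact DP on a toy problem, not the approximate AMPI/CBMPI variants analyzed in the paper.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma, m, max_iter=1000, tol=1e-10):
    """Tabular modified policy iteration (illustrative sketch).

    P: (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a)
    R: (S, A) array of expected one-step rewards
    m: Bellman backups under the greedy policy per iteration
       (m = 1 is value iteration; large m approaches policy iteration)
    """
    _, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Greedy step: pi(s) = argmax_a [R(s, a) + gamma * sum_t P(t | s, a) v(t)]
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        pi = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator of pi m times, starting from v
        P_pi = P[pi, states, :]
        R_pi = R[states, pi]
        v_new = v
        for _ in range(m):
            v_new = R_pi + gamma * P_pi @ v_new
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    return v, pi
```

With a discount factor below 1, every choice of `m` should converge to the same optimal values; a larger `m` typically needs fewer outer iterations but does more work per iteration, which is the trade-off the MPI parameter controls.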

Effective Load Carrying Capability Evaluation of Renewable Energy via Stochastic Long-Term Hourly Based SCUC

IEEE TRANSACTIONS ON SUSTAINABLE ENERGY, 2015, Vol. 6, No. 1, pp. 188-197

Authors: Chen, Zhi; Wu, Lei; Shahidehpour, Mohammad. Affiliations: Arkansas Tech Univ, Dept Elect Engn, Russellville, AR 72801, USA; Clarkson Univ, Dept Elect & Comp Engn, Potsdam, NY 13699, USA; IIT, Dept Elect & Comp Engn, Chicago, IL 60616, USA; King Abdulaziz Univ, Jeddah 22254, Saudi Arabia

This paper evaluates the effective load carrying capability (ELCC) of renewable resources, including wind and solar, via a stochastic long-term hourly based security-constrained unit commitment (SCUC) model. Unlike traditional approaches, which approximate the ELCC of renewable resources using system peak loads, nonsequential block load duration curves, or rolling-based sequential methods, the stochastic long-term hourly based SCUC can accurately examine the impacts of the short-term variability and uncertainty of renewable resources, as well as the chronological operation details of generators, on hourly supply-demand imbalance and power system reliability over a long-term horizon. Uncertainties in hourly wind, solar, and load over a 1-year horizon are simulated via a scenario tree using the Monte Carlo method, and approximate dynamic programming is adopted to solve the stochastic long-term hourly based SCUC model efficiently. Variability correlations between wind speed and solar radiation are considered in the scenario sampling procedure. Moreover, parallel computing with a pipeline structure is designed to accelerate the approximate dynamic programming computation. Numerical case studies on the modified IEEE 118-bus system illustrate the effectiveness of the proposed stochastic long-term hourly based SCUC model and the approximate dynamic programming solution approach for evaluating the ELCC of renewable resources. This would help independent system operators (ISOs) design effective long-term planning strategies for operating power systems efficiently and reliably.

Keywords: approximate dynamic programming; effective load carrying capability (ELCC); power system reliability; renewable resource integration; stochastic long-term hourly based security-constrained unit commitment (SCUC)

Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2015, Vol. 45, No. 12, pp. 1577-1591

Authors: Liu, Derong; Wei, Qinglai; Yan, Pengfei. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum. It is shown that, when the algorithm is initialized with an admissible control law, the iterative value functions are monotonically nonincreasing and converge to the optimal solution of the Hamilton-Jacobi-Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm, by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the present generalized policy iteration algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning

Elective Patient Admission and Scheduling under Multiple Resource Constraints

PRODUCTION AND OPERATIONS MANAGEMENT, 2015, Vol. 24, No. 12, pp. 1907-1930

Authors: Barz, Christiane; Rajaram, Kumar. Affiliations: Tech Univ Berlin, Sch Econ & Management 7, D-10623 Berlin, Germany; Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095, USA

We consider a patient admission problem for a hospital with multiple resource constraints (e.g., operating rooms (ORs) and beds) and a stochastic evolution of patient care requirements across multiple resources. A small but significant proportion of emergency patients arrive randomly and must be accepted by the hospital. However, the hospital needs to decide whether to accept, postpone, or even reject admissions from a random stream of non-emergency elective patients. We formulate the control process as a Markov decision process to maximize the expected contribution net of overbooking costs, develop bounds using approximate dynamic programming, and use them to construct heuristics. We test our methods on data from the Ronald Reagan UCLA Medical Center and find that our intuitive newsvendor-based heuristic performs well across all scenarios.

Keywords: patient admission; patient scheduling; multiple resources; Markov decision process; approximate dynamic programming

High-Dimensional Portfolio Optimization with Transaction Costs

INTERNATIONAL JOURNAL OF THEORETICAL AND APPLIED FINANCE, 2016, Vol. 19, No. 4, p. 1650025

Authors: Broadie, Mark; Shen, Weiwei. Affiliations: Columbia Univ, Grad Sch Business, New York, NY 10027, USA; Columbia Univ, Appl Phys & Appl Math, New York, NY 10027, USA

This paper studies Merton's portfolio optimization problem with proportional transaction costs in a discrete-time finite horizon. Facing short-sale and borrowing constraints, investors have access to a risk-free asset and multiple risky assets whose returns follow a multivariate geometric Brownian motion. Lower and upper bounds on the optimal solutions are computed for problems with up to 20 risky assets and 40 investment periods. Three lower bounds are proposed: value function optimization (VF), and the hyper-sphere and hyper-cube policy parameterizations (HS and HC). VF addresses the difficulties of traditional value function iteration for high-dimensional dynamic programs with continuous decision and state spaces. HS and HC respectively approximate the geometry of the trading policy in the high-dimensional state space by two surfaces. To evaluate the lower bounds, two new upper bounds (OMG and OMG2) are provided via a duality method based on a new auxiliary problem. Compared with existing methods across various parameter sets, the new methods show clear superiority: the three lower-bound methods always achieve higher utilities, HS and HC cut run times by a factor of 100, and OMG and OMG2 mostly provide tighter upper bounds. In addition, the paper investigates how the no-trading region characterizing the optimal policy deforms when the short-sale and borrowing constraints bind.

Keywords: Portfolio optimization; transaction costs; value function iteration; approximate dynamic programming; lower and upper bounds

Approximate Dynamic Programming of Continuous Annealing Process

IEEE International Conference on Automation and Logistics

Authors: Zhang, Yingwei; Guo, Chao; Chen, Xue; Teng, Yongdong. Affiliation: Northeastern Univ, Minist Educ, Key Lab Integrated Automat Proc Ind, Shenyang 110004, Liaoning, Peoples R China

ISBN: (print) 9781424447947

The approximate dynamic programming method combines neural networks, reinforcement learning, and the idea of dynamic programming. It is an online control method based on actual data rather than on a precise mathematical model of the system. The method is suitable for the optimal control of nonlinear systems and can avoid the curse of dimensionality. It can effectively handle the nonlinearity of the plant and the uncertainty caused by inaccurate system modeling, so it is suitable for complex systems and time-varying tasks. The heating section of the continuous annealing furnace consumes a large amount of energy, and conventional dynamic programming has limitations in solving this problem. We therefore design an optimization controller for the heating section of the annealing furnace based on approximate dynamic programming. This paper mainly presents the basic structure and algorithm of action-dependent heuristic dynamic programming (ADHDP) and designs a temperature optimization controller for the heating section of the continuous annealing furnace based on ADHDP. Simulation results indicate that the ADHDP-based temperature controller has theoretical and practical significance for future practical application.

Keywords: approximate dynamic programming; Continuous Annealing Furnace; ADHDP; Neural Network; Temperature Control
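To make the action-dependent idea concrete: in ADHDP the critic estimates a cost-to-go Q(x, u) that depends on both the state and the action, and the actor is adjusted to reduce the critic's output. The sketch below is a simplified least-squares, policy-iteration variant of that idea on an assumed scalar linear plant x' = a*x + b*u with stage cost x^2 + u^2 and a quadratic critic basis; the plant, cost, basis, and function names are all hypothetical stand-ins for the neural networks and furnace model used in the paper. The critic weights are fitted from sampled transitions using the Bellman relation Q(x, u) = r + Q(x', u'), and the actor gain is then set to the minimizer of the fitted Q.

```python
import numpy as np


def features(x, u):
    # Quadratic basis for the action-dependent critic Q(x, u) = w . phi(x, u)
    return np.stack([x * x, x * u, u * u], axis=-1)


def adhdp_lq(a=1.0, b=1.0, k0=-0.5, n_iter=20, n_samples=200, seed=0):
    """Least-squares action-dependent critic with greedy actor updates (sketch).

    Hypothetical scalar plant x' = a*x + b*u, stage cost x^2 + u^2,
    linear actor u = k*x. k0 must be stabilizing (|a + b*k0| < 1).
    """
    rng = np.random.default_rng(seed)
    k = k0
    w = np.zeros(3)
    for _ in range(n_iter):
        x = rng.uniform(-1.0, 1.0, n_samples)
        u = k * x + rng.normal(0.0, 0.5, n_samples)  # exploration noise on the action
        x_next = a * x + b * u
        u_next = k * x_next  # current actor's action at the next state
        r = x * x + u * u  # stage cost
        # Critic update: solve Q(x, u) - Q(x', u') = r in the least-squares sense
        A = features(x, u) - features(x_next, u_next)
        w, *_ = np.linalg.lstsq(A, r, rcond=None)
        # Actor update: minimize Q(x, u) over u, giving u = -(w[1] / (2 w[2])) x
        k = -w[1] / (2.0 * w[2])
    return k, w
```

For a = b = 1, starting from a stabilizing gain, the actor gain should approach the discrete-time LQR gain -(sqrt(5) - 1)/2, about -0.618, which gives a simple sanity check. The exploration noise added to u matters: without it the three critic features are collinear and the least-squares problem is ill-posed.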

Dynamic Multi-period Freight Consolidation

6th International Conference on Computational Logistics (ICCL)

Authors: Rivera, Arturo Perez; Mes, Martijn. Affiliation: Univ Twente, Dept Ind Engn & Business Informat Syst, NL-7500 AE Enschede, Netherlands

ISBN: (print) 9783319242644; 9783319242637

Logistic Service Providers (LSPs) offering hinterland transportation face a trade-off between efficiently using the capacity of long-haul vehicles and minimizing first- and last-mile costs. To achieve the optimal trade-off, freights have to be consolidated considering the variation in the arrival of freights and their characteristics, the applicable transportation restrictions, and the interdependence of decisions over time. We propose the use of a Markov model and an approximate dynamic programming (ADP) algorithm to consolidate the right freights in such transportation settings. Our model incorporates probabilistic knowledge of the arrival of freights and their characteristics, as well as generic definitions of transportation restrictions and costs. Using small test instances, we show that our ADP solution provides accurate approximations to the optimal solution of the Markov model. Using larger problem instances, we show that our modeling approach has significant benefits when compared to common-practice heuristic approaches.

Keywords: Intermodal transportation; Transportation planning; Consolidation; Time horizon; approximate dynamic programming
