This paper addresses output feedback-based near-optimal regulation of uncertain and quantized nonlinear discrete-time systems in affine form with a control constraint over a finite horizon. First, the effect of the input constraint is handled using a nonquadratic cost functional. Next, a neural network (NN)-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix, so that a separate identifier is not needed. Then, an approximate dynamic programming-based actor-critic framework is used to approximate the time-varying solution of the Hamilton-Jacobi-Bellman equation using NNs with constant weights and time-dependent activation functions. A new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. Finally, a novel dynamic quantizer for the control inputs with adaptive step size is designed to eliminate the quantization error over time, thus overcoming the drawback of the traditional uniform quantizer. The proposed scheme functions in a forward-in-time manner without an offline training phase. Lyapunov analysis is used to investigate stability. Simulation results are given to show the effectiveness and feasibility of the proposed method.
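For orientation, a nonquadratic cost functional of the kind mentioned in the abstract above is often written as follows in the constrained-input optimal control literature. This is a sketch under assumed notation, not the paper's own formula: the symbols $\bar{u}$ (input bound), $Q$, $R$, $\psi$ and $N$ are not given in the abstract and are introduced here only for illustration.

```latex
% Assumed notation: x_k state, u_k control with |u_{k,j}| <= \bar{u},
% Q(x) >= 0 state penalty, R > 0 diagonal weight, \psi terminal cost, N horizon.
J(x_k, k) = \psi(x_N) + \sum_{i=k}^{N-1} \bigl( Q(x_i) + W(u_i) \bigr),
\qquad
W(u_i) = 2 \int_{0}^{u_i} \bar{u}\, \tanh^{-1}\!\left( v / \bar{u} \right)^{\top} R \, \mathrm{d}v .
```

Under these assumptions, the gradient of $W$ grows without bound as any component of $u_i$ approaches $\pm\bar{u}$, which is why minimizing such a cost tends to yield a control law of saturated (tanh) form that respects the bound.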
The problem of optimal switching and control of nonlinear switching systems with controlled subsystems is investigated in this study where the mode sequence and the switching times between the modes are unspecified. An approximate dynamic programming based method is developed which provides a feedback solution for unspecified initial conditions and different final times. The convergence of the proposed algorithm is proved. Versatility of the method and its performance are illustrated through different numerical examples. (C) 2014 Elsevier B.V. All rights reserved.
Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results.
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms in the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin, and competes with the current state-of-the-art methods while using fewer samples.
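To make concrete how MPI contains both value and policy iteration, here is a minimal sketch of exact, tabular MPI. It is not the paper's approximate AMPI or CBMPI algorithms. The function name `modified_policy_iteration`, the array layout of `P` and `R`, and the parameter names `m` and `iters` are assumptions made for this illustration.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200):
    """Exact tabular MPI (illustrative sketch, hypothetical interface).

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under a.
    R: array of shape (S, A), expected immediate reward for action a in state s.
    m: number of policy-evaluation sweeps per iteration
       (m = 1 gives value iteration; large m approaches policy iteration).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Greedy step: choose the action maximizing the one-step lookahead value.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator of the greedy policy m times.
        P_pi = P[policy, states]   # (S, S): row s is P[policy[s], s, :]
        r_pi = R[states, policy]   # (S,)
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy
```

The single parameter `m` is the one the abstract calls MPI's main parameter: it sets how much evaluation work is done between greedy improvement steps.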
This paper evaluates the effective load carrying capability (ELCC) of renewable resources, including wind and solar, via the stochastic long-term hourly based security-constrained unit commitment (SCUC) model. Unlike traditional approaches, which approximate the ELCC of renewable resources using system peak loads, nonsequential block load duration curves, or rolling-based sequential methods, the stochastic long-term hourly based SCUC can accurately examine the impacts of short-term variability and uncertainty of renewable resources, as well as chronological operation details of generators, on hourly supply-demand imbalance and power system reliability over a long-term horizon. Uncertainties of hourly wind, solar, and load in a 1-year horizon are simulated via the scenario tree using the Monte Carlo method, and approximate dynamic programming is adopted for effectively solving the stochastic long-term hourly based SCUC model. Variability correlations between wind speed and solar radiation are considered within the scenario sampling procedure. Moreover, parallel computing with a pipeline structure is designed to accelerate the computational performance of approximate dynamic programming. Numerical case studies on the modified IEEE 118-bus system illustrate the effectiveness of the proposed stochastic long-term hourly based SCUC model and the approximate dynamic programming solution approach for evaluating the ELCC of renewable resources. This would help independent system operators (ISOs) design effective long-term planning strategies for operating power systems efficiently and reliably.
This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum. Initialized by an admissible control law, it is shown that the iterative value functions are monotonically nonincreasing and converge to the optimal solution of the Hamilton-Jacobi-Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm, by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the present generalized policy iteration algorithm.
We consider a patient admission problem for a hospital with multiple resource constraints (e.g., OR and beds) and a stochastic evolution of patient care requirements across multiple resources. There is a small but significant proportion of emergency patients who arrive randomly and have to be accepted at the hospital. However, the hospital needs to decide whether to accept, postpone, or even reject the admission from a random stream of non-emergency elective patients. We formulate the control process as a Markov decision process to maximize expected contribution net of overbooking costs, develop bounds using approximate dynamic programming, and use them to construct heuristics. We test our methods on data from the Ronald Reagan UCLA Medical Center and find that our intuitive newsvendor-based heuristic performs well across all scenarios.
This paper studies Merton's portfolio optimization problem with proportional transaction costs in a discrete-time finite horizon. Facing short-sale and borrowing constraints, investors have access to a risk-free asset and multiple risky assets whose returns follow a multivariate geometric Brownian motion. Lower and upper bounds on the optimal solution are computed for problems with up to 20 risky assets and 40 investment periods. Three lower bounds are proposed: the value function optimization (VF), and the hyper-sphere and hyper-cube policy parameterizations (HS and HC). VF attacks the conundrums in traditional value function iteration for high-dimensional dynamic programs with continuous decision and state spaces. HS and HC respectively approximate the geometry of the trading policy in the high-dimensional state space by two surfaces. To evaluate lower bounds, two new upper bounds are provided via a duality method based on a new auxiliary problem (OMG and OMG2). Compared with existing methods across various suites of parameters, the new methods clearly show superiority. The three lower bound methods always achieve higher utilities, HS and HC cut run times by a factor of 100, and OMG and OMG2 mostly provide tighter upper bounds. In addition, the paper investigates how the no-trading region characterizing the optimal policy deforms when short-sale and borrowing constraints bind.
ISBN: (print) 9781424447947
The approximate dynamic programming method combines neural networks, reinforcement learning, and the idea of dynamic programming. It is an online control method based on actual data rather than on a precise mathematical model of the system. The method is suitable for the optimal control of nonlinear systems and can avoid the curse of dimensionality. It can effectively handle plant nonlinearity and the uncertainty introduced by system modeling, so it is suitable for complex systems and time-varying tasks. The heating section of a continuous annealing furnace consumes a large amount of energy, and the dynamic programming method has limitations in solving this problem. We design an optimization controller for the heating section of the annealing furnace based on the approximate dynamic programming method. This paper mainly presents the basic structure and algorithm of the action-dependent heuristic dynamic programming (ADHDP) method and designs a temperature optimization controller for the heating section of the continuous annealing furnace based on ADHDP. Simulation shows that the ADHDP-based temperature controller has theoretical and practical significance for future practical application.
ISBN: (print) 9783319242644; 9783319242637
Logistic Service Providers (LSPs) offering hinterland transportation face the trade-off between efficiently using the capacity of long-haul vehicles and minimizing the first and last-mile costs. To achieve the optimal trade-off, freights have to be consolidated considering the variation in the arrival of freight and their characteristics, the applicable transportation restrictions, and the interdependence of decisions over time. We propose the use of a Markov model and an approximate dynamic programming (ADP) algorithm to consolidate the right freights in such transportation settings. Our model incorporates probabilistic knowledge of the arrival of freights and their characteristics, as well as generic definitions of transportation restrictions and costs. Using small test instances, we show that our ADP solution provides accurate approximations to the optimal solution of the Markov model. Using larger problem instances, we show that our modeling approach has significant benefits when compared to common-practice heuristic approaches.