An adaptive optimal control algorithm for systems with uncertain dynamics is formulated under a Reinforcement Learning framework. An embedded exploratory component is included explicitly in the objective function of an output-feedback receding-horizon Model Predictive Control problem. The optimization is formulated as a Quadratically Constrained Quadratic Program and solved to epsilon-global optimality. The iterative interaction between the action specified by the optimal solution and the approximation of the cost functions balances the exploitation of current knowledge against the need for exploration. The proposed method is shown to converge to the optimal policy for a controllable discrete-time linear plant with unknown output parameters. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
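For intuition, the sketch below mimics the exploration-exploitation trade-off on a scalar plant with an unknown output gain: the control objective is augmented with an information bonus, and the unknown parameter is tracked by recursive least squares. It replaces the paper's QCQP (solved to epsilon-global optimality) with a plain grid search, and every numeric value is an invented placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
c_true = 2.0              # unknown output gain of the plant (assumed)
c_hat, P = 0.0, 10.0      # recursive-least-squares estimate and covariance
y_ref = 1.0               # output set-point (assumed)
beta = 0.5                # exploration weight (assumed)
actions = np.linspace(-2.0, 2.0, 401)

for t in range(50):
    # Exploration-augmented objective: tracking cost minus an information
    # bonus that rewards inputs which excite the uncertain parameter.
    track = (c_hat * actions - y_ref) ** 2
    bonus = beta * P * actions ** 2
    u = actions[np.argmin(track - bonus)]
    y = c_true * u + 0.05 * rng.standard_normal()
    # RLS update of the unknown output parameter.
    k = P * u / (1.0 + P * u * u)
    c_hat += k * (y - c_hat * u)
    P *= 1.0 - k * u
print(f"estimated output gain: {c_hat:.3f} (true value 2.0)")
```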
ISBN (print): 9781467391276
This paper presents an approach for recasting Markov Decision Process (MDP) problems as heuristics-based planning problems. The basic idea is a temporal decomposition of the state space based on a subset of states referred to as the termination sample space. Specifically, the recasting is done in three steps. The first step defines a state-space adaptation criterion based on the termination sample space. The second step defines an action-selection heuristic for each state. The third and final step defines a recursion, or backtracking, methodology to avoid dead ends and infinite loops. All three steps are described and discussed. A case study involving fault detection and alarm generation for the reaction wheels of a satellite mission is presented, and the proposed approach is compared with existing approaches for recasting MDP problems using this case study. The computational reduction achieved by the proposed approach is evident from the results.
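As a concrete illustration of the three steps, the toy below plans on a small deterministic grid: a Manhattan distance to the goal stands in for the termination-sample-space criterion, action selection is greedy on that criterion, and dead ends trigger backtracking. The grid, walls, and goal are invented for the sketch and carry none of the satellite case study.

```python
GOAL = (3, 3)
WALLS = {(1, 1), (2, 1), (3, 1), (1, 2)}

def neighbors(s):
    x, y = s
    for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= n[0] <= 3 and 0 <= n[1] <= 3 and n not in WALLS:
            yield n

def heuristic(s):                        # step 1: termination-based criterion
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def plan(start):
    path, visited = [start], {start}
    while path and path[-1] != GOAL:
        cands = [n for n in neighbors(path[-1]) if n not in visited]
        if not cands:                    # step 3: backtrack from a dead end
            path.pop()
            continue
        nxt = min(cands, key=heuristic)  # step 2: greedy action selection
        visited.add(nxt)
        path.append(nxt)
    return path or None                  # None if the goal is unreachable

print(plan((0, 0)))
```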
In order to accept future high-yield booking requests, airlines protect seats from low-yield passengers. More seats may be reserved when passengers faced with closed fare classes can upsell to open higher fare classes. We address the airline revenue management problem with capacity nesting and customer upsell, and formulate it as a stochastic optimization model that determines a set of static protection levels for each itinerary. We apply an approximate dynamic programming framework to approximate the objective function by piecewise linear functions, whose slopes (marginal revenues) are iteratively updated and returned by an efficient heuristic that simultaneously handles both nesting and upsell. The resulting allocation policy is tested on a real airline network and benchmarked against the randomized linear programming bid-price policy under various demand settings. Simulation results suggest that the proposed allocation policy significantly outperforms the benchmark when incremental demand or the upsell probability is high. Structural analyses are also provided for special cases of demand dependence.
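The essence of the slope-update idea can be shown on a single-leg toy: a lookup table holds one slope (marginal seat value) per unit of remaining capacity, a request is accepted when its fare beats the current slope, and accepted fares pull the slope via stochastic approximation. Fares, arrival probabilities, capacity, and horizon below are invented, and the network, nesting, and upsell logic of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 10, 30                        # seat capacity and booking horizon
fares = np.array([100.0, 250.0])     # low / high fare classes (assumed)
p_arrive = np.array([0.5, 0.2])      # per-period arrival probabilities
v = np.zeros(C + 1)                  # v[r]: marginal value of the r-th seat

for it in range(2000):
    alpha = 1.0 / (1.0 + 0.01 * it)  # decaying smoothing step
    r = C
    for t in range(T):
        for j in range(2):
            if r > 0 and rng.random() < p_arrive[j] and fares[j] >= v[r]:
                # Accept and pull the slope toward the observed revenue.
                v[r] = (1 - alpha) * v[r] + alpha * fares[j]
                r -= 1
print(np.round(v[1:], 1))            # estimated marginal seat values
```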
ISBN (print): 9781467399418
In industrial applications, practitioners usually face considerable complexity when optimizing operating strategies under uncertainty. Typical real-world problems arising in practice are notoriously challenging from a computational viewpoint, requiring solutions to Markov Decision problems in high dimensions. In this work, we present a novel approach for obtaining an approximate solution to a certain class of problems whose state process follows controlled linear dynamics. Our technique is illustrated by an implementation within the statistical language R, which we discuss by solving a typical problem arising in practice.
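A minimal Python sketch of that problem class (the paper's own implementation is in R): approximate backward induction for a controlled linear state process on a discretized grid, with quantized noise and linear interpolation of the continuation value. Dynamics, costs, and grids are invented for illustration.

```python
import numpy as np

T, grid = 10, np.linspace(-3, 3, 61)     # horizon and state grid
actions = np.linspace(-1, 1, 21)
eps = np.array([-0.2, 0.0, 0.2])         # quantized noise, equal weights
V = np.zeros_like(grid)                  # terminal value

# Backward induction for x_{t+1} = x_t + u_t + eps, cost x^2 + u^2.
for t in range(T):
    Vnew = np.empty_like(grid)
    for i, x in enumerate(grid):
        best = np.inf
        for u in actions:
            nxt = np.clip(x + u + eps, grid[0], grid[-1])
            cont = np.interp(nxt, grid, V).mean()   # expected continuation
            best = min(best, x * x + u * u + cont)
        Vnew[i] = best
    V = Vnew
print(f"approximate value at x=0: {V[len(grid) // 2]:.3f}")
```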
ISBN (print): 9788894105124
In the future, residential energy users can seize the full potential of demand response schemes by using an automated home energy management system (HEMS) to schedule their distributed energy resources. To generate high-quality schedules, a HEMS needs to consider the stochastic nature of PV generation and energy consumption, as well as their inter-daily variations over several days. However, extending the decision horizon of existing optimisation techniques is computationally difficult, and these approaches are only feasible with a limited number of storage devices and a low-resolution decision horizon. Given these shortcomings, this paper presents an approximate dynamic programming (ADP) approach with temporal difference learning for implementing a computationally efficient HEMS. In ADP, we obtain policies from value function approximations by stepping forward in time, in contrast to the value functions obtained by backward induction in DP. We use empirical data collected during the Smart Grid Smart City project in NSW, Australia, to estimate the parameters of a Markov chain model of PV output and electrical demand, which are then used in all simulations. To evaluate the quality of the solutions generated by ADP, we compare the ADP method to stochastic mixed-integer linear programming (MILP) and dynamic programming (DP). Our results show that ADP computes a solution much more quickly than both DP and stochastic MILP, while providing better-quality solutions than stochastic MILP and only a slight reduction in quality compared to the DP solution. Moreover, unlike the computationally intensive DP, the ADP approach can consider a decision horizon beyond one day as well as multiple storage devices, resulting in a HEMS that can capture additional financial benefits.
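To make the forward-stepping idea concrete, the toy below runs repeated forward passes over a one-battery, 24-hour problem and nudges a lookup-table value function with TD(0)-style updates. Prices, the net-demand distribution, and the battery size are invented; the Smart Grid Smart City data is not used.

```python
import numpy as np

rng = np.random.default_rng(2)
T, levels = 24, 11                    # hourly horizon, SOC grid 0..10 kWh
price = 0.1 + 0.2 * np.sin(np.arange(T) / T * 2 * np.pi) ** 2
V = np.zeros((T + 1, levels))         # value of battery SOC at each hour

for it in range(3000):                # repeated forward passes
    alpha, soc = 0.05, 5
    for t in range(T):
        net = rng.choice([1, 2, 3])   # stochastic net demand (kWh)
        best, arg = -np.inf, soc
        for u in (-1, 0, 1):          # discharge / idle / charge 1 kWh
            s2 = soc + u
            if 0 <= s2 < levels:
                r = -price[t] * (net + u)      # cost of grid purchase
                q = r + V[t + 1, s2]
                if q > best:
                    best, arg = q, s2
        # TD(0)-style update while stepping forward in time.
        V[t, soc] += alpha * (best - V[t, soc])
        soc = arg
print(np.round(V[0], 2))
```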
ISBN (print): 9781424488650
A lot of work has recently been published on metrics that can identify high-quality paths in Wireless Mesh Networks (WMN). While results are encouraging, no optimal strategy has yet been identified that can estimate link quality while incorporating both link reliability measurements and bandwidth capacity. Furthermore, link estimation remains an open problem. In a multi-user environment, any optimal solution would also need to consider multiple communication flows. These arguments have led us to study an approximate dynamic programming (DP) solution capable of utilizing limited network knowledge and stochastic processes. Instead of proposing yet another link metric, we analyze a stochastic DP solution to the routing problem using a well-established routing metric. Unlike deterministic DP, where communication demands are fixed a priori and an optimal path is calculated before any real demands are known, we consider a more realistic scenario with stochastic metrics and formulate the optimal strategy for routing in WMNs. Performance results are obtained through simulation.
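The stochastic flavour can be seen in a few lines: if each link delivers with probability p, its expected transmission count is 1/p (an ETX-style metric), and value iteration over these expected costs yields both route costs and a next-hop policy. The four-node topology and probabilities below are invented.

```python
import math

links = {                      # (u, v): link delivery probability (assumed)
    ("A", "B"): 0.9, ("A", "C"): 0.6,
    ("B", "C"): 0.8, ("B", "D"): 0.5, ("C", "D"): 0.95,
}
nodes, dest = {"A", "B", "C", "D"}, "D"
J = {n: (0.0 if n == dest else math.inf) for n in nodes}

# Value iteration on expected transmission count (1/p per link) to dest.
for _ in range(len(nodes)):
    for (u, v), p in links.items():
        for a, b in ((u, v), (v, u)):          # links are bidirectional
            J[a] = min(J[a], 1.0 / p + J[b])

# Greedy next-hop policy induced by the converged costs.
policy = {}
for n in nodes - {dest}:
    nbrs = [(1.0 / p + J[b], b)
            for (u, v), p in links.items()
            for a, b in ((u, v), (v, u)) if a == n]
    policy[n] = min(nbrs)[1]
print(J, policy)
```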
ISBN (print): 9781509018376
Value functions arise as components of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A popular approximation technique is known as Temporal Difference (TD) learning. The algorithm introduced in this paper is intended to resolve two well-known problems with this approach: first, in the discounted-cost setting, the variance of the algorithm diverges as the discount factor approaches unity; second, in the average-cost setting, unbiased algorithms exist only in special cases. It is shown that the gradient of any of these value functions admits a representation that lends itself to algorithm design. Based on this result, the new differential TD method is obtained for Markovian models on Euclidean space with smooth dynamics. Numerical examples show remarkable improvements in performance. In an application to speed scaling, variance is reduced by two orders of magnitude.
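For reference, here is the baseline the paper improves upon: plain TD(0) with linear-in-features value estimation on a smooth scalar Markov model. The TD-error variance grows as the discount factor approaches one, which is the problem the differential TD method targets; the differential algorithm itself is not reproduced here, and all model constants are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, alpha = 0.9, 1e-3
theta = np.zeros(2)                            # weights for features [1, x^2]

def phi(x):
    return np.array([1.0, x * x])

x = 0.0
for t in range(500_000):
    x_next = 0.9 * x + rng.standard_normal()   # smooth Markovian dynamics
    cost = x * x                               # per-step cost
    td = cost + gamma * phi(x_next) @ theta - phi(x) @ theta
    theta += alpha * td * phi(x)               # TD(0) update
    x = x_next
# Exact fixed point: theta = [gamma*b/(1-gamma), b], b = 1/(1 - 0.81*gamma).
print(np.round(theta, 2))
```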
ISBN (print): 9783319503493; 9783319503486
In the field of dynamic vehicle routing, the importance of integrating stochastic information about possible future events into current decision making is increasing. Integration is achieved by anticipatory solution approaches, often based on approximate dynamic programming (ADP). ADP methods estimate the expected mean values of future outcomes. In many cases, decision makers are risk-averse, meaning that they avoid "risky" decisions with highly volatile outcomes. Current ADP methods in the field of dynamic vehicle routing are not able to integrate risk-aversion. In this paper, we adapt a recently proposed ADP method that explicitly considers risk-aversion to a dynamic vehicle routing problem with stochastic requests. We analyze how risk-aversion impacts solution quality and variance, and we show that a mild risk-aversion may even improve the risk-neutral objective.
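A few lines suffice to show why risk-aversion changes decisions: below, a mean-variance utility (one common way to encode risk-aversion; the paper adapts a specific risk-averse ADP method) flips the preference between two actions that a risk-neutral mean ranking would order the other way. Both reward distributions are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 0.5                               # risk-aversion weight (assumed)
samples = {
    "safe":  10.0 + 1.0 * rng.standard_normal(10_000),   # low variance
    "risky": 10.5 + 8.0 * rng.standard_normal(10_000),   # high variance
}
for name, r in samples.items():
    neutral = r.mean()                  # risk-neutral score: expected value
    averse = r.mean() - lam * r.std()   # mean-variance utility
    print(f"{name:5s}  risk-neutral={neutral:5.2f}  risk-averse={averse:5.2f}")
```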
ISBN (print): 9781509041688
This paper proposes an approximate dynamic programming (ADP) based approach for solving the long-term renewable generation expansion planning problem while considering the effects of hourly security-constrained unit commitment (SCUC). Compared with traditional approaches, the proposed approach can assist Independent System Operators (ISOs) in determining the dynamic expansion for multiple years on the basis of current-year information, thereby reducing the computational burden of forecasting future states. The objective of the proposed long-term power system planning is to minimize the total investment and operation costs for the base case with forecast values while accounting for variable costs arising from uncertainties. Numerical case studies on a 6-bus system illustrate the effectiveness of the proposed ADP-based long-term planning model for the integration of renewable energy under various uncertainties.
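The "expand from current-year information" idea can be miniaturized as follows: repeated forward passes over the planning years, a lookup-table value function over installed capacity, and a build/no-build choice scored by investment cost plus an operating penalty plus the next-year value. Hourly SCUC is far beyond this toy, and every cost and demand figure is invented.

```python
import numpy as np

rng = np.random.default_rng(5)
years, max_units = 10, 6
invest, shortfall = 50.0, 20.0          # $/unit built, $/MW unserved (assumed)
V = np.zeros((years + 1, max_units + 1))

for it in range(5000):                  # repeated forward passes
    alpha, cap = 0.05, 2
    for y in range(years):
        demand = 2 + 0.4 * y + rng.normal(0, 0.3)   # growing, uncertain load
        best, arg = np.inf, cap
        for build in (0, 1):            # expansion decision this year
            c2 = min(cap + build, max_units)
            op = shortfall * max(0.0, demand - c2)  # operating penalty
            cost = invest * build + op + V[y + 1, c2]
            if cost < best:
                best, arg = cost, c2
        V[y, cap] += alpha * (best - V[y, cap])     # value-function update
        cap = arg
print(np.round(V[0], 1))
```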
ISBN (print): 9781479999880
We extend earlier works on continuous potential games to the most general case: a stochastic time-varying environment, stochastic rewards, non-reduced form, and constrained state-action sets. We provide conditions under which a Markov Nash equilibrium (MNE) of the game is equivalent to the solution of a single control problem. We then address the problem of learning this MNE when the reward and state transition models are unknown. We follow a reinforcement learning approach and extend previous algorithms to work with constrained state-action subsets of real vector spaces. As an application example, we simulate a network flow optimization model in which the relays have batteries that deplete at a random rate. The results obtained with the proposed framework are close to optimal.
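The equivalence at the heart of the paper is easy to see in the simplest (static, unconstrained) case: when all players' rewards derive from one potential function, each player following its own reward gradient is exactly gradient ascent on that single potential. The sketch below uses an invented two-player quadratic potential; the paper's stochastic, constrained, dynamic setting is not reproduced.

```python
import numpy as np

# Two players choose efforts x and y; both rewards share the potential
# P(x, y) = -(x - 1)^2 - (y + 1)^2 - x*y (invented, strictly concave).
def grad(z):
    x, y = z
    return np.array([-2 * (x - 1) - y,   # player 1's reward gradient
                     -2 * (y + 1) - x])  # player 2's reward gradient

z = np.zeros(2)
for _ in range(500):
    z += 0.05 * grad(z)      # simultaneous gradient play by both players
print(np.round(z, 3))        # converges to (2, -2), the potential maximizer
```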