An adaptive optimal control algorithm for systems with uncertain dynamics is formulated under a Reinforcement Learning framework. An embedded exploratory component is included explicitly in the objective function of an output-feedback receding-horizon Model Predictive Control problem. The optimization is formulated as a Quadratically Constrained Quadratic Program and solved to epsilon-global optimality. The iterative interaction between the action specified by the optimal solution and the approximation of the cost functions balances the exploitation of current knowledge against the need for exploration. The proposed method is shown to converge to the optimal policy for a controllable discrete-time linear plant with unknown output parameters. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
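For intuition, the sketch below mimics the exploration-exploitation trade-off on a scalar plant with an unknown output gain: the control objective is augmented with an information bonus, and the unknown parameter is tracked by recursive least squares. It replaces the paper's QCQP (solved to epsilon-global optimality) with a plain grid search, and every numeric value is an invented placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
c_true = 2.0              # unknown output gain of the plant (assumed)
c_hat, P = 0.0, 10.0      # recursive-least-squares estimate and covariance
y_ref = 1.0               # output set-point (assumed)
beta = 0.5                # exploration weight (assumed)
actions = np.linspace(-2.0, 2.0, 401)

for t in range(50):
    # Exploration-augmented objective: tracking cost minus an information
    # bonus that rewards inputs which excite the uncertain parameter.
    track = (c_hat * actions - y_ref) ** 2
    bonus = beta * P * actions ** 2
    u = actions[np.argmin(track - bonus)]
    y = c_true * u + 0.05 * rng.standard_normal()
    # RLS update of the unknown output parameter.
    k = P * u / (1.0 + P * u * u)
    c_hat += k * (y - c_hat * u)
    P *= 1.0 - k * u
print(f"estimated output gain: {c_hat:.3f} (true value 2.0)")
```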
ISBN (print): 9781467391276
This paper presents an approach for recasting Markov Decision Process (MDP) problems as heuristics-based planning problems. The basic idea is a temporal decomposition of the state space based on a subset of states referred to as the termination sample space. Specifically, the recasting is done in three steps. The first step defines a state-space adaptation criterion based on the termination sample space. The second step defines an action-selection heuristic for each state. The third and final step defines a recursion, or backtracking, methodology to avoid dead ends and infinite loops. All three steps are described and discussed. A case study involving fault detection and alarm generation for the reaction wheels of a satellite mission is presented, and the proposed approach is compared with existing approaches for recasting MDP problems using this case study. The computational reduction achieved by the proposed approach is evident from the results.
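As a concrete illustration of the three steps, the toy below plans on a small deterministic grid: a Manhattan distance to the goal stands in for the termination-sample-space criterion, action selection is greedy on that criterion, and dead ends trigger backtracking. The grid, walls, and goal are invented for the sketch and carry none of the satellite case study.

```python
GOAL = (3, 3)
WALLS = {(1, 1), (2, 1), (3, 1), (1, 2)}

def neighbors(s):
    x, y = s
    for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= n[0] <= 3 and 0 <= n[1] <= 3 and n not in WALLS:
            yield n

def heuristic(s):                        # step 1: termination-based criterion
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])

def plan(start):
    path, visited = [start], {start}
    while path and path[-1] != GOAL:
        cands = [n for n in neighbors(path[-1]) if n not in visited]
        if not cands:                    # step 3: backtrack from a dead end
            path.pop()
            continue
        nxt = min(cands, key=heuristic)  # step 2: greedy action selection
        visited.add(nxt)
        path.append(nxt)
    return path or None                  # None if the goal is unreachable

print(plan((0, 0)))
```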
In order to accept future high-yield booking requests, airlines protect seats from low-yield passengers. More seats may be reserved when passengers faced with closed fare classes can upsell to open higher fare classes. We address the airline revenue management problem with capacity nesting and customer upsell, and formulate it as a stochastic optimization model that determines a set of static protection levels for each itinerary. We apply an approximate dynamic programming framework to approximate the objective function by piecewise linear functions, whose slopes (marginal revenues) are iteratively updated and returned by an efficient heuristic that simultaneously handles both nesting and upsell. The resulting allocation policy is tested on a real airline network and benchmarked against the randomized linear programming bid-price policy under various demand settings. Simulation results suggest that the proposed allocation policy significantly outperforms the benchmark when incremental demand or the upsell probability is high. Structural analyses are also provided for special cases of demand dependence.
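The essence of the slope-update idea can be shown on a single-leg toy: a lookup table holds one slope (marginal seat value) per unit of remaining capacity, a request is accepted when its fare beats the current slope, and accepted fares pull the slope via stochastic approximation. Fares, arrival probabilities, capacity, and horizon below are invented, and the network, nesting, and upsell logic of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 10, 30                        # seat capacity and booking horizon
fares = np.array([100.0, 250.0])     # low / high fare classes (assumed)
p_arrive = np.array([0.5, 0.2])      # per-period arrival probabilities
v = np.zeros(C + 1)                  # v[r]: marginal value of the r-th seat

for it in range(2000):
    alpha = 1.0 / (1.0 + 0.01 * it)  # decaying smoothing step
    r = C
    for t in range(T):
        for j in range(2):
            if r > 0 and rng.random() < p_arrive[j] and fares[j] >= v[r]:
                # Accept and pull the slope toward the observed revenue.
                v[r] = (1 - alpha) * v[r] + alpha * fares[j]
                r -= 1
print(np.round(v[1:], 1))            # estimated marginal seat values
```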
ISBN (print): 9781467399418
In industrial applications, practitioners usually face considerable complexity when optimizing operating strategies under uncertainty. Typical real-world problems arising in practice are notoriously challenging from a computational viewpoint, requiring solutions to Markov Decision problems in high dimensions. In this work, we present a novel approach for obtaining an approximate solution to a certain class of problems whose state process follows controlled linear dynamics. Our technique is illustrated by an implementation within the statistical language R, which we discuss by solving a typical problem arising in practice.
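A minimal Python sketch of that problem class (the paper's own implementation is in R): approximate backward induction for a controlled linear state process on a discretized grid, with quantized noise and linear interpolation of the continuation value. Dynamics, costs, and grids are invented for illustration.

```python
import numpy as np

T, grid = 10, np.linspace(-3, 3, 61)     # horizon and state grid
actions = np.linspace(-1, 1, 21)
eps = np.array([-0.2, 0.0, 0.2])         # quantized noise, equal weights
V = np.zeros_like(grid)                  # terminal value

# Backward induction for x_{t+1} = x_t + u_t + eps, cost x^2 + u^2.
for t in range(T):
    Vnew = np.empty_like(grid)
    for i, x in enumerate(grid):
        best = np.inf
        for u in actions:
            nxt = np.clip(x + u + eps, grid[0], grid[-1])
            cont = np.interp(nxt, grid, V).mean()   # expected continuation
            best = min(best, x * x + u * u + cont)
        Vnew[i] = best
    V = Vnew
print(f"approximate value at x=0: {V[len(grid) // 2]:.3f}")
```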
ISBN (print): 9788894105124
In the future, residential energy users can seize the full potential of demand response schemes by using an automated home energy management system (HEMS) to schedule their distributed energy resources. To generate high-quality schedules, a HEMS needs to consider the stochastic nature of PV generation and energy consumption, as well as their inter-daily variations over several days. However, extending the decision horizon of existing optimisation techniques is computationally difficult, and these approaches are only feasible with a limited number of storage devices and a low-resolution decision horizon. Given these shortcomings, this paper presents an approximate dynamic programming (ADP) approach with temporal difference learning for implementing a computationally efficient HEMS. In ADP, we obtain policies from value function approximations by stepping forward in time, in contrast to the value functions obtained by backward induction in DP. We use empirical data collected during the Smart Grid Smart City project in NSW, Australia, to estimate the parameters of a Markov chain model of PV output and electrical demand, which are then used in all simulations. To evaluate the quality of the solutions generated by ADP, we compare the ADP method to stochastic mixed-integer linear programming (MILP) and dynamic programming (DP). Our results show that ADP computes a solution much more quickly than both DP and stochastic MILP, while providing better-quality solutions than stochastic MILP and only a slight reduction in quality compared to the DP solution. Moreover, unlike the computationally intensive DP, the ADP approach can consider a decision horizon beyond one day as well as multiple storage devices, resulting in a HEMS that can capture additional financial benefits.
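To make the forward-stepping idea concrete, the toy below runs repeated forward passes over a one-battery, 24-hour problem and nudges a lookup-table value function with TD(0)-style updates. Prices, the net-demand distribution, and the battery size are invented; the Smart Grid Smart City data is not used.

```python
import numpy as np

rng = np.random.default_rng(2)
T, levels = 24, 11                    # hourly horizon, SOC grid 0..10 kWh
price = 0.1 + 0.2 * np.sin(np.arange(T) / T * 2 * np.pi) ** 2
V = np.zeros((T + 1, levels))         # value of battery SOC at each hour

for it in range(3000):                # repeated forward passes
    alpha, soc = 0.05, 5
    for t in range(T):
        net = rng.choice([1, 2, 3])   # stochastic net demand (kWh)
        best, arg = -np.inf, soc
        for u in (-1, 0, 1):          # discharge / idle / charge 1 kWh
            s2 = soc + u
            if 0 <= s2 < levels:
                r = -price[t] * (net + u)      # cost of grid purchase
                q = r + V[t + 1, s2]
                if q > best:
                    best, arg = q, s2
        # TD(0)-style update while stepping forward in time.
        V[t, soc] += alpha * (best - V[t, soc])
        soc = arg
print(np.round(V[0], 2))
```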
ISBN (print): 9781424488650
A lot of work has recently been published on metrics that can identify high-quality paths in Wireless Mesh Networks (WMN). While results are encouraging, no optimal strategy has yet been identified that can estimate link quality while incorporating both link reliability measurements and bandwidth capacity. Furthermore, link estimation remains an open problem. In a multi-user environment, any optimal solution would also need to consider multiple communication flows. These arguments have led us to study an approximate dynamic programming (DP) solution capable of utilizing limited network knowledge and stochastic processes. Instead of proposing yet another link metric, we analyze a stochastic DP solution to the routing problem using a well-established routing metric. Unlike deterministic DP, where communication demands are fixed a priori and an optimal path is calculated before any real demands are known, we consider a more realistic scenario with stochastic metrics and formulate the optimal strategy for routing in WMNs. Performance results are obtained through simulation.
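The stochastic flavour can be seen in a few lines: if each link delivers with probability p, its expected transmission count is 1/p (an ETX-style metric), and value iteration over these expected costs yields both route costs and a next-hop policy. The four-node topology and probabilities below are invented.

```python
import math

links = {                      # (u, v): link delivery probability (assumed)
    ("A", "B"): 0.9, ("A", "C"): 0.6,
    ("B", "C"): 0.8, ("B", "D"): 0.5, ("C", "D"): 0.95,
}
nodes, dest = {"A", "B", "C", "D"}, "D"
J = {n: (0.0 if n == dest else math.inf) for n in nodes}

# Value iteration on expected transmission count (1/p per link) to dest.
for _ in range(len(nodes)):
    for (u, v), p in links.items():
        for a, b in ((u, v), (v, u)):          # links are bidirectional
            J[a] = min(J[a], 1.0 / p + J[b])

# Greedy next-hop policy induced by the converged costs.
policy = {}
for n in nodes - {dest}:
    nbrs = [(1.0 / p + J[b], b)
            for (u, v), p in links.items()
            for a, b in ((u, v), (v, u)) if a == n]
    policy[n] = min(nbrs)[1]
print(J, policy)
```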
ISBN (print): 9781509018376
Value functions arise as components of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A popular approximation technique is known as Temporal Difference (TD) learning. The algorithm introduced in this paper is intended to resolve two well-known problems with this approach: first, in the discounted-cost setting, the variance of the algorithm diverges as the discount factor approaches unity; second, in the average-cost setting, unbiased algorithms exist only in special cases. It is shown that the gradient of any of these value functions admits a representation that lends itself to algorithm design. Based on this result, the new differential TD method is obtained for Markovian models on Euclidean space with smooth dynamics. Numerical examples show remarkable improvements in performance. In an application to speed scaling, variance is reduced by two orders of magnitude.
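For reference, here is the baseline the paper improves upon: plain TD(0) with linear-in-features value estimation on a smooth scalar Markov model. The TD-error variance grows as the discount factor approaches one, which is the problem the differential TD method targets; the differential algorithm itself is not reproduced here, and all model constants are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, alpha = 0.9, 1e-3
theta = np.zeros(2)                            # weights for features [1, x^2]

def phi(x):
    return np.array([1.0, x * x])

x = 0.0
for t in range(500_000):
    x_next = 0.9 * x + rng.standard_normal()   # smooth Markovian dynamics
    cost = x * x                               # per-step cost
    td = cost + gamma * phi(x_next) @ theta - phi(x) @ theta
    theta += alpha * td * phi(x)               # TD(0) update
    x = x_next
# Exact fixed point: theta = [gamma*b/(1-gamma), b], b = 1/(1 - 0.81*gamma).
print(np.round(theta, 2))
```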
ISBN (print): 9783319503493; 9783319503486
In the field of dynamic vehicle routing, the importance of integrating stochastic information about possible future events into current decision making is increasing. Integration is achieved by anticipatory solution approaches, often based on approximate dynamic programming (ADP). ADP methods estimate the expected mean values of future outcomes. In many cases, decision makers are risk-averse, meaning that they avoid "risky" decisions with highly volatile outcomes. Current ADP methods in the field of dynamic vehicle routing are not able to integrate risk-aversion. In this paper, we adapt a recently proposed ADP method that explicitly considers risk-aversion to a dynamic vehicle routing problem with stochastic requests. We analyze how risk-aversion impacts solution quality and variance, and we show that a mild risk-aversion may even improve the risk-neutral objective.
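A few lines suffice to show why risk-aversion changes decisions: below, a mean-variance utility (one common way to encode risk-aversion; the paper adapts a specific risk-averse ADP method) flips the preference between two actions that a risk-neutral mean ranking would order the other way. Both reward distributions are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 0.5                               # risk-aversion weight (assumed)
samples = {
    "safe":  10.0 + 1.0 * rng.standard_normal(10_000),   # low variance
    "risky": 10.5 + 8.0 * rng.standard_normal(10_000),   # high variance
}
for name, r in samples.items():
    neutral = r.mean()                  # risk-neutral score: expected value
    averse = r.mean() - lam * r.std()   # mean-variance utility
    print(f"{name:5s}  risk-neutral={neutral:5.2f}  risk-averse={averse:5.2f}")
```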
ISBN (print): 9781509041688
This paper proposes an approximate dynamic programming (ADP) based approach for solving the long-term renewable generation expansion planning problem while considering the effects of hourly security-constrained unit commitment (SCUC). Compared with traditional approaches, the proposed approach can assist Independent System Operators (ISOs) in determining the dynamic expansion for multiple years on the basis of current-year information, thereby reducing the computational burden of forecasting future states. The objective of the proposed long-term power system planning is to minimize the total investment and operation costs for the base case with forecast values while accounting for variable costs arising from uncertainties. Numerical case studies on a 6-bus system illustrate the effectiveness of the proposed ADP-based long-term planning model for the integration of renewable energy under various uncertainties.
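The "expand from current-year information" idea can be miniaturized as follows: repeated forward passes over the planning years, a lookup-table value function over installed capacity, and a build/no-build choice scored by investment cost plus an operating penalty plus the next-year value. Hourly SCUC is far beyond this toy, and every cost and demand figure is invented.

```python
import numpy as np

rng = np.random.default_rng(5)
years, max_units = 10, 6
invest, shortfall = 50.0, 20.0          # $/unit built, $/MW unserved (assumed)
V = np.zeros((years + 1, max_units + 1))

for it in range(5000):                  # repeated forward passes
    alpha, cap = 0.05, 2
    for y in range(years):
        demand = 2 + 0.4 * y + rng.normal(0, 0.3)   # growing, uncertain load
        best, arg = np.inf, cap
        for build in (0, 1):            # expansion decision this year
            c2 = min(cap + build, max_units)
            op = shortfall * max(0.0, demand - c2)  # operating penalty
            cost = invest * build + op + V[y + 1, c2]
            if cost < best:
                best, arg = cost, c2
        V[y, cap] += alpha * (best - V[y, cap])     # value-function update
        cap = arg
print(np.round(V[0], 1))
```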
ISBN (print): 9781479999880
We extend earlier works on continuous potential games to the most general case: a stochastic time-varying environment, stochastic rewards, non-reduced form, and constrained state-action sets. We provide conditions under which a Markov Nash equilibrium (MNE) of the game is equivalent to the solution of a single control problem. We then address the problem of learning this MNE when the reward and state transition models are unknown. We follow a reinforcement learning approach and extend previous algorithms to work with constrained state-action subsets of real vector spaces. As an application example, we simulate a network flow optimization model in which the relays have batteries that deplete at a random rate. The results obtained with the proposed framework are close to optimal.
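The equivalence at the heart of the paper is easy to see in the simplest (static, unconstrained) case: when all players' rewards derive from one potential function, each player following its own reward gradient is exactly gradient ascent on that single potential. The sketch below uses an invented two-player quadratic potential; the paper's stochastic, constrained, dynamic setting is not reproduced.

```python
import numpy as np

# Two players choose efforts x and y; both rewards share the potential
# P(x, y) = -(x - 1)^2 - (y + 1)^2 - x*y (invented, strictly concave).
def grad(z):
    x, y = z
    return np.array([-2 * (x - 1) - y,   # player 1's reward gradient
                     -2 * (y + 1) - x])  # player 2's reward gradient

z = np.zeros(2)
for _ in range(500):
    z += 0.05 * grad(z)      # simultaneous gradient play by both players
print(np.round(z, 3))        # converges to (2, -2), the potential maximizer
```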