Search results - Inner Mongolia University Library

Application of machine learning to assess the value of information in polymer flooding

Petroleum Research, 2021, Vol. 6, No. 4, pp. 309-320

Authors: Amine Tadjer, Reidar B. Bratvold, Aojie Hong, Remus Hanea (University of Stavanger, Norway; Equinor, Norway)

In this work, we provide a more consistent alternative for performing value of information (VOI) analyses to address sequential decision problems in reservoir management and to generate insights on the process of reservoir […]. These sequential decision problems are often modeled and solved as stochastic dynamic programs, but once the state space becomes large and complex, traditional techniques such as policy iteration and backward induction quickly become computationally demanding and […]. To resolve these issues and use fewer computational resources, we instead use a viable alternative called approximate dynamic programming (ADP), a powerful solution technique that can handle complex, large-scale problems and find near-optimal solutions to intractable sequential decision problems. We compare and test the performance of several machine learning techniques within the domain of ADP to determine the optimal time to begin a polymer flooding process within a reservoir development […]. The approximate dynamic approach used here accounts for both the effect of the information obtained before a decision is made and the effect of the information that might be obtained to support future decisions, while significantly improving both the timing and the value of the decision, thereby leading to a significant increase in economic performance.

Keywords: Value of information; Reservoir development plan; Approximate dynamic programming; Machine learning

Real-time dispatch of integrated electricity and thermal system incorporating storages via a stochastic dynamic programming with imitation learning

INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, Vol. 153

Authors: Pan, Zhenning; Yu, Tao; Huang, Wenqi; Wu, Yufeng; Chen, Junbin; Zhu, Kedong; Lu, Jidong (South China Univ Technol, Sch Elect Power Engn, Guangzhou 510640, Peoples R China; CSG Digital Grid Res Inst Co Ltd, Guangzhou 510670, Peoples R China)

Coordinated dispatch of an integrated electricity and thermal system (IETS) provides extra operational flexibility, which is further improved by integrating electrical and thermal storages. However, the non-convexity of the problem and multiple uncertainties hinder its optimal real-time dispatch. This paper proposes a real-time dispatch policy based on approximate dynamic programming with imitation learning (ADP-IL) for IETS with electrical and thermal storages, which is computationally efficient and adaptive to uncertainties while satisfying complex network constraints. First, real-time dispatch of IETS is reformulated as a multistage stochastic sequential optimization, and the non-convex terms are addressed by mixed integer programming. Next, an ADP that exploits value function monotonicity is employed to decompose the original problem temporally, and offline pre-learning is incorporated to address dimensional complexity. Then, imitation learning, which allows the algorithm to learn from expert demonstrations, is introduced to further accelerate offline pre-learning. After sufficient learning, ADP-IL provides high-quality real-time solutions with remarkable computational efficiency. Comprehensive studies verify the optimality, adaptability, efficiency, and scalability of ADP-IL.

Keywords: Integrated electricity and thermal system; Stochastic optimization; Thermal storage; Real-time optimization; Approximate dynamic programming

An exposition of least square Monte Carlo approach for real options valuation

GEOENERGY SCIENCE AND ENGINEERING, 2023, Vol. 222

Authors: Ahmadi, Rouholah; Bratvold, Reidar Brumer (Univ Stavanger, Fac Sci & Technol, Dept Energy Resources, Stavanger, Norway; Natl IOR Centre Norway, Bergen, Norway)

The least squares Monte Carlo simulation (LSM) approach is a state-of-the-art approach built upon approximate dynamic programming for the selection of single or multiple exercise options, and it has been used extensively for sequential decision-making and real options valuation. Although it has been broadly discussed and employed in many real-world applications, relatively little attention has been given to some of its implementation details. This paper aims to contribute to an improved understanding of sequential decision-making and real options valuation using the LSM algorithm. For a specific example, we illustrate and discuss the impact of including only the in-the-money (ITM) paths and of the choice of regression functions. A simple oil production problem with two common embedded options (shrink versus expand) is used for the analysis. The analyses conducted in this article for the considered decision situation confirm that when at least one of the options (decision alternatives) is relevant to the out-of-the-money (OOTM) paths, it is crucial to consider all the paths to benefit from all the flexibilities. Furthermore, the choice of regression function is shown to have a minor effect on the optimal strategies and the expected project value as long as the options are deeply ITM or OOTM. However, when the options are only marginally ITM or OOTM, we might reach a different solution than an exact dynamic programming approach would have produced. In addition, excluding OOTM paths from the analysis adversely affects the option valuation part of the LSM for the current application, leading to lower project values. In contrast, the impact of path exclusion on regression performance is less emphasized. This work also verifies the robustness of the LSM approach to the choice of polynomial order used in the regression function. While some of the findings herein are problem-specific, a similar methodology can be used to evaluate

Keywords: Uncertainty analysis; BDH problem; Approximate dynamic programming; Regression; Sequential decision-making; Optimal decision policy
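
A minimal LSM sketch may help make the ITM-versus-all-paths choice concrete. It prices the textbook American put under geometric Brownian motion, not the oil production problem with expand/shrink options studied in the paper; the function name `lsm_american_put`, the quadratic polynomial basis, and the parameter values are illustrative assumptions. The `itm_only` flag switches between fitting the continuation-value regression on in-the-money paths only and on all paths.

```python
import numpy as np


def lsm_american_put(s0, strike, rate, sigma, maturity, n_steps, n_paths,
                     degree=2, itm_only=True, seed=0):
    """Least squares Monte Carlo (Longstaff-Schwartz) value of an American put.

    Illustrative assumptions: geometric Brownian motion for the underlying,
    a polynomial basis in the spot price, exercise only on the time grid.
    """
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-rate * dt)

    # Simulate paths; column t holds the price at time (t + 1) * dt.
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (rate - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(log_inc, axis=1))

    # Cash flow on each path if the option is held to maturity.
    cash = np.maximum(strike - paths[:, -1], 0.0)

    # Backward induction over the earlier exercise dates.
    for t in range(n_steps - 2, -1, -1):
        cash *= disc                      # now valued at time (t + 1) * dt
        spot = paths[:, t]
        exercise = np.maximum(strike - spot, 0.0)
        # Paths used to fit the continuation-value regression.
        fit = exercise > 0 if itm_only else np.ones(n_paths, dtype=bool)
        if fit.sum() <= degree:
            continue                      # too few points to fit the polynomial
        coeffs = np.polyfit(spot[fit], cash[fit], degree)
        continuation = np.polyval(coeffs, spot)
        # Exercise where the payoff is positive and beats estimated continuation.
        stop = fit & (exercise > 0) & (exercise > continuation)
        cash[stop] = exercise[stop]

    # Discount to time 0 and compare with immediate exercise.
    return max(strike - s0, float(np.mean(cash * disc)))


if __name__ == "__main__":
    for itm in (True, False):
        v = lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0, 50, 20_000, itm_only=itm)
        print(f"itm_only={itm}: {v:.3f}")
```

For a put, exercising is only ever worthwhile on ITM paths, so restricting the regression mainly concentrates the fit near the exercise boundary; the paper's point is that in its expand/shrink setting one alternative can matter on OOTM paths, which this single-option example cannot show.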

Temporal logic guided safe model-based reinforcement learning: A hybrid systems approach

NONLINEAR ANALYSIS-HYBRID SYSTEMS, 2023, Vol. 47

Authors: Cohen, Max H.; Serlin, Zachary; Leahy, Kevin; Belta, Calin (Boston Univ, Dept Mech Engn, 110 Cummington Mall, Boston, MA 02215, USA; MIT Lincoln Lab, Lexington, MA, USA)

This paper studies the problem of synthesizing control policies for uncertain continuous-time nonlinear systems from linear temporal logic (LTL) specifications using model-based reinforcement learning (MBRL). Rather than taking an abstraction-based approach, we view the interaction between the LTL formula's corresponding Büchi automaton and the nonlinear system as a hybrid automaton whose discrete dynamics match exactly those of the Büchi automaton. To find satisfying control policies, we pose a sequence of optimal control problems associated with states in the accepting run of the automaton and leverage control barrier functions (CBFs) to prevent specification violation. Since solving many optimal control problems for a nonlinear system is computationally intractable, we take a learning-based approach in which the value function of each problem is learned online in real time. Specifically, we propose a novel off-policy MBRL algorithm that allows one to simultaneously learn the uncertain dynamics of the system and the value function of each optimal control problem online while adhering to CBF-based safety constraints. Unlike related approaches, the MBRL method presented herein decouples convergence, stability, and safety, allowing each aspect to be studied independently and leading to stronger safety guarantees than those developed in related works. Numerical results are presented to validate the efficacy of the proposed method. © 2022 Elsevier Ltd. All rights reserved.

Keywords: Lyapunov methods; Reinforcement learning; Adaptive control; Approximate dynamic programming; Temporal logics
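
The CBF ingredient mentioned in this abstract can be illustrated in isolation. The sketch below is a generic CBF safety filter for a control-affine system x' = f(x) + g(x)u with a single barrier h(x) >= 0, not the authors' MBRL algorithm: it minimally modifies a nominal input so that Lf h + Lg h . u >= -alpha h holds, which for one constraint is a closed-form projection onto a half-space. The function names, the single-integrator example, the disk obstacle, and alpha = 1 are hypothetical.

```python
import numpy as np


def cbf_filter(u_nom, lf_h, lg_h, h, alpha=1.0):
    """Minimally modify u_nom so that lf_h + lg_h @ u >= -alpha * h.

    Closed-form solution of the single-constraint QP
        minimize ||u - u_nom||^2  subject to  lg_h @ u >= -lf_h - alpha * h,
    i.e. the Euclidean projection of u_nom onto a half-space.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(lg_h, dtype=float)
    b = -lf_h - alpha * h
    slack = a @ u_nom - b
    if slack >= 0 or not np.any(a):
        # Either the nominal input is already safe, or the input has no
        # influence on h here (lg_h == 0) and no correction is possible.
        return u_nom
    return u_nom - slack * a / (a @ a)


def simulate(x0, goal, center, radius, dt=0.01, steps=1000, alpha=1.0):
    """Single integrator x' = u steering toward goal around a disk obstacle.

    Barrier h(x) = ||x - center||^2 - radius^2 is positive outside the disk;
    f(x) = 0 and g(x) = I, so lf_h = 0 and lg_h = 2 (x - center).
    """
    x = np.asarray(x0, dtype=float)
    center = np.asarray(center, dtype=float)
    goal = np.asarray(goal, dtype=float)
    h_min = np.inf
    for _ in range(steps):
        diff = x - center
        h = diff @ diff - radius**2
        h_min = min(h_min, h)
        u_nom = goal - x                  # naive go-to-goal controller
        u = cbf_filter(u_nom, 0.0, 2.0 * diff, h, alpha)
        x = x + dt * u                    # forward Euler step
    return x, h_min


if __name__ == "__main__":
    x_final, h_min = simulate([-3.0, 0.1], [3.0, 0.0], [0.0, 0.0], 1.0)
    print("final state:", x_final, "min barrier value:", h_min)
```

Because h is a convex quadratic here, each Euler step satisfies h_next >= (1 - alpha*dt) h, so the filtered trajectory never enters the disk as long as alpha*dt <= 1; the simple go-to-goal controller may still stall against the obstacle, which is a liveness issue the filter does not address.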

Value-gradient iteration with quadratic approximate value functions

ANNUAL REVIEWS IN CONTROL, 2023, Vol. 56

Authors: Yang, Alan; Boyd, Stephen (Stanford Univ, Dept Elect Engn, Stanford, CA 94305, USA)

We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real time. Such policies often perform well even when the approximate value function is not a particularly good approximation of the true value function. We propose value-gradient iteration, which fits the gradient of the value function, with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples and little hyperparameter tuning. We find that the method can find a good policy with computational effort comparable to that required to just evaluate a control policy via simulation.

Keywords: Approximate dynamic programming; Stochastic control; Convex optimization; Value function approximation; Supply chain optimization
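
To see why evaluating such a policy is cheap, consider the special case with no constraints: linear dynamics x+ = A x + B u + w with zero-mean noise w, an input cost u^T R u, and a quadratic approximation V(x) = x^T P x. The policy's minimization then has the closed form u = -(R + B^T P B)^{-1} B^T P A x. The sketch assumes this unconstrained setting and obtains P from a Riccati recursion purely for illustration; it does not implement the paper's value-gradient iteration, and the names `adp_policy_gain` and `riccati_value` and the double-integrator example are hypothetical.

```python
import numpy as np


def adp_policy_gain(P, A, B, R):
    """Gain K such that u = K @ x minimizes
        u' R u + E[(A x + B u + w)' P (A x + B u + w)]
    for zero-mean noise w independent of u. The noise only adds the constant
    E[w' P w], so it does not change the minimizer.
    """
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def riccati_value(A, B, Q, R, iters=500):
    """Quadratic value matrix P from the discrete Riccati recursion
    (used here only to obtain a reasonable P for the example)."""
    P = Q.copy()
    for _ in range(iters):
        K = adp_policy_gain(P, A, B, R)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return P


if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])   # hypothetical double integrator
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K = adp_policy_gain(riccati_value(A, B, Q, R), A, B, R)
    print("gain:", K)
    print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))
```

With input or state constraints the same minimization becomes the small quadratic program the abstract refers to, and a QP solver would replace the single linear solve.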

Optimizing Trading Decisions for Hydro Storage Systems Using Approximate Dual Dynamic Programming

OPERATIONS RESEARCH, 2013, Vol. 61, No. 4, pp. 810-823

Authors: Loehndorf, Nils; Wozabal, David; Minner, Stefan (Vienna Univ Econ & Business, A-1020 Vienna, Austria; Tech Univ Munich, D-80333 Munich, Germany)

We propose a new approach to optimize operations of hydro storage systems with multiple connected reservoirs whose operators participate in wholesale electricity markets. Our formulation integrates short-term intraday with long-term interday decisions. The intraday problem considers bidding decisions as well as storage operation during the day and is formulated as a stochastic program. The interday problem is modeled as a Markov decision process of managing storage operation over time, for which we propose integrating stochastic dual dynamic programming with approximate dynamic programming. We show that the approximate solution converges toward an upper bound of the optimal solution. To demonstrate the efficiency of the solution approach, we fit an econometric model to actual price and inflow data and apply the approach to a case study of an existing hydro storage system. Our results indicate that the approach is tractable for a real-world application and that the gap between the theoretical upper bound and a simulated lower bound decreases sufficiently fast.

Keywords: OR in energy; Stochastic programming; Markov decision processes; Approximate dynamic programming

Approximated multi-agent fitted Q iteration

SYSTEMS & CONTROL LETTERS, 2023, Vol. 177

Authors: Lesage-Landry, Antoine; Callaway, Duncan S. (Polytech Montreal, Dept Elect Engn, Mila & GERAD, 2500 Polytech Rd, Montreal, PQ H3T 1J4, Canada; Univ Calif Berkeley, Energy & Resources Grp, 337 Giannini Hall, Berkeley, CA 94720, USA)

We formulate an efficient approximation for multi-agent batch reinforcement learning, the approximated multi-agent fitted Q iteration (AMAFQI). We present a detailed derivation of our approach. We propose an iterative policy search and show that it yields a greedy policy with respect to multiple approximations of the centralized, learned Q-function. In each iteration and policy evaluation, AMAFQI requires a number of computations that scales linearly with the number of agents, whereas the analogous number of computations increases exponentially for fitted Q iteration (FQI), a commonly used approach in batch reinforcement learning. This property of AMAFQI is fundamental for the design of a tractable multi-agent approach. We evaluate the performance of AMAFQI and compare it to FQI in numerical simulations. The simulations illustrate the significant reduction in computation time when using AMAFQI instead of FQI in multi-agent problems and corroborate the similar performance of both approaches. © 2023 Elsevier B.V. All rights reserved.

Keywords: Approximate dynamic programming; Batch reinforcement learning; Markov decision process; Multi-agent reinforcement learning
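
Plain FQI, the baseline that AMAFQI is compared against, is short enough to sketch. The version below is single-agent and uses a tabular least-squares regressor (one-hot features per state-action pair, which reduces to averaging the Bellman targets); it does not implement AMAFQI's multi-agent approximation. The five-state chain, its rewards, and the function names are hypothetical. With n agents that each have |A| actions, the max in the target ranges over a joint action set of size |A|^n, which grows exponentially in n.

```python
import numpy as np


def fitted_q_iteration(batch, n_states, n_actions, gamma=0.9, iters=100):
    """Fitted Q iteration on a fixed batch of (s, a, r, s_next) transitions.

    The regressor is least squares on one-hot (s, a) features, which is the
    same as averaging the Bellman targets observed for each pair.
    """
    s, a, r, s_next = (np.asarray(col) for col in zip(*batch))
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (s, a), 1.0)
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = r + gamma * q[s_next].max(axis=1)   # targets from current Q
        sums = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        # Refit: mean target per observed pair; unobserved pairs stay at 0.
        q = np.divide(sums, counts, out=np.zeros_like(q), where=counts > 0)
    return q


def chain_step(s, a, n_states=5):
    """Hypothetical chain: action 1 moves right, 0 moves left;
    reaching the last state pays 1, and that state is absorbing."""
    if s == n_states - 1:
        return s, 0.0
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)


if __name__ == "__main__":
    batch = []
    for s in range(5):
        for a in range(2):
            s_next, r = chain_step(s, a)
            batch.append((s, a, r, s_next))
    q = fitted_q_iteration(batch, n_states=5, n_actions=2)
    print("greedy actions:", q.argmax(axis=1))
```

The batch here covers every state-action pair once and is collected up front, which is what makes the method "batch": no new interaction with the environment happens during the iterations.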

Opportunities for reinforcement learning in stochastic dynamic vehicle routing

COMPUTERS & OPERATIONS RESEARCH, 2023, Vol. 150

Authors: Hildebrandt, Florentin D.; Thomas, Barrett W.; Ulmer, Marlin W. (Otto von Guericke Univ, Dept Management Sci, Magdeburg, Germany; Univ Iowa, Dept Business Analyt, Iowa City, IA, USA)

There has been a paradigm shift in urban logistics services in recent years; demand for real-time, instant mobility and delivery services is growing. This poses new challenges for logistics service providers, as the underlying stochastic dynamic vehicle routing problems (SDVRPs) require anticipatory real-time routing actions. The complexity of finding efficient routing actions is multiplied by the challenge of evaluating such actions with respect to their effectiveness given future dynamism and uncertainty. Reinforcement learning (RL) is a promising tool for evaluating actions, but it is not designed for searching the complex and combinatorial action space. Thus, past work on RL for SDVRPs has either restricted the action space, that is, solving only subproblems with RL and everything else with established heuristics, or focused on problems that reduce to resource allocation problems. For solving real-world SDVRPs, new strategies are required that address the combined challenge of a combinatorial, constrained action space and future uncertainty, but as our findings suggest, such strategies are essentially nonexistent. Our survey shows that past work either relied on action-space restriction or avoided routing actions entirely, and it highlights opportunities for more holistic solutions.

Keywords: Stochastic dynamic vehicle routing; Reinforcement learning; Approximate dynamic programming; Mixed integer programming; Combinatorial optimization; Survey

Dynamic multistage scheduling for patient-centered care plans

HEALTH CARE MANAGEMENT SCIENCE, 2021, Vol. 24, No. 4, pp. 827-844

Authors: Diamant, Adam (York Univ, Schulich Sch Business, 111 Ian Macdonald Blvd, Toronto, ON M3J 1P3, Canada)

We investigate the scheduling practices of multistage outpatient health programs that offer care plans customized to the needs of their patients. We formulate the scheduling problem as a Markov decision process (MDP) in which patients can reschedule their appointments, may fail to show up, and may become ineligible. The MDP has an exponentially large state space, and thus we introduce a linear approximation to the value function. We then formulate an approximate dynamic program (ADP) and implement a dual variable aggregation procedure. This reduces the size of the ADP while still producing dual cost estimates that can be used to identify favorable scheduling actions. We use our scheduling model to study the effectiveness of customized care plans for a heterogeneous patient population and find that system performance is better than that of clinics that do not offer such plans. We also demonstrate that our scheduling approach improves clinic profitability, increases throughput, and decreases practitioner idleness compared to a policy that mimics human schedulers and a policy derived from a deep neural network. Finally, we show that our approach is fairly robust to errors introduced when practitioners inadvertently assign patients to the wrong care plan.

Keywords: Healthcare; Appointment scheduling; Multiple treatment stages; Customized care plans; Approximate dynamic programming; Dual variable aggregation; Operations research

Optimization of cyclic air braking strategy for heavy haul trains: an ADP approach

IEEE Intelligent Transportation Systems Conference (ITSC)

Authors: Su, S.; Liu, W.; Huang, Y.; Tang, T. (Beijing Jiaotong Univ, State Key Lab Traff Control & Safety, Frontiers Sci Ctr Smart High Speed Railway Syst, Beijing, Peoples R China)

ISBN (print): 9781728191423

The cyclic air braking strategy on long steep downhill slopes is one of the main challenges in heavy haul train control in China. To address this challenge, this paper proposes an optimization method for the cyclic air braking strategy based on the approximate dynamic programming (ADP) algorithm, which can achieve low maintenance costs and high running efficiency while ensuring safe operation. The optimization problem is formulated taking into account the characteristics of heavy haul railways in China. The cyclic air braking strategy on long steep downhill slopes is then formalized as a Markov decision process (MDP), and the critical elements of the ADP methodology are introduced according to the constraints and optimization objectives. Further, a value-iteration-based ADP approach is proposed to solve the cyclic air braking optimization problem. Simulation experiments using real-world data from the Shuohuang Line illustrate the effectiveness of the proposed approach.

Keywords: Heavy haul railway; Cyclic air braking strategy; Approximate dynamic programming; Driving strategy
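
The core update behind value-iteration-based methods can be sketched on a small tabular MDP. This is generic value iteration with exact expectations, not the paper's ADP for the braking problem, whose state space would call for approximation; the array layout `P[a, s, s2]` and `R[s, a]`, the discount factor, and the example MDPs are hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P: shape (n_actions, n_states, n_states), P[a, s, s2] = Pr(s2 | s, a).
    R: shape (n_states, n_actions), expected one-step reward.
    Returns the value function and a greedy policy.
    """
    v = np.zeros(P.shape[1])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * v[s2]
        q = R + gamma * (P @ v).T
        v_new = q.max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = R + gamma * (P @ v).T          # greedy policy w.r.t. the final v
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    # One state, two actions with rewards 1 and 2 (hypothetical sanity check).
    P = np.ones((2, 1, 1))
    R = np.array([[1.0, 2.0]])
    print(value_iteration(P, R, gamma=0.5))
```

For the one-state example, always choosing the second action is optimal and its value is 2 / (1 - 0.5) = 4, which makes a convenient check that the update and the greedy extraction agree.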
