Search Results - Inner Mongolia University Library

Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control

INTERNATIONAL JOURNAL OF CONTROL, 2021, Vol. 94, No. 5, pp. 1321-1333

Authors: Tang, Difan; Chen, Lei; Tian, Zhao Feng; Hu, Eric (Univ Adelaide, Sch Mech Engn, Adelaide, SA, Australia)

This study proposes a modified value-function-approximation (MVFA) and investigates its use under a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI) to deliver compact implementation of optimal control online synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms require stabilising critic tuning laws while eliminating actor tuning. This paper thus studies alternative single-critic realisation aiming to relax the needs for stabilising mechanisms in the critic tuning law. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equality by solving for the associated value function via SPI in a single-critic configuration. Different from other existing single-critic methods, an MVFA is proposed to deal with closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters in the interests of not complicating the problem. Parameters convergence and closed-loop system states stability are examined. The proposed MVFA approach yields an alternative single-critic SPI method with uniformly ultimately bounded NN parameter convergence and asymptotic closed-loop system states stability throughout the process of online learning without the need for stabilising mechanisms in the tuning law for critic NN. The proposed approach is verified via simulations.

Keywords: Adaptive dynamic programming; approximate dynamic programming; neural networks; nonlinear control; optimal control; policy iteration

Nonsmooth Data-Based Reinforcement Learning for Online Approximate Optimal Control

Author: Greene, Max Lewis (University of Florida)

Degree: Ph.D., Doctor of Philosophy

Autonomous systems are often constrained by time-critical mission constraints and limited power. Such constraints motivate optimality in mission execution. Reinforcement learning (RL) has become a tool to facilitate learning of desired optimal control policies online, which achieve a desired objective. Approximate dynamic programming (ADP) is an RL-based technique that generates a forward-in-time approximation of the optimal value function (and in turn the control policy) for dynamical systems with continuous state and action spaces. Developments in regional model-based RL (R-MBRL) facilitate improved online approximation of the value function. R-MBRL approximates the value function over a compact set of the state space and facilitates learning by approximating and evaluating the optimal value function at multiple points on this compact set. This dissertation investigates numerous modifications to R-MBRL ADP to improve computational efficiency, for application to a broader class of dynamical systems, and to incorporate different function approximation techniques. These modifications introduce discontinuities into the otherwise smooth signals, which are analyzed via Lyapunov-based techniques. Chapter 3 presents a technique to reduce the computational expense of performing R-MBRL across an arbitrarily large number of points in the state space. Without modification, existing R-MBRL algorithms evaluate the quality of the value function approximation at many user-defined points on the state space using a conventional neural network (NN). The method presented in Chapter 3 improves on the existing techniques by segmenting the state space and using sparse neural networks (SNNs) to facilitate learning. By segmenting the state space, the cognitive agent can switch between different subsets of the state space over which to evaluate the optimal policy. Furthermore, using an SNN reduces the overall number of operations needed to evaluate the optimal policy. Combined, th

Keywords: Reinforcement learning; Autonomous systems; Approximate dynamic programming; Barrier function transformations; Switched systems

Application of machine learning to assess the value of information in polymer flooding

Petroleum Research, 2021, Vol. 6, No. 4, pp. 309-320

Authors: Amine Tadjer; Reidar B. Bratvold; Aojie Hong; Remus Hanea (University of Stavanger, Norway; Equinor, Norway)

In this work, we provide a more consistent alternative for performing value of information (VOI) analyses to address sequential decision problems in reservoir management and generate insights on the process of reservoir *** sequential decision problems are often solved and modeled as stochastic dynamic programs, but once the state space becomes large and complex, traditional techniques, such as policy iteration and backward induction, quickly become computationally demanding and *** resolve these issues and utilize fewer computational resources, we instead make use of a viable alternative called approximate dynamic programming (ADP), which is a powerful solution technique that can handle complex, large-scale problems and discover a near-optimal solution for intractable sequential decision *** compare and test the performance of several machine learning techniques that lie within the domain of ADP to determine the optimal time for beginning a polymer flooding process within a reservoir development *** approximate dynamic approach utilized here takes into account both the effect of the information obtained before a decision is made and the effect of the information that might be obtained to support future decisions while significantly improving both the timing and the value of the decision, thereby leading to a significant increase in economic performance.

Keywords: Value of information; Reservoir development plan; Approximate dynamic programming; Machine learning

Temporal logic guided safe model-based reinforcement learning: A hybrid systems approach

NONLINEAR ANALYSIS-HYBRID SYSTEMS, 2023, Vol. 47

Authors: Cohen, Max H.; Serlin, Zachary; Leahy, Kevin; Belta, Calin (Boston Univ, Dept Mech Engn, 110 Cummington Mall, Boston, MA 02215, USA; MIT Lincoln Lab, Lexington, MA, USA)

This paper studies the problem of synthesizing control policies for uncertain continuous-time nonlinear systems from linear temporal logic (LTL) specifications using model-based reinforcement learning (MBRL). Rather than taking an abstraction-based approach, we view the interaction between the LTL formula's corresponding Büchi automaton and the nonlinear system as a hybrid automaton whose discrete dynamics match exactly those of the Büchi automaton. To find satisfying control policies, we pose a sequence of optimal control problems associated with states in the accepting run of the automaton and leverage control barrier functions (CBFs) to prevent specification violation. Since solving many optimal control problems for a nonlinear system is computationally intractable, we take a learning-based approach in which the value function of each problem is learned online in real-time. Specifically, we propose a novel off-policy MBRL algorithm that allows one to simultaneously learn the uncertain dynamics of the system and the value function of each optimal control problem online while adhering to CBF-based safety constraints. Unlike related approaches, the MBRL method presented herein decouples convergence, stability, and safety, allowing each aspect to be studied independently, leading to stronger safety guarantees than those developed in related works. Numerical results are presented to validate the efficacy of the proposed method. (c) 2022 Elsevier Ltd. All rights reserved.

Keywords: Lyapunov methods; Reinforcement learning; Adaptive control; Approximate dynamic programming; Temporal logics

Managing a Hybrid RDC-DC Inventory System

PRODUCTION AND OPERATIONS MANAGEMENT, 2021, Vol. 30, No. 10, pp. 3679-3697

Authors: Wang, Tong; Yan, Xiaoyue; Yang, Chaolin (Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai, Peoples R China; Cornell Univ, Samuel Curtis Johnson Grad Sch Management, Ithaca, NY 14853, USA; Shanghai Univ Finance & Econ, Sch Informat Management & Engn, Res Inst Interdisciplinary Sci, Shanghai, Peoples R China)

In this study, we consider a hybrid RDC-DC serial inventory system where the regional distribution center (RDC) replenishes its stock from an outside supplier (OS), while the distribution center (DC) faces random demand and replenishes its stock from the RDC. Unlike in the traditional serial system, the DC itself can replenish its inventory from outside as well. We first derive structural properties for the optimal long-run average cost and the optimal stationary policy by the vanishing discount approach, and then propose two simple and easy-to-implement policies. The first policy, which we call the three-index policy, combines the characteristics of the echelon-base-stock policy for the serial system (Clark and Scarf. 1960. Management Sci. 6(4): 475-490) and the dual-index policy for the dual-sourcing system (Veeraraghavan and Scheller-Wolf. 2008. Oper. Res. 56(4): 850-864). We show that the order-up-to level of the DC from the RDC can be computed by a newsboy fractile. A simulation-based optimization procedure for the policy is provided. We then develop the approximate linear programming (ALP) policy based on the three-index policy and the multimodularity of the problem. This policy applies the linear programming approach to approximately solve the value function of the dynamic programming formulation. Numerical results show that both the three-index policy and the ALP policy are comparable to the optimal policy computed via dynamic programming, and the latter performs slightly better. Moreover, the OS of the DC can draw considerable cost savings under both policies. We also conduct a numerical study with problem parameters calibrated using actual data from a consumer goods company in China to glean insights on the management of the system.

Keywords: inventory management; serial system; dual sourcing; approximate dynamic programming
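The abstract above states that an order-up-to level can be computed by a newsboy fractile. As a rough illustration of the generic newsvendor rule only (not the paper's three-index policy), the sketch below assumes normally distributed demand and hypothetical per-unit underage and overage costs; the function name and all numbers are illustrative assumptions.

```python
from statistics import NormalDist


def newsvendor_order_up_to(mean, std, underage_cost, overage_cost):
    """Generic newsvendor rule: stock up to the critical fractile of demand.

    Assumes demand ~ Normal(mean, std). The costs are hypothetical
    per-unit penalties for stocking too little or too much.
    """
    if std <= 0 or underage_cost <= 0 or overage_cost <= 0:
        raise ValueError("std and both costs must be positive")
    # Critical fractile: probability of covering demand that balances
    # the two penalties.
    fractile = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mean, std).inv_cdf(fractile)


# Illustrative numbers only: shortage is 9x as costly as leftover stock,
# so the level sits at the 90th percentile of demand.
level = newsvendor_order_up_to(mean=100.0, std=20.0,
                               underage_cost=9.0, overage_cost=1.0)
print(round(level, 2))
```

With equal underage and overage costs the fractile is 0.5 and the level reduces to the mean demand.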

Sequential learning based re-optimization approaches for less model-based dynamic pick-up routing problem

INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE-OPERATIONS & LOGISTICS, 2024, Vol. 11, No. 1

Author: Yu, Wu (Southwest Jiaotong Univ, Sch Transportat & Logist, Chengdu 611756, Sichuan, Peoples R China; Natl United Engn Lab Integrated & Intelligent Tran, Chengdu, Peoples R China; Natl Engn Lab Big Data Applicat Integrated Transpo, Chengdu, Peoples R China)

We address a less model-based dynamic routing problem arising from home parcel pick-up service, where less model-based means existing customers who dynamically request services independently following a Poisson process with a stochastic rate. Overall, through an extended application of the re-optimization (RO) strategy, a Markov decision process formulation and approximate dynamic programming- and Bayes' theorem-based solution approaches are proposed. Specifically, first a pool of basic policies corresponding to all possible values of the rate is developed offline via approximate value iteration. Then, Bayes' theorem-based sequential learning is designed that can sequentially update the belief about the rate's probability distribution over its possible values. Third, coupled with the updated belief, basic policies are collectively implemented in two different ways, resulting in RO approaches which involve constructions of two different online policies, i.e. a belief-weighted deterministic policy and a belief-based random policy, and their re-optimizations at decision epochs. In the numerical study, through comparison with model-based (i.e. using full knowledge of the rate) and model-free heuristics, our approaches are examined and valuable insights are obtained. Important insights include that (i) the belief-weighted deterministic policy outperforms the belief-based random policy, and further, (ii) the former is better than the latter at preserving the improvement resulting from the improved model-based policy.

Keywords: Vehicle routing; less model-based; approximate dynamic programming; Bayes' theorem; sequential learning
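The abstract above describes sequentially updating a belief over the possible values of a Poisson request rate via Bayes' theorem. A minimal sketch of one such update over a finite set of candidate rates might look as follows; the function names, candidate rates, and observation are illustrative assumptions and are not taken from the paper.

```python
import math


def poisson_pmf(count, mean):
    """Probability of observing `count` events under a Poisson(mean) law."""
    return math.exp(-mean) * mean ** count / math.factorial(count)


def update_belief(belief, rates, arrivals, interval):
    """One Bayes step: posterior is proportional to prior times likelihood.

    belief   -- prior probabilities over the candidate rates
    rates    -- hypothetical candidate request rates (per unit time)
    arrivals -- number of requests observed during `interval`
    """
    weights = [b * poisson_pmf(arrivals, r * interval)
               for b, r in zip(belief, rates)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under all rates")
    return [w / total for w in weights]


# Illustrative numbers only: two candidate rates, uniform prior,
# five requests observed in one time unit.
rates = [1.0, 5.0]
belief = update_belief([0.5, 0.5], rates, arrivals=5, interval=1.0)
print(belief)
```

Repeating the update at each decision epoch, with the previous posterior as the new prior, gives the sequential form of the scheme.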

Real-time dispatch of integrated electricity and thermal system incorporating storages via a stochastic dynamic programming with imitation learning

INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, Vol. 153, No. 1

Authors: Pan, Zhenning; Yu, Tao; Huang, Wenqi; Wu, Yufeng; Chen, Junbin; Zhu, Kedong; Lu, Jidong (South China Univ Technol, Sch Elect Power Engn, Guangzhou 510640, Peoples R China; CSG Digital Grid Res Inst Co Ltd, Guangzhou 510670, Peoples R China)

Coordinated dispatch of an integrated electricity and thermal system (IETS) provides extra operation flexibility, which is further improved by integration of electrical and thermal storages. However, the non-convexity of the problem and multiple uncertainties hinder its optimal real-time dispatch. This paper proposes an approximate dynamic programming with imitation learning (ADP-IL) based real-time dispatch policy for IETS with electrical and thermal storages, which is computationally efficient and adaptive to uncertainties while satisfying complex network constraints. First, real-time dispatch of IETS is reformulated as a multistage stochastic sequential optimization and non-convex terms are addressed by mixed integer programming. Next, an ADP which exploits value function monotonicity is employed to temporally decompose the original problem, and an off-line pre-learning is incorporated to address dimension complexity. Then, imitation learning, which allows the algorithm to learn from expert demonstration, is introduced to further accelerate off-line pre-learning. After sufficient learning, ADP-IL provides high quality real-time solutions with remarkable computation efficiency. Comprehensive studies verify optimality, adaptability, efficiency, and scalability of ADP-IL.

Keywords: Integrated electricity and thermal system; Stochastic optimization; Thermal storage; Real-time optimization; Approximate dynamic programming

An exposition of least square Monte Carlo approach for real options valuation

GEOENERGY SCIENCE AND ENGINEERING, 2023, Vol. 222

Authors: Ahmadi, Rouholah; Bratvold, Reidar Brumer (Univ Stavanger, Fac Sci & Technol, Dept Energy Resources, Stavanger, Norway; Natl IOR Centre Norway, Bergen, Norway)

The least square Monte Carlo simulation (LSM) approach is a state-of-the-art approach built upon approximate dynamic programming for the selection of single or multiple exercise options, and it has been extensively used for sequential decision-making and real options valuation. Although it has been broadly discussed and employed in many real-world applications, relatively less attention has been given to some of its implementation details. This paper aims to contribute to an improved understanding of sequential decision-making and real options valuation using the LSM algorithm. In this paper, we illustrate and argue the impact of only including the in-the-money (ITM) paths and the choice of regression functions for a specific example. A simple oil production problem with two common embedded options (shrink versus expand) has been utilized for the analysis. The analyses conducted in this article for the considered decision situation confirm that when at least one of the options (decision alternatives) is relevant to the out-of-the-money (OOTM) paths, it is crucial to consider all the paths to benefit from all the flexibilities. Furthermore, the choice of regression function is shown to have minor effect on the optimal strategies and the expected project value as long as the options are deeply ITM or OOTM. However, when the options are poorly ITM or OOTM, we might reach a different solution than what would have been achieved had we used an exact dynamic programming approach. In addition, excluding OOTM paths from the analysis adversely affects the option valuation part of the LSM for the current application, leading to lower project values. In contrast, the impact of path exclusion on regression performance is less emphasized. This work also verifies the robustness of the LSM approach for the selection of the polynomial order used for tuning the regression function. While some of the findings herein are problem-specific, a similar methodology can be used to evaluate

Keywords: Uncertainty analysis; BDH problem; Approximate dynamic programming; Regression; Sequential decision-making; Optimal decision policy

Value-gradient iteration with quadratic approximate value functions

ANNUAL REVIEWS IN CONTROL, 2023, Vol. 56

Authors: Yang, Alan; Boyd, Stephen (Stanford Univ, Dept Elect Engn, Stanford, CA 94305, USA)

We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real-time. Such policies often perform well even when the approximate value function is not a particularly good approximation of the true value function. We propose value-gradient iteration, which fits the gradient of value function, with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples, and little hyperparameter tuning. We find that the method can find a good policy with computational effort comparable to that required to just evaluate a control policy via simulation.

Keywords: Approximate dynamic programming; Stochastic control; Convex optimization; Value function approximation; Supply chain optimization

Critical chain based Proactive-Reactive scheduling for Resource-Constrained project scheduling under uncertainty

EXPERT SYSTEMS WITH APPLICATIONS, 2023, Vol. 214

Authors: Peng, Wuliang; Lin, Xuejun; Li, Haitao (Yantai Univ, Sch Econ & Management, Yantai, Peoples R China; Univ Missouri St Louis, Coll Business Adm, St Louis, MO, USA)

Project scheduling problems under both resource constraints and uncertainty have been widely studied due to their real-world relevance. In this paper, we design and implement a new integrated proactive-reactive solution approach based on the critical chain method (CCM) to proactively generate a robust and reliable baseline schedule for the class of resource-constrained project scheduling problem (RCPSP) under uncertainty. A discrete-time Markov decision process model is applied for the reactive scheduling phase, which embeds the look-up table method in reinforcement learning to dynamically schedule and adjust the schedule reactively using the baseline schedule during project execution. The cost values in the look-up table are calculated based on the occupation of a project buffer and feeding buffers in the baseline schedule generated by the CCM. We conduct computation experiments on the benchmark instances to test our algorithm. The results show that our approach is able to obtain quality solutions efficiently, and is competitive with the benchmark algorithms for small- and medium-sized instances.

Keywords: Stochastic resource-constrained project scheduling; Critical chain method; Proactive-reactive scheduling; Approximate dynamic programming; Look-up table
