
Refine Search Results

Document Type

  • 81 journal articles
  • 28 conference papers
  • 2 theses

Collection Scope

  • 111 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 87 Engineering
    • 53 Computer Science and Technology...
    • 36 Electrical Engineering
    • 30 Control Science and Engineering
    • 8 Transportation Engineering
    • 7 Petroleum and Natural Gas Engineering
    • 5 Software Engineering
    • 4 Information and Communication Engineering
    • 3 Power Engineering and Engineering Therm...
    • 2 Instrument Science and Technology
    • 2 Civil Engineering
    • 1 Electronic Science and Technology...
    • 1 Chemical Engineering and Technology
    • 1 Naval Architecture and Ocean Engineering
    • 1 Environmental Science and Engineering...
  • 28 Management
    • 28 Management Science and Engineering...
    • 3 Business Administration
  • 24 Science
    • 22 Mathematics
    • 4 Systems Science
    • 1 Physics
    • 1 Statistics...
  • 11 Economics
    • 7 Theoretical Economics
    • 3 Applied Economics
  • 3 Medicine
    • 3 Clinical Medicine
    • 2 Basic Medicine...

Topics

  • 111 篇 value function a...
  • 37 篇 reinforcement le...
  • 18 篇 approximate dyna...
  • 12 篇 dynamic programm...
  • 7 篇 dynamic vehicle ...
  • 7 篇 temporal differe...
  • 6 篇 q-learning
  • 5 篇 function approxi...
  • 5 篇 markov decision ...
  • 4 篇 markov decision ...
  • 4 篇 neural networks
  • 4 篇 optimal control
  • 4 篇 policy iteration
  • 3 篇 rate of converge...
  • 3 篇 actor-critic
  • 3 篇 policy evaluatio...
  • 3 篇 polynomial basis...
  • 3 篇 reinforcement le...
  • 3 篇 energy managemen...
  • 3 篇 off-policy learn...

Institutions

  • 2 篇 beijing univ che...
  • 2 篇 hefei univ techn...
  • 2 篇 missouri univ sc...
  • 2 篇 univ massachuset...
  • 2 篇 tokyo inst techn...
  • 2 篇 northeastern uni...
  • 2 篇 univ sci & techn...
  • 2 篇 tech univ carolo...
  • 2 篇 natl univ def te...
  • 2 篇 georgia inst tec...
  • 2 篇 chinese acad sci...
  • 2 篇 otto von guerick...
  • 2 篇 rice univ dept e...
  • 1 篇 polish acad sci ...
  • 1 篇 shanghai engn re...
  • 1 篇 tsinghua univ de...
  • 1 篇 univ sydney sch ...
  • 1 篇 inria nancy gran...
  • 1 篇 univ southern ca...
  • 1 篇 univ twente ind ...

Authors

  • 6 篇 ulmer marlin w.
  • 5 篇 song tianheng
  • 5 篇 li dazi
  • 4 篇 xu xin
  • 4 篇 mattfeld dirk c.
  • 3 篇 soeffker ninja
  • 3 篇 hachiya hirotaka
  • 2 篇 tutsoy onder
  • 2 篇 huang zhenhua
  • 2 篇 savelsbergh mart...
  • 2 篇 montoya juan m.
  • 2 篇 lewis frank l.
  • 2 篇 pietquin olivier
  • 2 篇 jin qibing
  • 2 篇 sickles robin c.
  • 2 篇 geist matthieu
  • 2 篇 li ping
  • 2 篇 chapman archie c...
  • 2 篇 zuo lei
  • 2 篇 cervellera crist...

Language

  • 109 English
  • 2 Other
Search query: Subject = "value function approximation"
111 records; results 1-10 shown below.
Value function approximation for dynamic multi-period vehicle routing
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2018, Vol. 269, No. 3, pp. 883-899
Authors: Ulmer, Marlin W.; Soeffker, Ninja; Mattfeld, Dirk C. (Tech Univ Carolo Wilhelmina Braunschweig, Inst Wirtschaftsinformat, Carl Friedrich Gauss Fak, Muhlenpfordtstr 23, D-38106 Braunschweig, Germany)
In practical applications like parcel or technician services, customers request service during the day. Service providers decide whether to accept a customer for same-day service or to defer a customer due to resource...
Value function approximation in the presence of uncertainty and inequality constraints - An application to the demand for credit cards
JOURNAL OF ECONOMIC DYNAMICS & CONTROL, 1996, Vol. 20, No. 1-3, pp. 63-92
Authors: Hartley, PR (Australian Natl Univ, Dept Econ, Canberra, ACT 0200, Australia)
We present an algorithm for approximating the solution to discrete-time stochastic dynamic programs with inequality constraints. The algorithm exploits the state preference approach to choice under uncertainty to redu...
A Clustering-Based Graph Laplacian Framework for Value Function Approximation in Reinforcement Learning
IEEE TRANSACTIONS ON CYBERNETICS, 2014, Vol. 44, No. 12, pp. 2613-2625
Authors: Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold (Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Hunan, Peoples R China; Univ Alberta, Dept Elect & Comp Engn, Edmonton AB T6G 2V4, Canada; King Abdulaziz Univ, Fac Engn, Dept Elect & Comp Engn, Jeddah 21589, Saudi Arabia; Polish Acad Sci, Syst Res Inst, PL-01447 Warsaw, Poland)
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this ...
Adaptive importance sampling for value function approximation in off-policy reinforcement learning
NEURAL NETWORKS, 2009, Vol. 22, No. 10, pp. 1399-1410
Authors: Hachiya, Hirotaka; Akiyama, Takayuki; Sugiyama, Masashi; Peters, Jan (Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan; Max Planck Inst Biol Cybernet, Dept Scholkopf, D-72076 Tubingen, Germany)
Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a policy that is different from the currently optimized policy. A common approach is to use importance sampling techniques for...
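The importance-sampling idea this abstract refers to can be illustrated with a minimal sketch: samples collected under a behavior policy are reweighted by the ratio pi(a)/b(a) so that the average estimates a value under the target policy. The policies, rewards, and sample size below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Ordinary importance sampling for off-policy value estimation (sketch).
# Behavior policy b gathers data; target policy pi is the one we evaluate.
rng = np.random.default_rng(1)
pi = np.array([0.9, 0.1])            # target policy over actions {0, 1}
b = np.array([0.5, 0.5])             # behavior policy (uniform exploration)
true_reward = np.array([1.0, 0.0])   # deterministic reward per action

n = 100_000
actions = rng.choice(2, size=n, p=b)
rewards = true_reward[actions]
weights = pi[actions] / b[actions]   # importance weights rho = pi(a)/b(a)
v_hat = np.mean(weights * rewards)   # estimates E_pi[r] = 0.9*1 + 0.1*0 = 0.9
```

The estimator is unbiased but its variance grows with the mismatch between pi and b, which is exactly the issue the paper's adaptive scheme targets.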
Geodesic Gaussian kernels for value function approximation
AUTONOMOUS ROBOTS, 2008, Vol. 25, No. 3, pp. 287-304
Authors: Sugiyama, Masashi; Hachiya, Hirotaka; Towell, Christopher; Vijayakumar, Sethu (Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan; Univ Edinburgh, Sch Informat, Edinburgh EH9 3JZ, Midlothian, Scotland)
The least-squares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basi...
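The least-squares evaluation step that underlies this line of work can be sketched as LSTD policy evaluation with ordinary Gaussian (RBF) basis functions on a 1-D chain; the paper's geodesic variant replaces Euclidean distance with graph distance in the kernel. The MDP, kernel centers, and widths below are illustrative assumptions.

```python
import numpy as np

# LSTD policy evaluation with Gaussian RBF features (sketch).
n_states, gamma, n_feat = 10, 0.9, 4
centers = np.linspace(0, n_states - 1, n_feat)

def phi(s, width=2.0):
    # Gaussian kernel features centered on a few anchor states
    return np.exp(-((s - centers) ** 2) / (2 * width ** 2))

rng = np.random.default_rng(0)
A = np.zeros((n_feat, n_feat))
b = np.zeros(n_feat)
for _ in range(20_000):
    s = rng.integers(n_states)               # uniformly sampled states
    s_next = min(s + 1, n_states - 1)        # deterministic "move right" policy
    r = 1.0 if s_next == n_states - 1 else 0.0
    x, x_next = phi(s), phi(s_next)
    A += np.outer(x, x - gamma * x_next)     # LSTD accumulators
    b += r * x

w = np.linalg.solve(A, b)                    # fixed point of projected Bellman eq.
V = np.array([w @ phi(s) for s in range(n_states)])
# V should increase toward the rewarding right end of the chain
```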
A tutorial on value function approximation for stochastic and dynamic transportation
4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2024, Vol. 22, No. 1, pp. 145-173
Authors: Heinold, Arne (Univ Kiel, Sch Econ & Business, Kiel, Germany)
This paper provides an introductory tutorial on value function approximation (VFA), a solution class from Approximate Dynamic Programming. VFA describes a heuristic way for solving sequential decision processes like a...
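The core mechanic that tutorials like this one introduce, learning a parameterized value function from simulated transitions, can be shown in a few lines: linear VFA trained by semi-gradient TD(0) on a small random-walk chain. The environment, step size, and feature choice are illustrative assumptions, not the paper's example.

```python
import numpy as np

# Linear value function approximation with semi-gradient TD(0) (sketch).
# Toy 5-state random walk; exiting on the right yields reward 1, on the left 0.
n_states = 5

def features(s):
    # one-hot features: the tabular special case of linear VFA, V(s) = w @ x(s)
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

rng = np.random.default_rng(0)
w = np.zeros(n_states)
alpha, gamma = 0.1, 0.95

for _ in range(500):
    s = 2                                    # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])     # random-walk policy
        done = s_next < 0 or s_next >= n_states
        r = 1.0 if s_next >= n_states else 0.0
        target = r if done else r + gamma * (w @ features(s_next))
        w += alpha * (target - w @ features(s)) * features(s)  # TD(0) update
        if done:
            break
        s = s_next
# learned values should increase toward the rewarding right boundary
```

Swapping the one-hot `features` for any richer basis (polynomials, RBFs, a neural network) gives the general VFA setting the tutorial covers.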
Optimized ensemble value function approximation for dynamic programming
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, Vol. 309, No. 2, pp. 719-730
Authors: Cervellera, Cristiano (Natl Res Council Italy, Inst Marine Engn, Via Marini 6, I-16149 Genoa, Italy)
Approximate dynamic programming (ADP) is the standard tool for the solution of multistage dynamic optimization problems under general conditions, such as nonlinear state equation and cost, and continuous state and con...
Robust Approximate Bilinear Programming for Value Function Approximation
JOURNAL OF MACHINE LEARNING RESEARCH, 2011, Vol. 12, No. 10, pp. 3027-3063
Authors: Petrik, Marek; Zilberstein, Shlomo (IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598, USA; Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003, USA)
Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulati...
Meso-parametric value function approximation for dynamic customer acceptances in delivery routing
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2020, Vol. 285, No. 1, pp. 183-195
Authors: Ulmer, Marlin W.; Thomas, Barrett W. (Tech Univ Carolo Wilhelmina Braunschweig, Carl Friedrich Gauss Fak, Muhlenpfordtstr 23, D-38106 Braunschweig, Germany; Univ Iowa, Tippie Coll Business, 108 John Pappajohn Business Bldg, Iowa City, IA 52242, USA)
The rise of mobile communication, ample computing power, and Amazon's training of customers has led to last-mile delivery challenges and created struggles for companies seeking to budget their limited delivery res...
High-Order Taylor Expansion-Based Nonlinear Value Function Approximation for Stochastic Economic Dispatch of Active Distribution Network
IEEE TRANSACTIONS ON SMART GRID, 2024, Vol. 15, No. 5, pp. 4511-4521
Authors: Luo, Yuhao; Zhu, Jianquan; Chen, Jiajun; Wu, Ruibing; Huang, Haojiang; Liu, Wenhao; Liu, Mingbo (South China Univ Technol, Sch Elect Power Engn, Guangzhou 510640, Peoples R China)
The stochastic economic dispatch (SED) problem of active distribution network (ADN) is computationally intractable for traditional algorithms due to the randomness, nonlinearity, and nonconvexity. To solve this proble...