Search Results - Inner Mongolia University Library

Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems

Authors: Thomy Phan; Lenz Belzner; Thomas Gabor; Kyrill Schmid (LMU Munich, Munich, Germany)

Making decisions is a great challenge in distributed autonomous environments due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions. However, planning often has to be performed under strict computational constraints making online planning in multi-agent systems highly limited, which could lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent value function approximation for Distributed Environments (EVADE), an approach to integrate global experience into multi-agent online planning in stochastic domains to consider global effects during local planning. For this purpose, a value function is approximated online based on the emergent system behaviour by using methods of reinforcement learning. We empirically evaluated EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while offering efficiency w.r.t. the breadth and depth of the planning process.

Keywords: multi-agent planning; online planning; value function approximation

Improving Value Function Approximation in Factored POMDPs by Exploiting Model Structure 15

International Conference on Autonomous Agents and Multiagent Systems

Authors: Tiago S. Veiga; Matthijs T. J. Spaan; Pedro U. Lima (Institute for Systems Robotics, Instituto Superior Tecnico, Universidade de Lisboa; Delft University of Technology, Delft, The Netherlands)

ISBN: (print) 9781450337717

Linear value function approximation in Markov decision processes (MDPs) has been studied extensively, but there are several challenges when applying such techniques to partially observable MDPs (POMDPs). Furthermore, the system designer often has to choose a set of basis functions. We propose an automatic method to derive a suitable set of basis functions by exploiting the structure of factored models. We experimentally show that our approximation can reduce the solution size by several orders of magnitude in large problems.

Keywords: POMDP; value function approximation

NUMERICAL REALIZATION OF THE MORTENSEN OBSERVER VIA A HESSIAN-AUGMENTED POLYNOMIAL APPROXIMATION OF THE VALUE FUNCTION

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2025, Vol. 47, No. 1, pp. A181-A206

Authors: Breiten, Tobias; Kunisch, Karl k.; Schroeder, Jesper (Tech Univ Berlin, Inst Math, MA 44, D-10623 Berlin, Germany; Karl Franzens Univ Graz, Inst Math & Sci Comp, A-8010 Graz, Austria; Austrian Acad Sci, Johann Radon Inst, A-4040 Linz, Austria)

Two related numerical schemes for the realization of the Mortensen observer or minimum energy estimator for the state reconstruction of nonlinear dynamical systems subject to deterministic disturbances are proposed and compared. Both approaches rely on a polynomial approximation of the value function associated with the energy of the disturbances of the system. Such an approximation is obtained via interpolation considering not only the values but also first and second order derivatives of the value function in a set of sampling points. The scheme is applied to four examples, and the results are compared with the well-known extended Kalman filter.

Keywords: nonlinear observer design; minimum energy estimation; Hamilton-Jacobi-Bellman equation; value function approximation

Controlled approximation of the value function in stochastic dynamic programming for multi-reservoir systems

COMPUTATIONAL MANAGEMENT SCIENCE, 2015, Vol. 12, No. 4, pp. 539-557

Authors: Zephyr, Luckny; Lang, Pascal; Lamond, Bernard F. (Univ Laval, Operat & Decis Syst Dept, Pavillon Palasis Prince, 2325 Rue Terrasse, Quebec City, PQ G1V 0A6, Canada)

We present a new approach for adaptive approximation of the value function in stochastic dynamic programming. Under convexity assumptions, our method is based on a simplicial partition of the state space. Bounds on the value function provide guidance as to where refinement should be done, if at all. Thus, the method allows for a trade-off between solution time and accuracy. The proposed scheme is tested in the particular context of hydroelectric production across multiple reservoirs.

Keywords: value function approximation; Stochastic dynamic programming; Simplicial decomposition; Regular grids; Separable grids; Reservoir networks

Tacit knowledge-informed approximate dynamic programming for last-mile delivery operations in online-to-offline pharmacies

INDUSTRIAL MANAGEMENT & DATA SYSTEMS, 2025, Vol. 125, No. 3, pp. 1078-1109

Authors: Yang, Xuan; Luo, Hao; Nie, Xinyao; Kong, Xiangtianrui (Shenzhen Univ, Coll Econ, Postdoctoral Res Stn Theoret Econ, Shenzhen, Peoples R China; Shenzhen Univ, Coll Econ, Dept Supply Chain Management, Shenzhen, Peoples R China)

Purpose: Tacit knowledge in frontline operations is primarily reflected in the holders' intuition about dynamic systems. Despite the implicit nature of tacit knowledge, the understanding of complex systems it encapsulates can be displayed through formalization methods. This study seeks to develop a methodology for formalizing tacit knowledge in a dynamic delivery ***/methodology/approach: This study employs a structured survey to gather experiential knowledge from dispatchers engaged in last-mile delivery operations. This knowledge is then formalized using a value function approximation approach, which transforms tacit insights into structured inputs for dynamic decision-making. We apply this methodology to optimize delivery operations in an online-to-offline pharmacy *** raw system feature data are not strongly correlated with the system's development trends, making them ineffective for guiding dynamic decision-making. However, the system features obtained through preprocessing the raw data increase the predictiveness of dynamic decisions and improve the overall effectiveness of decision-making in delivery *** limitations/implications: This research provides a foundational framework for studying sequential dynamic decision problems, highlighting the potential for improved decision quality and system optimization through the formalization and integration of tacit *** implications: The approach proposed in this study offers a method to preserve and formalize critical operational expertise. By embedding tacit knowledge into decision-making systems, organizations can enhance real-time responsiveness and reduce operational ***/value: This study presents a novel approach to integrating tacit knowledge into dynamic decision-making frameworks, demonstrated in a real-world last-mile delivery context. Unlike previous research that focuses primarily on explicit data-driven methods, our approach leverages the impli

Keywords: Tacit knowledge; Dynamic delivery; value function approximation; Last-mile delivery

A comparison of reinforcement learning policies for dynamic vehicle routing problems with stochastic customer requests

COMPUTERS & INDUSTRIAL ENGINEERING, 2025, Vol. 200

Authors: Akkerman, Fabian; Mes, Martijn; van Jaarsveld, Willem (Univ Twente, Ind Engn & Business Informat Syst, NL-7500 AE Enschede, Netherlands; Eindhoven Univ Technol, Operat Planning Accounting & Control, NL-5600 MB Eindhoven, Netherlands)

This paper presents directions for using reinforcement learning with neural networks for dynamic vehicle routing problems (DVRPs). DVRPs involve sequential decision-making under uncertainty where the expected future consequences are ideally included in current decision-making. A frequently used framework for these problems is approximate dynamic programming (ADP) or reinforcement learning (RL), often in conjunction with a parametric value function approximation (VFA). A straightforward way to use VFA in DVRP is linear regression (LVFA), but more complex, non-linear predictors, e.g., neural network VFAs (NNVFA), are also widely used. Alternatively, we may represent the policy directly, using a linear policy function approximation (LPFA) or neural network PFA (NNPFA). The abundance of policies and design choices complicates the use of neural networks for DVRPs in research and practice. We provide a structured overview of the similarities and differences between the policy classes. Furthermore, we present an empirical comparison of LVFA, LPFA, NNVFA, and NNPFA policies. The comparison is conducted on several problem variants of the DVRP with stochastic customer requests. To validate our findings, we study realistic extensions of the stylized problem on (i) a same-day parcel pickup and delivery case in the city of Amsterdam, the Netherlands, and (ii) the routing of robots in an automated storage and retrieval system (AS/RS). Based on our empirical evaluation, we provide insights into the advantages and disadvantages of neural network policies compared to linear policies, and value-based approaches compared to policy-based approaches.

Keywords: Dynamic vehicle routing; Stochastic customer requests; value function approximation; Policy function approximation; Neural networks; Reinforcement learning

Value Function Approximations via Kernel Embeddings for No-Regret Reinforcement Learning 14

Asian Conference on Machine Learning (ACML)

Authors: Chowdhury, Sayak Ray; Oliveira, Rafael (Microsoft Res, Bengaluru, India; Univ Sydney, Sydney, NSW, Australia)

We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees by either a low-dimensional representation of the stochastic transition model or an approximation of the Q-functions. However, the understanding of function approximation schemes for state-value functions largely remains missing. In this paper, we propose an online model-based RL algorithm, namely the CME-RL, that learns embeddings of the state-transition distribution in a reproducing kernel Hilbert space while carefully balancing the exploitation-exploration tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound that is of order $\tilde{O}(H \gamma_N \sqrt{N})$, where $H$ is the episode length, $N$ is the total number of time steps and $\gamma_N$ is an information theoretic quantity relating the effective dimension of the state-action feature space. Our method bypasses the need for estimating transition probabilities and applies to any domain on which kernels can be defined. It also brings new insights into the general theory of kernel methods for approximate inference and RL regret minimization.

Keywords: Model-based RL; value function approximation; Kernel mean embeddings

Turning from crime: A dynamic perspective

JOURNAL OF ECONOMETRICS, 2008, Vol. 145, No. 1-2, pp. 158-173

Authors: Sickles, Robin C.; Williams, Jenny (Rice Univ, Dept Econ, Houston, TX 77251, USA; Univ Melbourne, Melbourne, Vic 3010, Australia)

This paper examines criminal choice using a variant of the human capital model. The innovation of our approach is that it attempts to disaggregate individual capital, not unlike production-based studies which disaggregate physical capital into equipment and structures. We disaggregate an individual's capital stock into the standard human capital component as well as a utility generating component that we call social capital. In our set-up, social capital is used to account for the influence of social norms on the decision to participate in crime. This is done by modeling the stigma of arrest as a reduction in the individual's social capital stock. We also allow individuals to account for the impact of their criminal actions on their probability of arrest. In order to estimate the structural parameters underlying the model, we make use of computationally intensive methods involving simulated generalized method of moments and value function approximation. The empirical results, based on panel data from the Delinquency in a Birth Cohort II Study, support the social capital model of crime and reveal significant state dependence in the decision to participate in crime. (C) 2008 Published by Elsevier B.V.

Keywords: social capital; dynamic model; panel data; value function approximation; simulated method of moments

Incentivized self-rebalancing fleet in electric vehicle sharing

IISE TRANSACTIONS, 2021, Vol. 54, No. 2, pp. 173-185

Authors: Wu, Yuguang; Chen, Minmin; Wang, Xin (Univ Wisconsin Madison, Dept Ind & Syst Engn, Madison, WI 53706, USA; Amazon Com Serv LLC, Seattle, WA, USA; Univ Wisconsin Madison, Grainger Inst Engn, Madison, WI 53706, USA)

With the rising need for efficient and flexible short-distance urban transportation, more vehicle sharing companies are offering one-way car-sharing services. Electrified vehicle sharing systems are even more effective in terms of reducing fuel consumption and carbon emission. In this article, we investigate a dynamic fleet management problem for an Electric Vehicle (EV) sharing system that faces time-varying random demand and electricity price. Demand is elastic in each time period, reacting to the announced price. To maximize the revenue, the EV fleet optimizes trip pricing and EV dispatching decisions dynamically. We develop a new value function approximation with input convex neural networks to generate high-quality solutions. Through a New York City case study, we compare it with standard dynamic programming methods and develop insights regarding the interaction between the EV fleet and the power grid.

Keywords: Dynamic programming; revenue management; vehicle sharing; electric vehicle; value function approximation

Applying unweighted least-squares based techniques to stochastic dynamic programming: theory and application

IET CONTROL THEORY AND APPLICATIONS, 2019, Vol. 13, No. 15, pp. 2387-2398

Authors: Forootani, Ali; Iervolino, Raffaele; Tipaldi, Massimo (Univ Sannio, Dept Engn, Piazza Roma 21, I-82100 Benevento, Italy; Univ Naples Federico II, Dept Elect Engn & Informat Technol, Via Claudio 21, I-80125 Naples, Italy)

Big data and the curse of dimensionality are common vocabularies that researchers in different communities have recently been dealing with, e.g. dynamic programming (DP) in automatic control system society. A novel unweighted sampled based least square projection approach is proposed in this study to address the issue of the large state space in the DP optimisation problem. The method, in particular, takes into account both contraction mapping and monotonicity properties of the DP algorithm for value function approximation. Specifically, the batch of samples are gathered by uniform probability distribution at first, and an unweighted LS sub-problem in the subspace is solved. As the case study, a new Markov decision process model associated with a resource allocation problem is considered to illustrate the technique and evaluate its effectiveness. It is noted that the approach can be employed for different applications as well. Moreover, a MATLAB based software is developed to implement and examine different parts of the proposed method. Simulation examples are considered to support the results of the approach via developed software. The idea makes a connection between the recent advances in big data analysis and approximate DP as well.

Keywords: mathematics computing; stochastic processes; function approximation; Markov processes; data analysis; probability; Big Data; resource allocation; dynamic programming; least squares approximations; stochastic dynamic programming; automatic control system society; square projection approach; DP optimisation problem; monotonicity properties; DP algorithm; value function approximation; uniform probability distribution; unweighted LS sub-problem; Markov decision process model; resource allocation problem; MATLAB based software; big data analysis; contraction mapping; unweighted least-squares based techniques
