Search results - Inner Mongolia University Library

Algorithmic Survey of Parametric Value Function Approximation

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2013, vol. 24, no. 6, pp. 845-867

Authors: Geist, Matthieu; Pietquin, Olivier. Affiliation: Supelec IMS MaLIS Res Grp F-57070 Metz France

Reinforcement learning (RL) is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of RL concerns computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific minimization method, generally a stochastic gradient descent or a recursive least-squares approach.

Keywords: Reinforcement learning (RL); survey; value function approximation
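
The abstract above names bootstrapping combined with stochastic gradient descent as one family of methods. A minimal sketch of that idea is semi-gradient TD(0) with a linear value function. The random-walk task, the one-hot feature map, and all function and parameter names below are assumptions chosen for illustration and are not taken from the survey.

```python
import random

def features(state, num_states):
    """One-hot feature vector phi(s); any fixed feature map could be substituted."""
    return [1.0 if i == state else 0.0 for i in range(num_states)]

def value(theta, phi):
    """Linear value estimate V(s) = theta . phi(s)."""
    return sum(t * p for t, p in zip(theta, phi))

def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Semi-gradient TD(0) on a hypothetical random walk.

    The agent starts in the middle state and steps left or right with equal
    probability. Leaving on the right pays reward 1, leaving on the left pays 0.
    """
    rng = random.Random(seed)
    theta = [0.0] * num_states
    for _ in range(episodes):
        state = num_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= num_states
            reward = 1.0 if next_state >= num_states else 0.0
            phi = features(state, num_states)
            # Bootstrapped target: reward plus the current estimate of the next state.
            target = reward
            if not done:
                target += gamma * value(theta, features(next_state, num_states))
            td_error = target - value(theta, phi)
            # Stochastic (semi-)gradient step on the squared TD error.
            theta = [t + alpha * td_error * p for t, p in zip(theta, phi)]
            if done:
                break
            state = next_state
    return theta

theta = td0_random_walk()
print([round(t, 2) for t in theta])
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should lie close to that sequence. With a constant step size they keep fluctuating slightly around it.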

Local and soft feature selection for value function approximation in batch reinforcement learning for robot navigation

JOURNAL OF SUPERCOMPUTING, 2024, vol. 80, no. 8, pp. 10720-10745

Authors: Fathinezhad, Fatemeh; Adibi, Peyman; Shoushtarian, Bijan; Chanussot, Jocelyn. Affiliations: Univ Isfahan Fac Comp Engn Artificial Intelligence Dept Esfahan Iran; Univ Grenoble Alpes Grenoble INP GIPSA Lab CNRS Grenoble France

This paper proposes a novel method for robot navigation in high-dimensional environments that reduces the dimension of the state space using local and soft feature selection. The algorithm selects relevant features based on local correlations between states, avoiding duplicate or inappropriate information and adjusting sensor values accordingly. By optimizing the value function approximation based on the locally weighted features of states during reinforcement learning, the method improves the robot's motion flexibility, learning time, and distance traveled to reach its goal, and reduces collisions with obstacles. The approach was tested on an E-puck robot using the Webots robot simulator in different test environments.

Keywords: Reinforcement learning; value function approximation; Local relevance feature selection; Robot navigation

Controller design and value function approximation for nonlinear dynamical systems

AUTOMATICA, 2016, vol. 67, pp. 54-66

Authors: Korda, Milan; Henrion, Didier; Jones, Colin N. Affiliations: Ecole Polytech Fed Lausanne Lab Automat Stn 9 CH-1015 Lausanne Switzerland; CNRS LAAS 7 Ave Colonel Roche F-31400 Toulouse France; Univ Toulouse LAAS F-31400 Toulouse France; Czech Tech Univ Fac Elect Engn CZ-16626 Prague Czech Republic

This work considers the infinite-time discounted optimal control problem for continuous-time input-affine polynomial dynamical systems subject to polynomial state and box input constraints. We propose a sequence of sum-of-squares (SOS) approximations of this problem obtained by first lifting the original problem into the space of measures with continuous densities and then restricting these densities to polynomials. These approximations are tightenings, rather than relaxations, of the original problem and provide a sequence of rational controllers with value functions associated with these controllers converging (under some technical assumptions) to the value function of the original problem. In addition, we describe a method to obtain polynomial approximations from above and from below to the value function of the extracted rational controllers, and a method to obtain approximations from below to the optimal value function of the original problem, thereby obtaining a sequence of asymptotically optimal rational controllers with explicit estimates of suboptimality. Numerical examples demonstrate the approach.

Keywords: Optimal control; Nonlinear control; Sum-of-squares; Semidefinite programming; Occupation measures; value function approximation

Least Absolute Policy Iteration - A Robust Approach to Value Function Approximation

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, vol. E93D, no. 9, pp. 2555-2565

Authors: Sugiyama, Masashi; Hachiya, Hirotaka; Kashima, Hisashi; Morimura, Tetsuro. Affiliations: Tokyo Inst Technol Dept Comp Sci Tokyo 1528552 Japan; Japan Sci & Technol Agcy PRESTO Tokyo 1528552 Japan; Univ Tokyo Dept Math Informat Tokyo 1138656 Japan; IBM Res Tokyo Yamato 2428502 Japan

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.

Keywords: reinforcement learning; value function approximation; least-squares policy iteration; outlier; l1-loss function; linear programming
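
The abstract above states that the absolute-loss method is formulated as a linear program. One standard way to obtain such a program, sketched here under assumptions and not necessarily the paper's exact formulation, is the usual least-absolute-deviation reformulation applied to Bellman residuals. The symbols are assumptions: a linear model $Q(s,a) = \theta^\top \phi(s,a)$, samples $(s_i, a_i, r_i, s'_i)$ for $i = 1, \dots, N$, an evaluated policy $\pi$, a discount factor $\gamma$, and auxiliary variables $b_i$.

```latex
\min_{\theta,\, b}\ \sum_{i=1}^{N} b_i
\quad \text{s.t.} \quad
-b_i \;\le\; \theta^\top \bigl(\phi(s_i, a_i) - \gamma\, \phi(s'_i, \pi(s'_i))\bigr) - r_i \;\le\; b_i,
\qquad i = 1, \dots, N .
```

At the optimum each $b_i$ equals the absolute value of the $i$-th residual, so the program minimizes the sum of absolute residuals. A single outlying reward then contributes linearly rather than quadratically to the objective.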

Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation

17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)

Authors: Phan, Thomy; Belzner, Lenz; Gabor, Thomas; Schmid, Kyrill. Affiliation: Ludwig Maximilians Univ Munchen Inst Informat Munich Germany

ISBN: (print) 9781450356497

Making decisions is a great challenge in distributed autonomous environments due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions. However, planning often has to be performed under strict computational constraints making online planning in multi-agent systems highly limited, which could lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent value function approximation for Distributed Environments (EVADE), an approach to integrate global experience into multi-agent online planning in stochastic domains to consider global effects during local planning. For this purpose, a value function is approximated online based on the emergent system behaviour by using methods of reinforcement learning. We empirically evaluated EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while offering efficiency w.r.t. the breadth and depth of the planning process.

Keywords: multi-agent planning; online planning; value function approximation

Improving Value Function Approximation in Factored POMDPs by Exploiting Model Structure

14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

Authors: Veiga, Tiago S.; Spaan, Matthijs T. J.; Lima, Pedro U. Affiliations: Univ Lisbon Inst Super Tecn Inst Syst Robot Lisbon Portugal; Delft Univ Technol Delft Netherlands

ISBN: (print) 9781450334136

Linear value function approximation in Markov decision processes (MDPs) has been studied extensively, but there are several challenges when applying such techniques to partially observable MDPs (POMDPs). Furthermore, the system designer often has to choose a set of basis functions. We propose an automatic method to derive a suitable set of basis functions by exploiting the structure of factored models. We experimentally show that our approximation can reduce the solution size by several orders of magnitude in large problems.

Keywords: POMDP; value function approximation

Power System Maintenance Planning Using Value Function Approximation

International Conference on Probabilistic Methods Applied to Power Systems (PMAPS)

Authors: Abeygunawardane, Saranga K.; Jirutitijaroen, Panida; Xu, Huan. Affiliations: Univ Moratuwa Dept Elect Engn Moratuwa Sri Lanka; Natl Univ Singapore Dept Elect & Comp Engn Singapore 117548 Singapore; Natl Univ Singapore Dept Mech Engn Singapore 117548 Singapore

ISBN: (print) 9781479935611

Power system maintenance planning is vital for conducting maintenance of power system equipment in an optimal manner. A maintenance model of a system with a number of pieces of equipment has many states, so solving such a system model would be computationally intractable. This paper investigates the possibility of conducting system-level maintenance planning by using a component-specific Markov decision process (MDP) model developed for maintenance of equipment. In order to find optimal actions for the system, a value function is approximated using component-specific MDP models. In a case study, the optimal actions obtained using the value function approximation (VFA) method are verified by comparing them with the optimal actions obtained using an MDP model developed for the full system.

Keywords: dynamic programming; asset management; maintenance; Markov decision processes; transformers; value function approximation
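
The abstract above describes approximating a system value function from component-specific MDP models. A minimal sketch of one way this could look: solve a small component MDP by value iteration, then approximate the system value as the sum of component values. The additive combination, the three-state component, and every probability and cost below are hypothetical assumptions for illustration, not the paper's model.

```python
def value_iteration(transitions, rewards, gamma=0.95, tol=1e-8):
    """Solve a finite MDP.

    transitions[a][s] is a list of next-state probabilities and
    rewards[a][s] is the immediate reward for action a in state s.
    Returns the optimal values and a greedy policy.
    """
    num_actions, num_states = len(transitions), len(transitions[0])
    values = [0.0] * num_states
    while True:
        q = [[rewards[a][s]
              + gamma * sum(p * v for p, v in zip(transitions[a][s], values))
              for a in range(num_actions)]
             for s in range(num_states)]
        new_values = [max(row) for row in q]
        converged = max(abs(n - v) for n, v in zip(new_values, values)) < tol
        values = new_values
        if converged:
            return values, [row.index(max(row)) for row in q]

# Hypothetical component: states 0=good, 1=worn, 2=failed; actions 0=wait, 1=maintain.
transitions = [
    [[0.9, 0.1, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],  # wait
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # maintain
]
rewards = [
    [0.0, 0.0, -10.0],   # wait: outage cost while failed
    [-1.0, -1.0, -5.0],  # maintain: service or repair cost
]
component_values, component_policy = value_iteration(transitions, rewards)

def system_value(component_states):
    """Assumed additive approximation: sum of per-component values."""
    return sum(component_values[s] for s in component_states)

print(component_policy, round(system_value((0, 2)), 3))
```

The appeal of such a decomposition is that each component MDP has only a handful of states, whereas the joint state space grows exponentially with the number of components. Whether the additive form is accurate depends on how strongly the components interact.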

Feature Selection for Value Function Approximation

Author: Taylor, Gavin. Affiliation: Duke University

Degree level: Ph.D.

The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which state choices are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection. Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, both using the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.

Keywords: Artificial Intelligence; Computer Science; Feature Selection; Reinforcement Learning; value function approximation

Integrating Symmetry of Environment by Designing Special Basis Functions for Value Function Approximation in Reinforcement Learning

14th International Conference on Control, Automation, Robotics and Vision (ICARCV)

Authors: Wang, Guo-fang; Fang, Zhou; Li, Bo; Li, Ping. Affiliations: Zhejiang Univ ZJU Sch Aeronaut & Astronaut Hangzhou Zhejiang Peoples R China; Zhejiang Univ ZJU Dept Control Sci & Engn Hangzhou Zhejiang Peoples R China

ISBN: (print) 9781509035496

Reinforcement learning (RL) is usually regarded as tabula rasa learning in which the agent must explore the environment randomly, so long learning times and data inefficiency hinder its use in real applications. To accelerate learning and improve data efficiency, in this paper we extend the definition of symmetry from finite to infinite state spaces and propose a special type of symmetric basis function for value function approximation, which integrates prior knowledge of the environment's symmetry for large or even infinite state spaces. As an example, this approximation structure is incorporated into the policy evaluation phase of Least-Squares Policy Iteration (LSPI), yielding what we call symmetric LSPI (S-LSPI), and its convergence property is analyzed. Simulation results on chain walk and inverted pendulum balancing show that, compared with regular LSPI (R-LSPI), S-LSPI converges much faster while its computational burden decreases significantly. The results illustrate that symmetric basis functions capture the symmetry property well and, as a case study, show the promise of integrating environmental symmetry into an RL agent.

Keywords: Reinforcement Learning; Symmetric basis functions; value function approximation; S-LSPI
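
The abstract above proposes symmetric basis functions that build prior knowledge of environmental symmetry into the value function. A minimal sketch of the general idea, assuming a one-dimensional state with mirror symmetry (V(s) = V(-s)) and Gaussian radial basis functions: averaging each basis function with its mirrored copy makes every linear combination symmetric by construction. The mirror symmetry, the RBF choice, and all names and numbers are assumptions for illustration, not the paper's construction.

```python
import math

def rbf(state, center, width=1.0):
    """Gaussian radial basis function."""
    return math.exp(-((state - center) ** 2) / (2.0 * width ** 2))

def symmetric_features(state, centers, width=1.0):
    """Average each basis function with its mirror image under s -> -s."""
    return [0.5 * (rbf(state, c, width) + rbf(-state, c, width)) for c in centers]

def value(theta, state, centers):
    """Linear value estimate built on the symmetrized features."""
    return sum(t * f for t, f in zip(theta, symmetric_features(state, centers)))

# Centers are only needed on the non-negative half of the state space.
centers = [0.0, 0.5, 1.0, 1.5]
theta = [0.3, -1.2, 0.7, 2.0]  # arbitrary weights, e.g. from a policy evaluation step

left, right = value(theta, -0.8, centers), value(theta, 0.8, centers)
print(abs(left - right) < 1e-12)
```

Because the symmetry holds for any weight vector, a learner never has to spend samples discovering it, and the basis can cover half of the state space with the same resolution.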

Combining value function approximation and multiple scenario approach for the effective management of ride-hailing services

EURO JOURNAL ON TRANSPORTATION AND LOGISTICS, 2023, vol. 12

Authors: Heitmann, R. Julius O.; Soeffker, Ninja; Ulmer, Marlin W.; Mattfeld, Dirk C. Affiliations: Tech Univ Carolo Wilhelmina Braunschweig Decis Support Grp Braunschweig Germany; Univ Wien Dept Business Decis & Analyt Vienna Austria; Otto von Guericke Univ Chair Management Sci Magdeburg Germany

The availability of various services for individual mobility is increasing, especially in urban areas. Dynamic ride-hailing services address these aspects and are gaining market share with providers such as MOIA, UberX Share, Sprinti or BerlKonig. To offer competitive pricing for such a service while providing high service quality (e.g. fast response times), effective capacity management is needed. To reach this goal, the service provider has to meet two challenges. On the one hand, proper demand control has to be installed, which optimizes the responses to transportation requests from customers. On the other hand, suitable fleet control needs to be put in place to optimize the routes of the fleet so that the demand can be met. Papers in the literature may address both but typically focus on one of these two challenges. As an example, value function approximation (VFA) can be used to learn a service offering decision while anticipating future incoming requests. A typical example of a routing-focused method is the multiple scenario approach (MSA), which creates a routing that anticipates future requests using a sampling method. In this paper, we combine VFA and MSA to address the two challenges in an effective way. The resulting method is called anticipatory-routing-and-service-offering (ARS). We find that the combined method significantly outperforms the individual components, improving not only the total reward but also the number of accepted requests. This performance gain is particularly high when the workload is heavy and resources are therefore relatively scarce. We analyse how and under which conditions the components, together or individually, are particularly important.

Keywords: Ride-sharing; Dial-a-ride; Dynamic vehicle routing; value function approximation; Multiple scenario approach
