Search Results - Inner Mongolia University Library

An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots

SOFT COMPUTING, 2007, Vol. 11, No. 1, pp. 81-89

Authors: Syam, Rafiuddin; Watanabe, Keigo; Izumi, Kiyotaka (Saga Univ, Grad Sch Sci & Engn, Dept Adv Syst Control Engn, Saga 8408502, Japan)

In this paper, we propose a new adaptive actor-critic algorithm with multi-step simulated experiences, as a kind of temporal difference (TD) method. In our approach, the TD-error is composed of two value functions and m utility functions, where m denotes the number of steps over which the experience is simulated. The value function is constructed from the critic, formulated as a radial basis function neural network (RBFNN) whose input is a simulated experience generated by a predictive model based on a kinematic model. Since our approach assumes that this model is available both to simulate the m-step experiences and to design a controller, the kinematic model is also used to construct the actor, and the resulting model-based actor (MBA) can itself be regarded as a network, namely a resolved velocity control network. We apply this approach to control a nonholonomic mobile robot, specifically in a trajectory tracking problem for the position coordinates and azimuth. Simulations show the effectiveness of the proposed method for controlling a mobile robot with two independently driven wheels.

Keywords: actor-critic algorithms; kinematic model; multi-step prediction; nonholonomic mobile robot; nonlinear predictive model; simulated experience
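The TD-error described above combines two value-function evaluations with m discounted utilities collected along a simulated rollout. A minimal sketch follows, assuming a standard unicycle kinematic model for the two-wheeled robot, a Gaussian RBF critic, a fixed reference pose, and a hypothetical quadratic utility. The function names, the utility, and the action sequence are illustrative rather than the authors' formulation, and the model-based actor is omitted (actions are supplied directly).

```python
import numpy as np


def unicycle_step(state, v, omega, dt):
    """One Euler step of the kinematic model of a two-wheeled robot (x, y, azimuth)."""
    x, y, phi = state
    return np.array([x + v * np.cos(phi) * dt,
                     y + v * np.sin(phi) * dt,
                     phi + omega * dt])


def rbf_features(state, centers, width):
    """Gaussian radial basis features used by the (assumed) RBF critic."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


def utility(state, ref):
    """Hypothetical utility: negative squared tracking error in position and azimuth."""
    return -float(np.sum((state - ref) ** 2))


def m_step_td_error(state, ref, actions, w, centers, width, gamma, dt):
    """TD-error = sum of m discounted utilities + gamma^m V(s_m) - V(s_0)."""
    s0 = np.asarray(state, dtype=float)
    s = s0
    ret = 0.0
    for k, (v, omega) in enumerate(actions):
        s = unicycle_step(s, v, omega, dt)  # simulated (predicted) experience
        ret += gamma ** k * utility(s, ref)
    m = len(actions)
    v_0 = w @ rbf_features(s0, centers, width)
    v_m = w @ rbf_features(s, centers, width)
    return ret + gamma ** m * v_m - v_0


# Usage: a hypothetical 2-step simulated experience, then one critic update.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.zeros(len(centers))
ref = np.array([1.0, 0.0, 0.0])
start = np.array([0.0, 0.0, 0.0])
delta = m_step_td_error(start, ref, [(1.0, 0.0), (1.0, 0.0)],
                        w, centers, width=0.5, gamma=0.9, dt=0.5)
w = w + 0.1 * delta * rbf_features(start, centers, 0.5)  # critic gradient step
```

Because the critic weights start at zero, the first TD-error here is just the discounted utility sum; with m = 1 the expression reduces to the ordinary one-step TD-error.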

A Hessian Actor-Critic Algorithm

53rd IEEE Annual Conference on Decision and Control (CDC)

Authors: Wang, Jing; Paschalidis, Ioannis Ch. (Boston Univ, Div Syst Engn, 8 St Marys St, Boston, MA 02215, USA; Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215, USA)

ISBN: (print) 9781467360906

We consider Markov Decision Processes (MDPs) following a policy parameterized by a parsimonious set of parameters and seek to optimize the policy over these parameters. In this setting, optimization can be done using a gradient ascent method. If designed well, the parameterized policy can significantly reduce the problem complexity. Existing algorithms usually suffer from slow convergence because they search along the gradient direction in a steepest-ascent fashion. In this paper, we first propose an estimate of the Hessian of the overall reward the decision maker receives. Based on this estimate, we then introduce a new Newton-like method of the actor-critic type. We compare the new algorithm with several existing algorithms in a robotics application and demonstrate that our method converges faster.

Keywords: actor-critic algorithms; Newton's method; Markov decision processes; Autonomous robots
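To illustrate why a Newton-like direction can converge faster than steepest ascent, here is a sketch on a toy concave quadratic whose Hessian is known exactly. It stands in for, and does not reproduce, the paper's estimate of the Hessian of the overall reward; the objective, the step size, and the function names are assumptions.

```python
import numpy as np


def gradient_ascent_step(theta, grad, lr):
    """Steepest-ascent update: move along the gradient with a fixed step size."""
    return theta + lr * grad


def newton_step(theta, grad, hess):
    """Newton update for maximization: solve H d = -g and step by d.

    Near a maximum the Hessian H is negative definite, so d is an ascent direction.
    """
    return theta + np.linalg.solve(hess, -grad)


# Toy concave objective J(theta) = -0.5 theta^T A theta + b^T theta (ill-conditioned A).
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
grad_J = lambda th: b - A @ th
hess_J = -A
theta_star = np.linalg.solve(A, b)  # true maximizer

theta_newton = newton_step(np.zeros(2), grad_J(np.zeros(2)), hess_J)

theta_ga = np.zeros(2)
for _ in range(10):
    # lr must stay below 2/100 for stability along the stiff direction
    theta_ga = gradient_ascent_step(theta_ga, grad_J(theta_ga), lr=0.01)
```

On this quadratic one Newton step lands on the maximizer, while ten gradient steps remain far from it along the flat direction. In an actor-critic setting both the gradient and the Hessian would be noisy estimates, so some damping or regularization of H would likely be needed.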

Deep intrinsically motivated exploration in continuous control

MACHINE LEARNING, 2023, Vol. 112, No. 12, pp. 4959-4993

Authors: Saglam, Baturay; Kozat, Suleyman S. (Bilkent Univ, Dept Elect & Elect Engn, EE403, Bilkent, TR-06800 Ankara, Turkiye)

In continuous control, exploration is often performed through undirected strategies in which the parameters of the networks or the selected actions are perturbed by random noise. Although undirected exploration in the deep setting has been shown to improve the performance of on-policy methods, it introduces excessive computational complexity and is known to fail in the off-policy setting. Intrinsically motivated exploration is an effective alternative to undirected strategies, but it is usually studied for discrete action domains. In this paper, we investigate how intrinsic motivation can be effectively combined with deep reinforcement learning in the control of continuous systems to obtain directed exploratory behavior. We adapt existing theories of animal motivational systems to the reinforcement learning paradigm and introduce a novel and scalable directed exploration strategy. The introduced approach, motivated by the maximization of the value function's error, can benefit from a collected set of experiences by extracting useful information, and it unifies the intrinsic exploration motivations in the literature under a single exploration objective. An extensive set of empirical studies demonstrates that our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and significantly outperforms the undirected strategies.

Keywords: Deep reinforcement learning; Exploration; Intrinsic motivation; actor-critic algorithms; Continuous control
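The abstract states that exploration is driven by the value function's error. One simple way to turn that idea into an intrinsic signal, sketched here under our own assumptions, is to add a bonus proportional to the critic's absolute TD error to the extrinsic reward. The bonus form, the coefficient `beta`, and the batch layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def td_errors(rewards, values, next_values, dones, gamma):
    """One-step TD errors r + gamma * V(s') * (1 - done) - V(s) for a batch."""
    return rewards + gamma * next_values * (1.0 - dones) - values


def augmented_rewards(rewards, values, next_values, dones, gamma=0.99, beta=0.1):
    """Extrinsic reward plus a bonus proportional to the critic's absolute TD error.

    Transitions the critic currently predicts poorly receive a larger bonus,
    which pushes the agent toward them (a directed, not random, exploration signal).
    """
    bonus = np.abs(td_errors(rewards, values, next_values, dones, gamma))
    return rewards + beta * bonus


# Usage on a hypothetical batch sampled from a replay buffer.
r = np.array([1.0, 0.0])
v = np.array([0.5, 2.0])
v_next = np.array([0.5, 2.0])
done = np.array([0.0, 1.0])
r_aug = augmented_rewards(r, v, v_next, done, gamma=0.9, beta=0.1)
```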

Graph Representation Learning for Contention and Interference Management in Wireless Networks

IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, Vol. 32, No. 3, pp. 2479-2494

Authors: Gu, Zhouyou; Vucetic, Branka; Chikkam, Kishore; Aliberti, Pasquale; Hardjawana, Wibowo (Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia; Morse Micro, Sydney, NSW 2010, Australia)

Restricted access window (RAW) in Wi-Fi 802.11ah networks manages contention and interference by grouping users and allocating periodic time slots for each group's transmissions. We seek the optimal user grouping decisions in RAW that maximize the network's worst-case user throughput. We review existing user grouping approaches and highlight their performance limitations for this problem. We propose formulating user grouping as a graph construction problem in which vertices represent users and edge weights indicate contention and interference. This formulation uses the graph's max cut to group users and optimizes the edge weights to construct the optimal graph, whose max cut yields the optimal grouping decisions. To achieve this optimal graph construction, we design an actor-critic graph representation learning (AC-GRL) algorithm. Specifically, the actor neural network (NN) is trained to estimate the optimal graph's edge weights from the path losses between users and access points. A graph cut procedure uses semidefinite programming to solve the max cut efficiently and returns the grouping decisions for the given weights. The critic NN approximates the user throughput achieved by the returned decisions and is used to improve the actor. Additionally, we present an architecture that uses online-measured throughput and path losses to fine-tune the decisions in response to changes in user populations and their locations. Simulations show that our methods achieve 30% to 80% higher worst-case user throughput than existing approaches and that the proposed architecture can further improve the worst-case user throughput by 5% to 30% while ensuring timely updates of grouping decisions.

Keywords: User grouping; graph constructions; actor-critic algorithms
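The grouping step maps users to the two sides of a max cut, so that pairs joined by heavy contention or interference edges land in different groups. The sketch below illustrates that mapping for two groups with a brute-force search over all partitions, which is feasible only for a handful of users. The paper instead solves the max cut with semidefinite programming, and the weight matrix here is a made-up example rather than output of the actor network.

```python
import itertools

import numpy as np


def max_cut_bruteforce(W):
    """Exhaustive two-group max cut of a symmetric weight matrix W.

    Returns (cut_value, groups) where groups[i] is 0 or 1. Vertex 0 is fixed
    to group 0 so mirror-image partitions are not enumerated twice.
    """
    n = W.shape[0]
    best_val, best_groups = -np.inf, None
    for bits in itertools.product((0, 1), repeat=n - 1):
        groups = np.array((0,) + bits)
        crossing = groups[:, None] != groups[None, :]
        cut = W[crossing].sum() / 2.0  # each crossing edge appears twice in W
        if cut > best_val:
            best_val, best_groups = cut, groups
    return best_val, best_groups


# Hypothetical 4-user ring: each user interferes with its two neighbors.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
value, groups = max_cut_bruteforce(W)
```

Here the best cut separates users {0, 2} from {1, 3}, so no two interfering neighbors share a RAW slot group.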

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

JOURNAL OF MACHINE LEARNING RESEARCH, 2018, Vol. 18, No. 154, pp. 1-51

Authors: Chow, Yinlam; Ghavamzadeh, Mohammad; Janson, Lucas; Pavone, Marco (DeepMind, Mountain View, CA 94043, USA; Stanford Univ, Dept Stat, Stanford, CA 94305, USA; Stanford Univ, Aeronaut & Astronaut, Stanford, CA 94305, USA)

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.

Keywords: Markov Decision Process; Reinforcement Learning; Conditional Value-at-Risk; Chance-Constrained Optimization; Policy Gradient algorithms; actor-critic algorithms
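The Lagrangian approach above penalizes the expected cost by a multiplier times the constraint violation and updates the multiplier by ascent. A minimal sketch of the two ingredients on sampled episode costs, assuming a batch Monte Carlo estimate: the empirical CVaR via the Rockafellar-Uryasev form CVaR_alpha(Z) = min over nu of {nu + E[(Z - nu)^+] / (1 - alpha)}, evaluated at the empirical alpha-quantile, and a projected dual-ascent step on the multiplier. The paper's estimators update nu and the policy incrementally; the names, step size, and threshold here are illustrative.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """Rockafellar-Uryasev CVaR estimate at the empirical alpha-quantile (VaR)."""
    costs = np.asarray(costs, dtype=float)
    nu = np.quantile(costs, alpha)  # empirical VaR_alpha
    return nu + np.mean(np.maximum(costs - nu, 0.0)) / (1.0 - alpha)


def dual_ascent(lmbda, cvar_estimate, beta, lr):
    """Projected ascent on the Lagrange multiplier for the constraint CVaR <= beta."""
    return max(0.0, lmbda + lr * (cvar_estimate - beta))


# Usage with hypothetical episode costs 1..10 and alpha = 0.8:
costs = np.arange(1, 11)
cvar = empirical_cvar(costs, alpha=0.8)         # mean of the worst 20%: (9 + 10) / 2
lam = dual_ascent(0.0, cvar, beta=9.0, lr=0.1)  # constraint violated -> multiplier grows
```

The policy parameters would then take a descent step on E[cost] + lam * (CVaR - beta), completing one primal-dual iteration.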

A fuzzy reinforcement learning approach to control in wireless transmitters

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2005, Vol. 35, No. 4, pp. 768-778

Authors: Vengerov, D; Bambos, N; Berenji, H (Sun Microsyst Labs, Sunnyvale, CA 94086, USA; Stanford Univ, Stanford, CA 94305, USA; Intelligent Inference Syst Corp, Mountain View, CA 94035, USA)

We address the issue of power-controlled shared channel access in wireless networks supporting packetized data traffic. We formulate this problem using the dynamic programming framework and present a new distributed fuzzy reinforcement learning algorithm (ACFRL-2) capable of adequately solving a class of problems to which the power control problem belongs. Our experimental results show that the algorithm converges almost deterministically to a neighborhood of optimal parameter values, as opposed to a very noisy stochastic convergence of earlier algorithms. The main tradeoff facing a transmitter is to balance its current power level with future backlog in the presence of stochastically changing interference. Simulation experiments demonstrate that the ACFRL-2 algorithm achieves significant performance gains over the standard power control approach used in CDMA2000. Such a large improvement is explained by the fact that ACFRL-2 allows transmitters to learn implicit coordination policies, which back off under stressful channel conditions as opposed to engaging in escalating "power wars."

Keywords: actor-critic algorithms; fuzzy reinforcement learning; wireless power control
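The abstract does not spell out ACFRL-2's rule base. As a rough, hypothetical illustration of what a fuzzy actor in this setting can look like, the sketch below maps a normalized backlog and interference level to a transmit power through four rules with learnable consequents (a zero-order Takagi-Sugeno form), and nudges those consequents with a TD-error-weighted score-function step under Gaussian exploration. The membership functions, rule layout, and update rule are our assumptions.

```python
import numpy as np


def firing_strengths(backlog, interference):
    """Normalized firing strengths of four rules over two inputs in [0, 1].

    Each input has two linear memberships, "low" = 1 - x and "high" = x.
    Rule order for (backlog, interference):
    (low, low), (low, high), (high, low), (high, high).
    """
    mb = np.array([1.0 - backlog, backlog])
    mi = np.array([1.0 - interference, interference])
    w = np.outer(mb, mi).ravel()  # product t-norm
    return w / w.sum()


def fuzzy_power(theta, backlog, interference):
    """Actor output: firing-strength-weighted average of rule consequents."""
    return float(firing_strengths(backlog, interference) @ theta)


def actor_update(theta, backlog, interference, taken_power, td_error, sigma, lr):
    """Score-function step for a Gaussian exploration policy around the fuzzy output."""
    phi = firing_strengths(backlog, interference)
    mean = phi @ theta
    return theta + lr * td_error * (taken_power - mean) / sigma ** 2 * phi


theta = np.array([0.2, 0.1, 0.9, 0.4])  # hypothetical consequents for the four rules
p_calm = fuzzy_power(theta, backlog=1.0, interference=0.0)  # -> theta[2]
p_mid = fuzzy_power(theta, backlog=0.5, interference=0.5)   # equal rule weights
theta_new = actor_update(theta, 1.0, 0.0, taken_power=1.0,
                         td_error=0.5, sigma=0.2, lr=0.01)
```

With these example consequents, the high-backlog, high-interference rule outputs less power than the high-backlog, low-interference rule, mimicking the back-off behavior described above; in an actual learner these values would be learned rather than set by hand.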

Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes

SIMULATION-TRANSACTIONS OF THE SOCIETY FOR MODELING AND SIMULATION INTERNATIONAL, 2008, Vol. 84, No. 12, pp. 577-600

Authors: Bhatnagar, Shalabh; Abdulla, Mohammed Shahid (Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India; Gen Motors India Sci Lab, Bangalore, Karnataka, India)

We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of these algorithms are developed for finite state and compact action spaces while the other two are for finite state and finite action spaces. Of the former two, one algorithm uses a linear parameterization for the policy, resulting in reduced memory complexity. Convergence analysis is briefly sketched and illustrative numerical experiments with the four algorithms are shown for a problem of flow control in communication networks.

Keywords: Finite-horizon Markov decision processes; simulation-based algorithms; two-timescale stochastic approximation; function approximation; actor-critic algorithms; normalized Hadamard matrices
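The keywords mention two-timescale stochastic approximation and normalized Hadamard matrices. In SPSA-type methods, such matrices can supply deterministic +/-1 perturbation vectors in place of random ones. The sketch below, assuming a Sylvester-type construction and a two-sided estimate, averages SPSA estimates over one full cycle of Hadamard rows. On a toy quadratic (standing in for a simulated finite-horizon cost) the cycle average recovers the exact gradient because the Hadamard columns are orthogonal. The construction in the paper may differ, for example in how many columns are used or in one-sided variants.

```python
import numpy as np


def sylvester_hadamard(p):
    """Hadamard matrix of order p (a power of 2) with first row and column all ones."""
    H = np.array([[1.0]])
    while H.shape[0] < p:
        H = np.block([[H, H], [H, -H]])
    return H


def hadamard_perturbations(n):
    """Deterministic +/-1 vectors: first n columns of an order-2^ceil(log2 n) Hadamard matrix."""
    p = 1 << max(0, int(np.ceil(np.log2(n))))
    return sylvester_hadamard(p)[:, :n]


def spsa_gradient(f, theta, c=0.1):
    """Two-sided SPSA estimate averaged over one cycle of Hadamard perturbations."""
    grads = []
    for d in hadamard_perturbations(len(theta)):
        diff = f(theta + c * d) - f(theta - c * d)
        grads.append(diff / (2.0 * c) * d)  # 1 / d_i == d_i for +/-1 entries
    return np.mean(grads, axis=0)


# Usage on a hypothetical quadratic cost with gradient 2 * (theta - target).
target = np.array([1.0, 2.0, 3.0])
f = lambda t: float(np.sum((t - target) ** 2))
g = spsa_gradient(f, np.zeros(3), c=0.1)
```

Each perturbation costs two simulations regardless of the parameter dimension, which is the usual appeal of simultaneous perturbation.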

Reinforcement learning based algorithms for average cost Markov Decision Processes

DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, Vol. 17, No. 1, pp. 23-52

Authors: Abdulla, Mohammed Shahid; Bhatnagar, Shalabh (Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India)

This article proposes several two-timescale simulation-based actor-critic algorithms for the solution of infinite-horizon Markov Decision Processes with a finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over the corresponding policy spaces using two different Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimates. On the faster timescale, the differential cost function corresponding to a given stationary policy is updated, and an additional averaging is performed for enhanced performance. A proof of convergence to a locally optimal policy is presented. Next, we discuss a memory-efficient implementation that uses a feature-based representation of the state space and performs TD(0) learning along the faster timescale. This TD(0) algorithm does not follow an on-line sampling of states but is observed to do well in our setting. Numerical experiments on a problem of rate-based flow control are presented using the proposed algorithms. We consider here the model of a single bottleneck node in the continuous-time queueing framework. We show performance comparisons of our algorithms with the two-timescale actor-critic algorithms of Konda and Borkar (1999) and Bhatnagar and Kumar (2004). Our algorithms perform more than an order of magnitude better than those of Konda and Borkar (1999).

Keywords: actor-critic algorithms; two timescale stochastic approximation; Markov Decision Processes; policy iteration; simultaneous perturbation stochastic approximation; normalized Hadamard matrices; reinforcement learning; TD-learning
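On the faster timescale, the critic evaluates the current stationary policy through its average cost and differential cost function. A minimal sketch of average-cost (differential) TD(0) with linear features follows, assuming a fixed policy whose induced Markov chain is given directly as a hypothetical two-state transition matrix. One-hot features make this the tabular case, states are sampled along a single trajectory (a simplification relative to the paper's scheme), and the step sizes are illustrative.

```python
import numpy as np


def differential_td0(P, costs, features, n_steps=50_000, step=0.01, seed=0):
    """Average-cost TD(0) for a fixed policy with linear value features.

    rho   : running estimate of the average cost per step
    theta : weights of the differential cost h(s) ~= features[s] @ theta
    """
    rng = np.random.default_rng(seed)
    cum_P = np.cumsum(P, axis=1)
    n_states = len(costs)
    theta = np.zeros(features.shape[1])
    rho = 0.0
    s = 0
    for t in range(n_steps):
        c = costs[s]
        s_next = min(int(np.searchsorted(cum_P[s], rng.random())), n_states - 1)
        rho += (c - rho) / (t + 1)  # sample-average estimate of the average cost
        delta = c - rho + features[s_next] @ theta - features[s] @ theta
        theta += step * delta * features[s]
        s = s_next
    return rho, theta


# Hypothetical chain induced by a fixed policy: costs 1 and 3 in states 0 and 1.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
costs = np.array([1.0, 3.0])
rho, theta = differential_td0(P, costs, features=np.eye(2))
```

For this chain the stationary distribution is (2/3, 1/3), so the true average cost is 5/3 and the differential costs satisfy h(1) - h(0) = 20/3. The estimate of rho should land close to 5/3, while theta tracks the differential costs only up to an additive constant.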

Deep Reinforcement Learning-Based Edge Caching in Wireless Networks

IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2020, Vol. 6, No. 1, pp. 48-61

Authors: Zhong, Chen; Gursoy, M. Cenk; Velipasalar, Senem (Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244, USA)

With the purpose of offloading data traffic in wireless networks, content caching techniques have recently been studied intensively. By using these techniques to cache a portion of the popular files at local content servers, users can be served with less delay. Most content replacement policies are based on content popularity, which depends on users' preferences. In practice, such information varies over time. Therefore, an approach to determine file popularity patterns must be incorporated into caching policies. In this context, we study content caching at the wireless network edge using a deep reinforcement learning framework with the Wolpertinger architecture. In particular, we propose deep actor-critic reinforcement learning based policies for both centralized and decentralized content caching. For centralized edge caching, we aim to maximize the cache hit rate. In decentralized edge caching, we consider both the cache hit rate and the transmission delay as performance metrics. The proposed frameworks are assumed to have neither prior information on file popularities nor knowledge of the potential variations in such information. Simulation results verify the superiority of the proposed frameworks in comparison with other policies, including least frequently used (LFU), least recently used (LRU), and first-in-first-out (FIFO) policies.

Keywords: actor-critic algorithms; edge caching; deep reinforcement learning
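In the Wolpertinger architecture, the actor outputs a continuous proto-action, the k nearest discrete actions are retrieved, and the critic picks the best of those candidates. A minimal sketch of that selection step follows, assuming Euclidean nearest neighbors over action embeddings and a hypothetical linear critic. In the caching setting one might read action i as "replace the content in cache slot i"; that encoding is our illustration, not necessarily the paper's.

```python
import numpy as np


def wolpertinger_select(proto_action, action_embeddings, critic, state, k):
    """Refine a continuous proto-action into a discrete action (Wolpertinger step)."""
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    candidates = np.argsort(dists, kind="stable")[:k]  # k nearest discrete actions
    q_values = [critic(state, action_embeddings[a]) for a in candidates]
    return int(candidates[int(np.argmax(q_values))])


# Hypothetical example: 4 discrete actions embedded as one-hot vectors.
embeddings = np.eye(4)
q_weights = np.array([0.2, 0.9, 1.0, 0.0])
critic = lambda state, a: float(q_weights @ a)  # toy critic ignoring the state
proto = np.array([0.6, 0.5, 0.1, 0.0])          # actor output
chosen_k2 = wolpertinger_select(proto, embeddings, critic, state=None, k=2)
chosen_k4 = wolpertinger_select(proto, embeddings, critic, state=None, k=4)
```

With k = 2 only actions 0 and 1 are considered, so action 2 is never chosen even though the toy critic rates it highest; a larger k trades computation for a wider search.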

Two-step gradient-based reinforcement learning for underwater robotics behavior learning

ROBOTICS AND AUTONOMOUS SYSTEMS, 2013, Vol. 61, No. 3, pp. 271-282

Authors: El-Fakdi, Andres; Carreras, Marc (Univ Girona, Dept Comp Engn, Comp Vis & Robot Grp VICOROB, Girona 17071, Spain)

This article proposes a field application of a Reinforcement Learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The Ictineu Autonomous Underwater Vehicle (AUV) learns to perform a vision-based cable tracking task in a two-step learning process. First, a policy is computed in simulation, where a hydrodynamic model of the vehicle simulates the cable following task. The model identification procedure follows a specially designed Least Squares (LS) technique. Once the simulated results are accurate enough, in a second step, the policy learnt in simulation is transferred to the vehicle, where learning continues in a real environment, improving the initial policy. The Natural Actor-Critic (NAC) algorithm was selected to solve the problem. This actor-critic (AC) algorithm aims to take advantage of both Policy Gradient (PG) and Value Function (VF) techniques for fast convergence. The work presented includes extensive real experimentation. The main objective of this work is to demonstrate the feasibility of RL techniques for learning autonomous underwater tasks; the selection of a cable tracking task is motivated by increasing industrial demand for technology to survey and maintain underwater structures. (c) 2012 Elsevier B.V. All rights reserved.

Keywords: Reinforcement learning; Underwater robotics; Gradient descent algorithms; actor-critic algorithms; Model identification
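The Natural Actor-Critic follows the natural gradient F^{-1} grad J, which, with compatible features psi = grad_theta log pi, can be read off the weights of a least-squares fit of returns on psi. The sketch below shows this on a one-step (bandit-like) toy task with a linear-Gaussian policy and a hypothetical quadratic reward. It assumes neither the paper's AUV model nor its specific NAC variant, and all names and constants are illustrative.

```python
import numpy as np


def natural_gradient_estimate(theta, states, actions, returns, sigma):
    """Least-squares fit of returns on compatible features psi = grad log pi.

    For a Gaussian policy a ~ N(theta . s, sigma^2), psi = (a - theta . s) s / sigma^2.
    The fitted weights on psi estimate the natural gradient F^{-1} grad J;
    a constant column acts as a baseline.
    """
    mean = states @ theta
    psi = ((actions - mean) / sigma ** 2)[:, None] * states
    X = np.hstack([psi, np.ones((len(returns), 1))])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return coef[:-1]


rng = np.random.default_rng(0)
k_target = np.array([1.0, -2.0])  # hypothetical optimal gain
theta = np.zeros(2)
sigma, alpha = 0.5, 1.0
for _ in range(30):
    s = rng.normal(size=(2000, 2))
    a = s @ theta + sigma * rng.normal(size=2000)
    r = -(a - s @ k_target) ** 2  # hypothetical one-step reward
    theta = theta + alpha * natural_gradient_estimate(theta, s, a, r, sigma)
```

For this reward the natural gradient equals -2 sigma^2 (theta - k_target), so with sigma = 0.5 and alpha = 1 each iteration halves the distance to k_target in expectation.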
