
Refine Results

Document Type

  • Journal articles (31)
  • Conference papers (10)
  • Theses (1)

Collection

  • Electronic documents (42)
  • Print holdings (0)

Date Distribution

Subject Classification

  • Engineering (31)
    • Computer Science and Technolog... (17)
    • Control Science and Engineering (16)
    • Electrical Engineering (7)
    • Information and Communication Engineering (3)
    • Software Engineering (2)
    • Mechanical Engineering (1)
    • Power Engineering and Engineering The... (1)
    • Chemical Engineering and Technology (1)
    • Petroleum and Natural Gas Engineering (1)
  • Science (10)
    • Mathematics (9)
    • Systems Science (1)
    • Statistics (degree in Science or... (1)
  • Management (5)
    • Management Science and Engineering (d... (5)
  • Economics (2)
    • Theoretical Economics (1)
    • Applied Economics (1)
  • Military Science (1)

Topic

  • actor-critic alg... (42)
  • reinforcement le... (22)
  • markov decision ... (9)
  • stochastic appro... (5)
  • martingale (4)
  • two timescale st... (4)
  • policy gradient (4)
  • risk-sensitive r... (3)
  • normalized hadam... (3)
  • markov decision ... (3)
  • policy gradient ... (3)
  • deep reinforceme... (3)
  • continuous time ... (2)
  • simultaneous per... (2)
  • function approxi... (2)
  • policy evaluatio... (2)
  • nonholonomic mob... (2)
  • mixed multi-agen... (2)
  • conditional valu... (2)
  • chance-constrain... (2)

Institution

  • indian inst sci ... (5)
  • tata inst fundam... (3)
  • mit informat & d... (2)
  • boston univ div ... (2)
  • syracuse univ de... (2)
  • ibm research ban... (2)
  • inria lille (2)
  • boston univ ctr ... (2)
  • inria (1)
  • amazon-iisc post... (1)
  • aeronautics and ... (1)
  • fime (1)
  • george washingto... (1)
  • norwegian univ s... (1)
  • boston univ dept... (1)
  • univ paris cite (1)
  • indian inst tech... (1)
  • sun microsyst la... (1)
  • univ ottawa dept... (1)
  • edf r&d fime (1)

Author

  • bhatnagar shalab... (4)
  • abdulla mohammed... (3)
  • ghavamzadeh moha... (3)
  • borkar vs (3)
  • wang jing (2)
  • d. sai koti redd... (2)
  • konda vr (2)
  • velipasalar sene... (2)
  • gursoy m. cenk (2)
  • shalabh bhatnaga... (2)
  • zhong chen (2)
  • paschalidis ioan... (2)
  • pham huyen (2)
  • warin xavier (2)
  • mohammad ghavamz... (2)
  • paschalidis ioan... (2)
  • srikanth g. tami... (1)
  • mishra nidhi (1)
  • saha amrita (1)
  • kumar s (1)

Language

  • English (37)
  • Other (5)

Search query: Subject = "Actor-critic algorithms"
42 records; showing results 1-10
actor-critic Learning algorithms for Mean-Field Control with Moment Neural Networks
METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, 2025, Vol. 27, No. 1, pp. 1-20
Authors: Pham, Huyen; Warin, Xavier (Univ Paris Cite, Paris, France; FiME, Paris, France; Ecole Polytech CMAP, F-91128 Palaiseau, France; EDF R&D FiME, Paris, France)
We develop a new policy gradient and actor-critic algorithm for solving mean-field control problems within a continuous time reinforcement learning setting. Our approach leverages a gradient-based representation of th...
Bayesian Policy Gradient and actor-critic algorithms
JOURNAL OF MACHINE LEARNING RESEARCH, 2016, Vol. 17, No. 1, pp. 2319-2371
Authors: Ghavamzadeh, Mohammad; Engel, Yaakov; Valko, Michal (Adobe Res, Cambridge, MA, USA; INRIA, Paris, France; Rafael Adv Def Syst, Tel Aviv, Israel; INRIA Lille, SequeL Team, Lille, France)
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Many conventional policy gradient methods use Monte-Carlo techniques to est...
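The abstract above describes the general Monte-Carlo policy-gradient setup these papers build on: sample trajectories under the current policy, then follow an estimate of the performance gradient. A minimal illustrative sketch of that idea (a plain REINFORCE-style estimator on a hypothetical two-armed bandit with a made-up reward; this is not the Bayesian method of the paper):

```python
import math
import random

def softmax_probs(theta):
    """Softmax policy over two actions parameterized by theta."""
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(theta, lr=0.1, n_rollouts=200):
    """One Monte-Carlo policy-gradient (REINFORCE) update.

    Toy assumption: action 0 pays reward 1.0, action 1 pays 0.0.
    grad of log pi(a) w.r.t. theta[k] is (1[a == k] - pi[k]).
    """
    rng = random.Random(0)  # fixed seed for a reproducible sketch
    pi = softmax_probs(theta)
    grad = [0.0, 0.0]
    for _ in range(n_rollouts):
        a = 0 if rng.random() < pi[0] else 1
        r = 1.0 if a == 0 else 0.0  # toy reward: action 0 is better
        for k in range(2):
            indicator = 1.0 if a == k else 0.0
            grad[k] += r * (indicator - pi[k]) / n_rollouts
    return [t + lr * g for t, g in zip(theta, grad)]

theta = [0.0, 0.0]
for _ in range(100):
    theta = reinforce_step(theta)
probs = softmax_probs(theta)
# After training, the policy should prefer the rewarding action 0.
```

The Monte-Carlo average over rollouts is exactly the high-variance gradient estimate that actor-critic methods replace with a learned critic.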
On actor-critic algorithms
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, Vol. 42, No. 4, pp. 1143-1166
Authors: Konda, VR; Tsitsiklis, JN (MIT, Informat & Decis Syst Lab, Cambridge, MA 02139, USA)
In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation archite...
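The two-time-scale structure described in this abstract (a fast critic doing temporal-difference learning with a linear approximation, a slower policy-gradient actor) can be sketched on a toy problem. The two-state MDP, rewards, features, and step sizes below are illustrative assumptions, not the paper's setting:

```python
import math
import random

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def step_env(s, a, rng):
    """Toy 2-state MDP (illustrative): in state 0, action 0 pays
    reward 1 and usually stays put; everything else drifts toward
    the unrewarding state 1."""
    if s == 0 and a == 0:
        return (0, 1.0) if rng.random() < 0.9 else (1, 1.0)
    return (1, 0.0) if rng.random() < 0.9 else (0, 0.0)

def train(steps=20000, gamma=0.9, alpha=0.05, beta=0.005, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]                    # linear critic, one-hot state features
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per state
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = 0 if rng.random() < pi[0] else 1
        s2, r = step_env(s, a, rng)
        delta = r + gamma * w[s2] - w[s]  # TD(0) error
        w[s] += alpha * delta             # fast time scale: critic
        for k in range(2):                # slow time scale: actor
            theta[s][k] += beta * delta * ((1.0 if a == k else 0.0) - pi[k])
        s = s2
    return w, theta

w, theta = train()
pi0 = softmax(theta[0])
# The learned policy should favor action 0 in state 0, and the critic
# should value state 0 above state 1.
```

The separation alpha >> beta mirrors the paper's two-time-scale analysis: the critic tracks the value of the (slowly changing) current policy, and the actor ascends the resulting gradient estimate.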
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
MACHINE LEARNING, 2016, Vol. 105, No. 3, pp. 367-417
Authors: Prashanth, L. A.; Ghavamzadeh, Mohammad (Univ Maryland, Syst Res Inst, College Pk, MD 20742, USA; Adobe Res, San Jose, CA, USA; INRIA, Lille, France)
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the ...
actor-critic algorithms with Online Feature Adaptation
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION, 2016, Vol. 26, No. 4, Article 24
Authors: Prabuchandran, K. J.; Bhatnagar, Shalabh; Borkar, Vivek S. (Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India; Indian Inst Technol, Dept Elect Engn, Mumbai 400076, Maharashtra, India)
We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other w...
Parametrized actor-critic algorithms for finite-horizon MDPs
26th American Control Conference
Authors: Abdulla, Mohammed Shahid; Bhatnagar, Shalabh (Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India)
Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage. Thus the curse of dimensionality affects FH-MDPs more severely than infinite-horizon ...
actor-critic algorithms for Constrained Multi-agent Reinforcement Learning
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
Authors: Raghuram Bharadwaj Diddigi; D. Sai Koti Reddy; Prabuchandran K.J.; Shalabh Bhatnagar (Indian Institute of Science, Bangalore, India; IBM Research, Bangalore, India; Amazon-IISc Postdoctoral Fellow, Bangalore, India)
Multi-agent reinforcement learning has gained a lot of popularity primarily owing to the success of deep function approximation architectures. However, many real-life multi-agent applications often impose constraints on...
Bayesian policy gradient and actor-critic algorithms
The Journal of Machine Learning Research, 2016, Vol. 17, No. 1
Authors: Kevin Murphy; Bernhard Schölkopf; Mohammad Ghavamzadeh; Yaakov Engel; Michal Valko (Google; MPI for Intelligent Systems; Adobe Research & INRIA; INRIA and Adobe Research; Rafael Advanced Defence System, Israel; INRIA Lille, France)
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Many conventional policy gradient methods use Monte-Carlo techniques to est...
Policy Gradient and actor-critic Learning in Continuous Time and Space: Theory and algorithms
JOURNAL OF MACHINE LEARNING RESEARCH, 2022, Vol. 23, No. 1, pp. 1-50
Authors: Jia, Yanwei; Zhou, Xun Yu (Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027, USA; Columbia Univ, Data Sci Inst, New York, NY 10027, USA)
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with...
actor-critic Reinforcement Learning algorithms for Mean Field Games in Continuous Time, State and Action Spaces
APPLIED MATHEMATICS AND OPTIMIZATION, 2024, Vol. 89, No. 3, Article 72
Authors: Liang, Hong; Chen, Zhiping; Jing, Kaili (Xi An Jiao Tong Univ, Sch Math & Stat, Xian 710049, Shaanxi, Peoples R China; Xian Int Acad Math & Math Technol, Ctr Optimizat Tech & Quantitat Finance, Xian 710049, Shaanxi, Peoples R China; Univ Ottawa, Dept Math & Stat, Ottawa, ON K1N 6N5, Canada)
This paper investigates mean field games in continuous time, state and action spaces with an infinite number of agents, where each agent aims to maximize its expected cumulative reward. Using the technique of randomiz...