
Refine Results

Document Type

  • 31 journal articles
  • 10 conference papers
  • 1 dissertation

Collection Scope

  • 42 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 31 papers: Engineering
    • 17 papers: Computer Science and Technology...
    • 16 papers: Control Science and Engineering
    • 7 papers: Electrical Engineering
    • 3 papers: Information and Communication Engineering
    • 2 papers: Software Engineering
    • 1 paper: Mechanical Engineering
    • 1 paper: Power Engineering and Engineering Therm...
    • 1 paper: Chemical Engineering and Technology
    • 1 paper: Petroleum and Natural Gas Engineering
  • 10 papers: Science
    • 9 papers: Mathematics
    • 1 paper: Systems Science
    • 1 paper: Statistics (awardable in Science, ...
  • 5 papers: Management
    • 5 papers: Management Science and Engineering (...
  • 2 papers: Economics
    • 1 paper: Theoretical Economics
    • 1 paper: Applied Economics
  • 1 paper: Military Science

Topics

  • 42 papers: actor-critic alg...
  • 22 papers: reinforcement le...
  • 9 papers: markov decision ...
  • 5 papers: stochastic appro...
  • 4 papers: martingale
  • 4 papers: two timescale st...
  • 4 papers: policy gradient
  • 3 papers: risk-sensitive r...
  • 3 papers: normalized hadam...
  • 3 papers: markov decision ...
  • 3 papers: policy gradient ...
  • 3 papers: deep reinforceme...
  • 2 papers: continuous time ...
  • 2 papers: simultaneous per...
  • 2 papers: function approxi...
  • 2 papers: policy evaluatio...
  • 2 papers: nonholonomic mob...
  • 2 papers: mixed multi-agen...
  • 2 papers: conditional valu...
  • 2 papers: chance-constrain...

Institutions

  • 5 papers: indian inst sci ...
  • 3 papers: tata inst fundam...
  • 2 papers: mit informat & d...
  • 2 papers: boston univ div ...
  • 2 papers: syracuse univ de...
  • 2 papers: ibm research ban...
  • 2 papers: inria lille
  • 2 papers: boston univ ctr ...
  • 1 paper: inria
  • 1 paper: amazon-iisc post...
  • 1 paper: aeronautics and ...
  • 1 paper: fime
  • 1 paper: george washingto...
  • 1 paper: norwegian univ s...
  • 1 paper: boston univ dept...
  • 1 paper: univ paris cite
  • 1 paper: indian inst tech...
  • 1 paper: sun microsyst la...
  • 1 paper: univ ottawa dept...
  • 1 paper: edf r&d fime

Authors

  • 4 papers: bhatnagar shalab...
  • 3 papers: abdulla mohammed...
  • 3 papers: ghavamzadeh moha...
  • 3 papers: borkar vs
  • 2 papers: wang jing
  • 2 papers: d. sai koti redd...
  • 2 papers: konda vr
  • 2 papers: velipasalar sene...
  • 2 papers: gursoy m. cenk
  • 2 papers: shalabh bhatnaga...
  • 2 papers: zhong chen
  • 2 papers: paschalidis ioan...
  • 2 papers: pham huyen
  • 2 papers: warin xavier
  • 2 papers: mohammad ghavamz...
  • 2 papers: paschalidis ioan...
  • 1 paper: srikanth g. tami...
  • 1 paper: mishra nidhi
  • 1 paper: saha amrita
  • 1 paper: kumar s

Language

  • 37 papers: English
  • 5 papers: Other
Search criteria: Subject = "Actor-critic algorithms"
42 records; showing 11-20
Actor-critic-type learning algorithms for Markov decision processes
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1999, Vol. 38, No. 1, pp. 94-123
Authors: Konda, VR; Borkar, VS (MIT Informat & Decis Syst Lab, Cambridge, MA 02139, USA; Tata Inst Fundamental Res, Sch Technol & Comp Sci, Bombay 400005, Maharashtra, India)
Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptiv...
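
As an illustration of the scheme this entry analyzes, here is a minimal two-timescale actor-critic sketch: a critic tracks state values by temporal differences on a faster step-size schedule, while the actor adjusts softmax action preferences on a slower one. The tabular setting, step-size exponents, and interfaces are illustrative assumptions, not the paper's exact recursions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def tabular_actor_critic(P, R, gamma=0.95, steps=50000, seed=0):
    """Two-timescale tabular actor-critic sketch.

    P[s, a] is a next-state distribution and R[s, a] a reward (assumed
    interface). The critic V runs on a faster step size than the actor
    preferences theta, mirroring the two-timescale structure analyzed
    in this literature.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                   # critic: state values
    theta = np.zeros((n_states, n_actions))  # actor: softmax preferences
    s = 0
    for t in range(steps):
        alpha_critic = 1.0 / (1 + t) ** 0.6  # faster (slower-decaying) step
        alpha_actor = 1.0 / (1 + t)          # slower step
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s_next = rng.choice(n_states, p=P[s, a])
        delta = R[s, a] + gamma * V[s_next] - V[s]  # TD error
        V[s] += alpha_critic * delta
        grad_log = -pi
        grad_log[a] += 1.0                   # grad of log softmax policy
        theta[s] += alpha_actor * delta * grad_log
        s = s_next
    return V, theta
```
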
Control Randomisation Approach for Policy Gradient and Application to Reinforcement Learning in Optimal Switching
APPLIED MATHEMATICS AND OPTIMIZATION, 2025, Vol. 91, No. 1, pp. 1-33
Authors: Denkert, Robert; Pham, Huyen; Warin, Xavier (Humboldt Univ, Dept Math, Berlin, Germany; Ecole Polytech, CMAP, Palaiseau, France; Lab Finance Marches Energie, EDF R&D, Palaiseau, France; Lab Finance Marches Energie, FiME, Palaiseau, France)
We propose a comprehensive framework for policy gradient methods tailored to continuous-time reinforcement learning. This is based on the connection between stochastic control problems and randomised problems, enablin...
Policy gradient and actor-critic learning in continuous time and space: theory and algorithms
The Journal of Machine Learning Research, 2022, Vol. 23, No. 1, pp. 12603-12652
Authors: Yanwei Jia; Xun Yu Zhou (Department of Industrial Engineering and Operations Research, Columbia University, New York, NY; Department of Industrial Engineering and Operations Research & The Data Science Institute, Columbia University, New York, NY)
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with...
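
For orientation, the exploratory formulation the abstract refers to can be written as an entropy-regularized relaxed-control objective; this is a hedged rendering in the spirit of Wang et al. (2020), with notation and discounting that may differ from the paper's:

```latex
% Entropy-regularized exploratory objective (illustrative notation):
% the agent controls a density \pi_s over actions rather than a point action.
J(x;\pi) = \mathbb{E}\!\left[\int_0^T \Big(\int_A r(s, X_s, a)\,\pi_s(a)\,\mathrm{d}a
          + \lambda\,\mathcal{H}(\pi_s)\Big)\,\mathrm{d}s + g(X_T)\;\Big|\;X_0 = x\right],
\qquad
\mathcal{H}(\pi) = -\int_A \pi(a)\ln\pi(a)\,\mathrm{d}a
```

Here λ > 0 weights the exploration bonus H(π) against the running reward r and terminal reward g.
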
An actor-critic algorithm for constrained Markov decision processes
SYSTEMS & CONTROL LETTERS, 2005, Vol. 54, No. 3, pp. 207-213
Authors: Borkar, VS (Tata Inst Fundamental Res, Sch Technol & Comp Sci, Bombay 400005, Maharashtra, India)
An actor-critic type reinforcement learning algorithm is proposed and analyzed for constrained controlled Markov decision processes. The analysis uses multiscale stochastic approximation theory and the 'envelope t...
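
The multiscale structure in this line of work can be caricatured as a primal-dual loop: a fast critic, a slower actor, and a slowest Lagrange-multiplier update enforcing the constraint. A sketch under assumed notation (one-step cost, a single constraint with a budget), not Borkar's exact algorithm:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def constrained_actor_critic(transition, n_states, n_actions,
                             budget=1.0, gamma=0.95, steps=20000, seed=0):
    """Primal-dual actor-critic caricature for a constrained MDP.

    `transition(s, a, rng) -> (s_next, cost, constraint_cost)` is an
    assumed interface. We minimize expected discounted cost while the
    multiplier lam is pushed up whenever constraint_cost exceeds budget.
    Three timescales: critic fastest, actor slower, multiplier slowest.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)
    theta = np.zeros((n_states, n_actions))
    lam, s = 0.0, 0
    for t in range(steps):
        a_critic = 1.0 / (1 + t) ** 0.55   # fastest step-size schedule
        a_actor = 1.0 / (1 + t) ** 0.8
        a_mult = 1.0 / (1 + t)             # slowest
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s_next, cost, ccost = transition(s, a, rng)
        lagrangian = cost + lam * ccost    # relaxed one-step cost
        delta = lagrangian + gamma * V[s_next] - V[s]
        V[s] += a_critic * delta
        grad_log = -pi
        grad_log[a] += 1.0
        theta[s] -= a_actor * delta * grad_log           # descend on cost
        lam = max(0.0, lam + a_mult * (ccost - budget))  # dual ascent
        s = s_next
    return theta, lam
```
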
A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, Vol. 49, No. 4, pp. 592-598
Authors: Bhatnagar, S; Kumar, S (Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India)
A two-timescale simulation-based actor-critic algorithm for solution of infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted cost criterion is proposed. The algor...
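
The simultaneous perturbation ingredient is easy to show in isolation: every component of the parameter vector is perturbed at once by random signs, so a two-sided gradient estimate needs only two noisy function evaluations regardless of dimension. A generic SPSA sketch, not the paper's two-timescale actor-critic:

```python
import numpy as np

def spsa_gradient(J, theta, c, rng):
    """Two-measurement SPSA estimate of grad J at theta.

    J is a noisy scalar objective; c is the perturbation half-width.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher signs
    return (J(theta + c * delta) - J(theta - c * delta)) / (2.0 * c * delta)

# Usage: stochastic gradient descent on a noisy quadratic (illustrative).
rng = np.random.default_rng(0)
J = lambda th: float(np.sum(th ** 2) + 0.01 * rng.normal())
theta = np.ones(5)
for t in range(1, 2001):
    theta -= (0.1 / t) * spsa_gradient(J, theta, c=0.1 / t ** 0.25, rng=rng)
print(theta)  # should be driven toward the origin
```
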
A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2019, Vol. 5, No. 4, pp. 1125-1139
Authors: Zhong, Chen; Lu, Ziyang; Gursoy, M. Cenk; Velipasalar, Senem (Syracuse Univ, Dept Elect Engn & Comp Sci, Syracuse, NY 13244, USA)
To make efficient use of limited spectral resources, we propose in this work a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenar...
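
In "deep" variants of this family, the tabular critic and softmax actor are replaced by neural networks. A minimal sketch of such a pair with a one-step actor-critic loss; the layer sizes, channel-selection framing, and tensor shapes are illustrative assumptions, not this paper's architecture:

```python
import torch
import torch.nn as nn

class DeepActorCritic(nn.Module):
    """Shared body with a policy head (channel logits) and a value head."""
    def __init__(self, obs_dim, n_channels, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_channels)  # action logits
        self.critic = nn.Linear(hidden, 1)          # state value

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Categorical(logits=self.actor(h))
        return dist, self.critic(h)

def ac_loss(model, obs, action, reward, next_obs, gamma=0.99):
    """One-step loss: the TD error trains the critic and weights the actor."""
    dist, value = model(obs)                     # obs: (B, obs_dim)
    with torch.no_grad():
        _, next_value = model(next_obs)
        target = reward + gamma * next_value.squeeze(-1)  # reward: (B,)
    td_error = target - value.squeeze(-1)
    actor_loss = -(dist.log_prob(action) * td_error.detach()).mean()
    critic_loss = td_error.pow(2).mean()
    return actor_loss + critic_loss
```
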
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
SYSTEMS & CONTROL LETTERS, 2001, Vol. 44, No. 5, pp. 339-346
Authors: Borkar, VS (Tata Inst Fundamental Res, Sch Technol & Comp Sci, Bombay 400005, Maharashtra, India)
We propose for risk-sensitive control of finite Markov chains a counterpart of the popular 'actor-critic' algorithm for classical Markov decision processes. The algorithm is based on a 'sensitivity formula...
A least squares temporal difference actor-critic algorithm with applications to warehouse management
NAVAL RESEARCH LOGISTICS, 2012, Vol. 59, No. 3-4, pp. 197-211
Authors: Estanjini, Reza Moazzez; Li, Keyong; Paschalidis, Ioannis Ch (Boston Univ, Dept Elect & Comp Engn, Div Syst Engn, Boston, MA 02215, USA; Boston Univ, Ctr Informat & Syst Engn, Boston, MA 02215, USA)
This article develops a new approximate dynamic programming (DP) algorithm for Markov decision problems and applies it to a vehicle dispatching problem arising in warehouse management. The algorithm is of the actor-cr...
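
The least-squares temporal-difference piece can be shown on its own: instead of stochastic critic updates, the value-function weights solve a linear system assembled from a batch of transitions. A sketch with an assumed feature-matrix layout:

```python
import numpy as np

def lstd_critic(phi, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """LSTD(0) weights for a linear value function V(s) ~ phi(s) @ w.

    phi:      (T, k) features of visited states
    phi_next: (T, k) features of successor states
    rewards:  (T,)   one-step rewards
    Solves A w = b with A = sum phi (phi - gamma phi')^T, b = sum phi r.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # Small ridge term guards against a singular A on short trajectories.
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)
```
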
Strengthening Acquiring Knowledge for Optimizing Dynamic Delivery Routes
2025 International Conference on Intelligent Control, Computing and Communications, IC3 2025
Authors: Mishra, Nidhi; Tiwari, Ankita (Kalinga University, Department of CS & IT, Raipur, India)
One of the most important challenges for delivery networks in logistics is the dynamic route optimization problem, which becomes increasingly important over time given the complexity of real-time constraints, such as ...
An Actor-Critic Algorithm With Second-Order Actor and Critic
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, Vol. 62, No. 6, pp. 2689-2703
Authors: Wang, Jing; Paschalidis, Ioannis Ch. (Boston Univ, Ctr Informat & Syst Engn, Boston, MA 02215, USA; Boston Univ, Dept Elect & Comp Engn, 8 St Marys St, Boston, MA 02215, USA; Boston Univ, Div Syst Engn, 8 St Marys St, Boston, MA 02215, USA)
Actor-critic algorithms solve dynamic decision-making problems by optimizing a performance metric of interest over a user-specified parametric class of policies. They employ a combination of an actor, making policy im...
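
One standard way to give the actor second-order information is to precondition the sampled policy gradient with an estimated Fisher information matrix; the sketch below shows that generic construction and is not claimed to match this paper's second-order actor and critic:

```python
import numpy as np

def preconditioned_actor_step(theta, scores, advantages,
                              step=0.05, damping=1e-3):
    """Natural-gradient-style actor update.

    scores:     (N, d) rows are grad_theta log pi(a_i | s_i)
    advantages: (N,)   advantage estimates (e.g., TD errors)
    """
    n = len(advantages)
    g = scores.T @ advantages / n   # sampled policy gradient
    F = scores.T @ scores / n       # empirical Fisher matrix
    direction = np.linalg.solve(F + damping * np.eye(len(g)), g)
    return theta + step * direction
```
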