
Refine Search Results

Document Type

  • 31 journal articles
  • 9 conference papers
  • 1 thesis

Holdings

  • 41 electronic documents
  • 0 print holdings

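Subject Classification

  • 30 Engineering
    • 16 Control Science and Engineering
    • 16 Computer Science and Technology...
    • 7 Electrical Engineering
    • 3 Information and Communication Engineering
    • 1 Mechanical Engineering
    • 1 Power Engineering and Engineering Thermo...
    • 1 Chemical Engineering and Technology
    • 1 Petroleum and Natural Gas Engineering
    • 1 Software Engineering
  • 9 Science
    • 8 Mathematics
    • 1 Systems Science
    • 1 Statistics (degrees conferrable in Science, ...
  • 5 Management
    • 5 Management Science and Engineering (...
  • 2 Economics
    • 1 Theoretical Economics
    • 1 Applied Economics
  • 1 Military Science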

Topics

  • 41 actor-critic alg...
  • 21 reinforcement le...
  • 9 markov decision ...
  • 5 stochastic appro...
  • 4 martingale
  • 4 two timescale st...
  • 4 policy gradient
  • 3 risk-sensitive r...
  • 3 normalized hadam...
  • 3 markov decision ...
  • 3 policy gradient ...
  • 3 deep reinforceme...
  • 2 continuous time ...
  • 2 simultaneous per...
  • 2 function approxi...
  • 2 policy evaluatio...
  • 2 nonholonomic mob...
  • 2 mixed multi-agen...
  • 2 conditional valu...
  • 2 chance-constrain...

Institutions

  • 5 indian inst sci ...
  • 3 tata inst fundam...
  • 2 mit informat & d...
  • 2 boston univ div ...
  • 2 syracuse univ de...
  • 2 ibm research ban...
  • 2 inria lille
  • 2 boston univ ctr ...
  • 1 inria
  • 1 amazon-iisc post...
  • 1 aeronautics and ...
  • 1 fime
  • 1 george washingto...
  • 1 norwegian univ s...
  • 1 boston univ dept...
  • 1 univ paris cite
  • 1 indian inst tech...
  • 1 sun microsyst la...
  • 1 univ ottawa dept...
  • 1 edf r&d fime

Authors

  • 4 bhatnagar shalab...
  • 3 abdulla mohammed...
  • 3 ghavamzadeh moha...
  • 3 borkar vs
  • 2 wang jing
  • 2 d. sai koti redd...
  • 2 konda vr
  • 2 velipasalar sene...
  • 2 gursoy m. cenk
  • 2 shalabh bhatnaga...
  • 2 zhong chen
  • 2 paschalidis ioan...
  • 2 pham huyen
  • 2 warin xavier
  • 2 mohammad ghavamz...
  • 2 paschalidis ioan...
  • 1 srikanth g. tami...
  • 1 saha amrita
  • 1 kumar s
  • 1 bernhard schölko...

Language

  • 37 English
  • 4 Other
Search criteria: "Subject term = Actor-critic algorithms"
41 records; showing 21-30
An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots
SOFT COMPUTING, 2007, Vol. 11, No. 1, pp. 81-89
Authors: Syam, Rafiuddin; Watanabe, Keigo; Izumi, Kiyotaka (Saga Univ Grad Sch Sci & Engn Dept Adv Syst Control Engn Saga 8408502 Japan)
In this paper, we propose a new algorithm of an adaptive actor-critic method with multi-step simulated experiences, as a kind of temporal difference (TD) method. In our approach, the TD-error is composed of two value-...
A Hessian Actor-Critic Algorithm
53rd IEEE Annual Conference on Decision and Control (CDC)
Authors: Wang, Jing; Paschalidis, Ioannis Ch (Boston Univ Div Syst Engn 8 St Marys St Boston MA 02215 USA; Boston Univ Dept Elect & Comp Engn Boston MA 02215 USA)
We consider Markov Decision Processes (MDPs) following a policy parametrized by a parsimonious set of parameters and seek to optimize the policy over these parameters. In this setting, optimization can be done using a...
Deep intrinsically motivated exploration in continuous control
MACHINE LEARNING, 2023, Vol. 112, No. 12, pp. 4959-4993
Authors: Saglam, Baturay; Kozat, Suleyman S. (Bilkent Univ Dept Elect & Elect Engn EE403 Bilkent TR-06800 Ankara Turkiye)
In continuous control, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise. Although the deep setting of undirected explor...
Graph Representation Learning for Contention and Interference Management in Wireless Networks
IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, Vol. 32, No. 3, pp. 2479-2494
Authors: Gu, Zhouyou; Vucetic, Branka; Chikkam, Kishore; Aliberti, Pasquale; Hardjawana, Wibowo (Univ Sydney Sch Elect & Informat Engn Sydney NSW 2006 Australia; Morse Micro Sydney NSW 2010 Australia)
Restricted access window (RAW) in Wi-Fi 802.11ah networks manages contention and interference by grouping users and allocating periodic time slots for each group's transmissions. We will find the optimal user grou...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
JOURNAL OF MACHINE LEARNING RESEARCH, 2018, Vol. 18, No. 154, pp. 1-51
Authors: Chow, Yinlam; Ghavamzadeh, Mohammad; Janson, Lucas; Pavone, Marco (DeepMind Mountain View CA 94043 USA; Stanford Univ Dept Stat Stanford CA 94305 USA; Stanford Univ Aeronaut & Astronaut Stanford CA 94305 USA)
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. A...
A fuzzy reinforcement learning approach to control in wireless transmitters
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2005, Vol. 35, No. 4, pp. 768-778
Authors: Vengerov, D; Bambos, N; Berenji, H (Sun Microsyst Labs Sunnyvale CA 94086 USA; Stanford Univ Stanford CA 94305 USA; Intelligent Inference Syst Corp Mountain View CA 94035 USA)
We address the issue of power-controlled shared channel access in wireless networks supporting packetized data traffic. We formulate this problem using the dynamic programming framework and present a new distributed f...
Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes
SIMULATION-TRANSACTIONS OF THE SOCIETY FOR MODELING AND SIMULATION INTERNATIONAL, 2008, Vol. 84, No. 12, pp. 577-600
Authors: Bhatnagar, Shalabh; Abdulla, Mohammed Shahid (Indian Inst Sci Dept Comp Sci & Automat Bangalore 560012 Karnataka India; Gen Motors India Sci Lab Bangalore Karnataka India)
We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of these algorithms are developed for finite state and compact action spaces while the other two are for finite state and f...
Reinforcement learning based algorithms for average cost Markov Decision Processes
DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, Vol. 17, No. 1, pp. 23-52
Authors: Abdulla, Mohammed Shahid; Bhatnagar, Shalabh (Indian Inst Sci Dept Comp Sci & Automat Bangalore 560012 Karnataka India)
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algo...
Deep Reinforcement Learning-Based Edge Caching in Wireless Networks
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2020, Vol. 6, No. 1, pp. 48-61
Authors: Zhong, Chen; Gursoy, M. Cenk; Velipasalar, Senem (Syracuse Univ Dept Elect Engn & Comp Sci Syracuse NY 13244 USA)
With the purpose to offload data traffic in wireless networks, content caching techniques have recently been studied intensively. Using these techniques and caching a portion of the popular files at the local content ...
Two-step gradient-based reinforcement learning for underwater robotics behavior learning
ROBOTICS AND AUTONOMOUS SYSTEMS, 2013, Vol. 61, No. 3, pp. 271-282
Authors: El-Fakdi, Andres; Carreras, Marc (Univ Girona Dept Comp Engn Comp Vis & Robot Grp VICOROB Girona 17071 Spain)
This article proposes a field application of a Reinforcement Learning (RL) control system for solving the action selection problem of an autonomous robot in a cable tracking task. The Ictineu Autonomous Underwater Veh...