
Refine Results

Document Type

  • 140 Conference papers
  • 7 Journal articles

Collection

  • 147 Electronic documents
  • 0 Print holdings

Date Distribution

Subject Classification

  • 71 Engineering
    • 66 Computer science and technology...
    • 15 Software engineering
    • 11 Electrical engineering
    • 9 Control science and engineering
    • 2 Instrument science and technology
    • 2 Information and communication engineering
    • 1 Mechanics (may confer Engineering or Sci...
    • 1 Mechanical engineering
    • 1 Architecture
  • 11 Science
    • 10 Mathematics
    • 2 Systems science
    • 2 Statistics (may confer Science or...
  • 5 Management
    • 4 Management science and engineering (ma...
    • 3 Business administration
    • 1 Library, information and archives manag...
  • 3 Economics
    • 3 Applied economics

Topics

  • 76 dynamic programm...
  • 39 learning
  • 26 optimal control
  • 25 reinforcement le...
  • 15 function approxi...
  • 15 control systems
  • 14 approximation al...
  • 14 equations
  • 13 neural networks
  • 13 stochastic proce...
  • 12 convergence
  • 10 state-space meth...
  • 10 cost function
  • 9 mathematical mod...
  • 8 trajectory
  • 8 approximation me...
  • 7 approximate dyna...
  • 7 algorithm design...
  • 7 adaptive control
  • 7 heuristic algori...

Institutions

  • 4 school of inform...
  • 4 department of in...
  • 3 department of el...
  • 3 northeastern uni...
  • 3 univ texas autom...
  • 3 arizona state un...
  • 3 robotics institu...
  • 3 univ illinois de...
  • 2 princeton univ d...
  • 2 national science...
  • 2 college of mecha...
  • 2 key laboratory o...
  • 2 univ utrecht dep...
  • 2 department of op...
  • 1 inria
  • 1 computational le...
  • 1 school of automa...
  • 1 univ cincinnati ...
  • 1 toyota technol c...
  • 1 neuroinformatics...

Authors

  • 5 liu derong
  • 4 xu xin
  • 4 martin riedmille...
  • 4 huaguang zhang
  • 4 marco a. wiering
  • 4 zhang huaguang
  • 4 si jennie
  • 4 derong liu
  • 3 hado van hasselt
  • 3 lewis frank l.
  • 3 dongbin zhao
  • 3 powell warren b.
  • 3 warren b. powell
  • 3 riedmiller marti...
  • 2 manuel loth
  • 2 van hasselt hado
  • 2 preux philippe
  • 2 hu dewen
  • 2 jennie si
  • 2 philippe preux

Language

  • 142 English
  • 5 Other

Search query: "Any field = 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007"
147 records; showing 131-140
Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Hailin Li, Cihan H. Dagli, David Enke (Department of Engineering Management and Systems Engineering, University of Missouri-Rolla, Rolla, MO, USA)
There are fundamental difficulties when only using a supervised learning philosophy to predict financial stock short-term movements. We present a reinforcement-oriented forecasting framework in which the solution is c...
Dual Representations for Dynamic Programming and Reinforcement Learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Tao Wang, Michael Bowling, Dale Schuurmans (Department of Computing Science, University of Alberta, Edmonton, Canada)
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of...
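The dual view described in this abstract can be illustrated on a tiny policy-evaluation problem: one maintains a discounted state-visit distribution instead of a value function, and both representations yield the same expected return. A minimal sketch with made-up numbers (not code from the paper):

```python
import numpy as np

# A 2-state Markov chain under a fixed policy: primal vs dual evaluation.
gamma = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])         # transition matrix under the policy
r = np.array([1.0, 0.0])          # per-state reward
mu = np.array([0.5, 0.5])         # start-state distribution

# Primal view: value function V = (I - gamma P)^(-1) r
V = np.linalg.solve(np.eye(2) - gamma * P, r)

# Dual view: discounted state-visit distribution
# d^T = (1 - gamma) mu^T (I - gamma P)^(-1), which sums to 1.
d = (1 - gamma) * np.linalg.solve((np.eye(2) - gamma * P).T, mu)

# Both views agree on the policy's expected return from mu.
primal_return = mu @ V
dual_return = d @ r / (1 - gamma)
print(primal_return, dual_return)
```

The dual object `d` is a proper probability distribution, which is what makes distribution-based (rather than value-based) updates possible.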
A Dynamic Programming Approach to Viability Problems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Pierre-Arnaud Coquelin, Sophie Martin, Remi Munos (Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France; Laboratoire d'Ingénierie pour les Systèmes Complexes, Cemagref de Clermont-Ferrand, Aubière, France; INRIA Futurs, Université de Lille 3, France)
Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of...
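For a discrete toy system, the viability kernel mentioned in the abstract can be computed by a backward fixed-point iteration that repeatedly discards states from which every control leads outside the current set. A hedged sketch with invented dynamics (the paper itself addresses continuous dynamics; everything below is illustrative):

```python
# Discrete viability kernel by fixed-point iteration:
#   V_{k+1} = { x in V_k : some control keeps the successor in V_k }.
# Toy system: position p in 0..5, velocity v in {-1,0,1}; a control
# a in {-1,0,1} sets v' = clip(v + a), and p' = p + v.
# Constraint set K: all (p, v) with 0 <= p <= 5.

POS = range(6)
VEL = (-1, 0, 1)
K = {(p, v) for p in POS for v in VEL}

def successors(p, v):
    for a in (-1, 0, 1):
        v2 = max(-1, min(1, v + a))
        yield p + v, v2            # position advances by current velocity

kernel = set(K)
while True:
    keep = {x for x in kernel
            if any(s in kernel for s in successors(*x))}
    if keep == kernel:
        break
    kernel = keep

print(sorted(K - kernel))   # → [(0, -1), (5, 1)]
```

Only the two states moving out of bounds with no chance to brake are non-viable; every other state admits a control keeping the trajectory in K forever.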
A Recurrent Control Neural Network for Data Efficient Reinforcement Learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Anton Maximilian Schaefer, Steffen Udluft, Hans-Georg Zimmermann (Department of Optimisation and Operations Research, University of Ulm (EBS), Germany; Department of Learning Systems, Information & Communications, Siemens AG, Munich, Germany)
In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with ...
Leader-Follower Semi-Markov Decision Problems: Theoretical Framework and Approximate Solution
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kurian Tharakunnel, Siddhartha Bhattacharyya (Department of Information and Decision Sciences, University of Illinois Chicago, Chicago, IL, USA)
Leader-follower problems are hierarchical decision problems in which a leader uses incentives to induce certain desired behavior among a set of self-interested followers. Dynamic leader-follower problems extend this s...
Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Baohua Li, Jennie Si (Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA)
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain st...
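The worst-case Bellman backup underlying robust dynamic programming can be sketched when each action's transition row is only known to lie in a small finite uncertainty set. All numbers below are illustrative; the paper treats more general uncertain stationary transition matrices:

```python
import numpy as np

# Robust value iteration: the backup takes the worst case (min) over
# an uncertainty set of candidate transition rows, then the best action.
gamma = 0.9
r = np.array([[1.0, 0.0],   # r[s, a] for 2 states x 2 actions
              [0.0, 2.0]])
# U[s, a] = candidate transition rows P(. | s, a)
U = {
    (0, 0): [np.array([0.9, 0.1]), np.array([0.7, 0.3])],
    (0, 1): [np.array([0.5, 0.5]), np.array([0.4, 0.6])],
    (1, 0): [np.array([0.2, 0.8]), np.array([0.3, 0.7])],
    (1, 1): [np.array([0.6, 0.4]), np.array([0.5, 0.5])],
}

V = np.zeros(2)
for _ in range(500):
    Q = np.array([[r[s, a] + gamma * min(p @ V for p in U[s, a])
                   for a in range(2)] for s in range(2)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V)   # worst-case optimal values
```

Since the robust backup is still a gamma-contraction, the iteration converges to the unique robust fixed point.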
Continuous-time adaptive critics
IEEE Transactions on Neural Networks, 2007, Vol. 18, No. 3, pp. 631-647
Authors: Thomas Hanselmann, Lyle Noakes, Anthony Zaknich (Dept. of Electrical & Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia; School of Mathematics & Statistics, University of Western Australia, Crawley, WA 6009, Australia; School of Engineering Science, Murdoch University, Perth, WA 6150, Australia)
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are preval...
Online Reinforcement Learning Neural Network Controller Design for Nanomanipulation
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Qinmin Yang, S. Jagannathan (Department of Electrical & Computer Engineering, University of Missouri-Rolla, MO, USA)
In this paper, a novel reinforcement learning neural network (NN)-based controller, referred to as an adaptive critic controller, is proposed for affine nonlinear discrete-time systems with applications to nanomanipulation....
A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Zhenzhen Liu, Itamar Elhanany (Department of Electrical & Computer Engineering, University of Tennessee, Knoxville, TN, USA)
This paper presents a framework for obtaining an optimal policy in model-free partially observable Markov decision problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken...
Opposition-Based Q(λ) with Non-Markovian Update
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel (Pattern Analysis and Machine Intelligence Laboratory, Department of Systems Design Engineering, University of Waterloo, ON, Canada; Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada)
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique is a combination of the idea of opposition and eligibility traces to deal with large state space...