Refine Search Results

Document Type

  • 81 journal articles
  • 28 conference papers
  • 2 theses

Collection Scope

  • 111 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 87 Engineering
    • 53 Computer Science and Technology...
    • 36 Electrical Engineering
    • 30 Control Science and Engineering
    • 8 Transportation Engineering
    • 7 Petroleum and Natural Gas Engineering
    • 5 Software Engineering
    • 4 Information and Communication Engineering
    • 3 Power Engineering and Engineering Thermo...
    • 2 Instrument Science and Technology
    • 2 Civil Engineering
    • 1 Electronic Science and Technology (...
    • 1 Chemical Engineering and Technology
    • 1 Naval Architecture and Ocean Engineering
    • 1 Environmental Science and Engineering (...
  • 28 Management
    • 28 Management Science and Engineering (...
    • 3 Business Administration
  • 24 Science
    • 22 Mathematics
    • 4 Systems Science
    • 1 Physics
    • 1 Statistics (...
  • 11 Economics
    • 7 Theoretical Economics
    • 3 Applied Economics
  • 3 Medicine
    • 3 Clinical Medicine
    • 2 Basic Medicine (...

Topics

  • 111 篇 value function a...
  • 37 篇 reinforcement le...
  • 18 篇 approximate dyna...
  • 12 篇 dynamic programm...
  • 7 篇 dynamic vehicle ...
  • 7 篇 temporal differe...
  • 6 篇 q-learning
  • 5 篇 function approxi...
  • 5 篇 markov decision ...
  • 4 篇 markov decision ...
  • 4 篇 neural networks
  • 4 篇 optimal control
  • 4 篇 policy iteration
  • 3 篇 rate of converge...
  • 3 篇 actor-critic
  • 3 篇 policy evaluatio...
  • 3 篇 polynomial basis...
  • 3 篇 reinforcement le...
  • 3 篇 energy managemen...
  • 3 篇 off-policy learn...

Institutions

  • 2 篇 beijing univ che...
  • 2 篇 hefei univ techn...
  • 2 篇 missouri univ sc...
  • 2 篇 univ massachuset...
  • 2 篇 tokyo inst techn...
  • 2 篇 northeastern uni...
  • 2 篇 univ sci & techn...
  • 2 篇 tech univ carolo...
  • 2 篇 natl univ def te...
  • 2 篇 georgia inst tec...
  • 2 篇 chinese acad sci...
  • 2 篇 otto von guerick...
  • 2 篇 rice univ dept e...
  • 1 篇 polish acad sci ...
  • 1 篇 shanghai engn re...
  • 1 篇 tsinghua univ de...
  • 1 篇 univ sydney sch ...
  • 1 篇 inria nancy gran...
  • 1 篇 univ southern ca...
  • 1 篇 univ twente ind ...

Authors

  • 6 篇 ulmer marlin w.
  • 5 篇 song tianheng
  • 5 篇 li dazi
  • 4 篇 xu xin
  • 4 篇 mattfeld dirk c.
  • 3 篇 soeffker ninja
  • 3 篇 hachiya hirotaka
  • 2 篇 tutsoy onder
  • 2 篇 huang zhenhua
  • 2 篇 savelsbergh mart...
  • 2 篇 montoya juan m.
  • 2 篇 lewis frank l.
  • 2 篇 pietquin olivier
  • 2 篇 jin qibing
  • 2 篇 sickles robin c.
  • 2 篇 geist matthieu
  • 2 篇 li ping
  • 2 篇 chapman archie c...
  • 2 篇 zuo lei
  • 2 篇 cervellera crist...

Language

  • 109 English
  • 2 Other

Search query: Subject = "Value function approximation"
111 records; showing results 71-80
DYNAMIC PRODUCT POSITIONING IN DIFFERENTIATED PRODUCT MARKETS: THE EFFECT OF FEES FOR MUSICAL PERFORMANCE RIGHTS ON THE COMMERCIAL RADIO INDUSTRY
ECONOMETRICA, 2013, Vol. 81, No. 5, pp. 1763-1803
Authors: Sweeting, Andrew. Univ Maryland, Dept Econ, College Pk MD 20742 USA; Duke Univ, Durham NC 27706 USA; NBER, Cambridge MA 02138 USA
This article predicts how radio station formats would change if, as was recently proposed, music stations were made to pay fees for musical performance rights. It does so by estimating and solving, using parametric ap...
Adaptive Critic Design with Local Gaussian Process Models
JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2016, Vol. 20, No. 7, pp. 1135-1140
Authors: Wang, Wei; Chen, Xin; He, Jianxin. China Univ Geosci, Sch Automat, 388 Lumo Rd, Wuhan 430074, Peoples R China
In this paper, local Gaussian process (GP) approximation is introduced to build the critic network of adaptive dynamic programming (ADP). The sample data are partitioned into local regions, and for each region, an ind...
Stochastic home energy management system via approximate dynamic programming
IET ENERGY SYSTEMS INTEGRATION, 2020, Vol. 2, No. 4, pp. 382-392
Authors: Liu, Xuebo; Wu, Hongyu; Wang, Li; Faqiry, M. Nazif. Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan KS 66506 USA
This study proposes an approximate dynamic programming (ADP) method for a stochastic home energy management system (HEMS) that aims to minimise the electricity cost and discomfort of a household under uncertainties. T...
Balancing resources for dynamic vehicle routing with stochastic customer requests
OR SPECTRUM, 2024, Vol. 46, No. 2, pp. 331-373
Authors: Soeffker, Ninja; Ulmer, Marlin W.; Mattfeld, Dirk C. Univ Vienna, Dept Business Decis & Analyt, Vienna, Austria; Otto von Guericke Univ, Chair Management Sci, Magdeburg, Germany; Tech Univ Carolo Wilhelmina Braunschweig, Decis Support Grp, Braunschweig, Germany
We consider a service provider performing pre-planned service for initially known customers with a fleet of vehicles, e.g., parcel delivery. During execution, new dynamic service requests occur, e.g., for parcel picku...
Methods for approximating value functions for the Dominion card game
EVOLUTIONARY INTELLIGENCE, 2014, Vol. 6, No. 4, pp. 195-204
Authors: Winder, Ransom K. Mitre Corp, 7525 Colshire Dr, Mclean VA 22102 USA
Artificial neural networks have been successfully used to approximate value functions for tasks involving decision making. In domains where decisions require a shift in judgment as the overall state changes, it is hyp...
Reducing reinforcement learning to KWIK online regression
ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2010, Vol. 58, No. 3-4, pp. 217-237
Authors: Li, Lihong; Littman, Michael L. Yahoo Res, Santa Clara CA 95054 USA; Rutgers State Univ, Rutgers Lab Real Life Reinforcement Learning (RL3), Dept Comp Sci, Piscataway NJ 08854 USA
One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be...
Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence
21st IFAC World Congress on Automatic Control - Meeting Societal Challenges
Authors: Stankovic, Milos S.; Beko, Marko; Stankovic, Srdjan S. Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia; Vlatacom Inst, Belgrade, Serbia; Singidunum Univ, Belgrade, Serbia; Univ Lusafona Humanidades & Tecnol, COPELABS, Lisbon, Portugal; Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
In this paper we propose two novel distributed algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes. The algorithms differ in how distrib...
An Exemplar Test Problem on Parameter Convergence Analysis of Temporal Difference Algorithms
10th World Congress on Intelligent Control and Automation (WCICA)
Authors: Brown, Martin; Tutsoy, Onder. Univ Manchester, Control Syst Grp, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England
Reinforcement learning techniques have been developed to solve difficult learning control problems with only a small amount of a priori knowledge about the system dynamics. In this paper, a simple unstable exemplar test pr...
Sustainable l2-Regularized Actor-Critic based on Recursive Least-Squares Temporal Difference Learning
IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Authors: Li, Luntong; Li, Dazi; Song, Tianheng. Beijing Univ Chem Technol, Inst Automat, Beijing 100029, Peoples R China
Least-squares temporal difference learning (LSTD) has been used mainly for improving the data efficiency of the critic in actor-critic (AC) methods. However, convergence analysis of the resulting algorithms is difficult when p...
Online Support Vector Regression based Actor-Critic Method
36th Annual Conference of the IEEE Industrial-Electronics-Society (IECON) / 4th IEEE International Conference on E-Learning in Industrial Electronics / IES Industry Forum
Authors: Lee, Dong-Hyun; Kim, Jeong-Jung; Lee, Ju-Jang. Korea Adv Inst Sci & Technol, Robot Program, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea
This paper proposes a new algorithm for the actor-critic method using online support vector regression (SVR), which can do incremental learning and automatically track variation of an environment with time-varying characteris...