Search query: "Subject = Value function approximation"
111 records; showing 41-50
Approximate dynamic programming for pickup and delivery problem with crowd-shipping
TRANSPORTATION RESEARCH PART B-METHODOLOGICAL, 2024, vol. 187
Authors: Mousavi, Kianoush; Bodur, Merve; Cevik, Mucahit; Roorda, Matthew J.
Affiliations: Univ Toronto Dept Civil & Mineral Engn Toronto ON Canada; Univ Edinburgh Sch Math Edinburgh Scotland; Toronto Metropolitan Univ Dept Mech Ind & Mechatron Engn Toronto ON Canada
We study a variant of dynamic pickup and delivery crowd-shipping operation for delivering online orders within a few hours from a brick-and-mortar store. This crowd-shipping operation is subject to a high degree of un...
Reinforcement learning with automatic basis construction based on isometric feature mapping
INFORMATION SCIENCES, 2014, vol. 286, pp. 209-227
Authors: Huang, Zhenhua; Xu, Xin; Zuo, Lei
Affiliations: Natl Univ Def Technol Coll Mechatron & Automat Changsha 410073 Hunan Peoples R China
Value function approximation (VFA) has been a major research topic in reinforcement learning. Although various reinforcement learning algorithms with VFA have been proposed, the performance of most previous algorithms...
Actor-Critic Learning Control Based on l2-Regularized Temporal-Difference Prediction With Gradient Correction
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, vol. 29, no. 12, pp. 5899-5909
Authors: Li, Luntong; Li, Dazi; Song, Tianheng; Xu, Xin
Affiliations: Beijing Univ Chem Technol Dept Automat Beijing 100029 Peoples R China; Natl Univ Def Technol Coll Mechatron & Automat Inst Unmanned Syst Changsha 410073 Hunan Peoples R China
Actor-critic based on the policy gradient (PG-based AC) methods have been widely studied to solve learning control problems. In order to increase the data efficiency of learning prediction in the critic of PG-based AC...
Recursive Least-Squares Temporal Difference With Gradient Correction
IEEE TRANSACTIONS ON CYBERNETICS, 2021, vol. 51, no. 8, pp. 4251-4264
Authors: Song, Tianheng; Li, Dazi; Yang, Weimin; Hirasawa, Kotaro
Affiliations: Beijing Univ Chem Technol Coll Informat Sci & Technol Dept Automat Beijing 100029 Peoples R China; Beijing Univ Chem Technol Coll Mech & Elect Engn Dept Mech Engn Beijing 100029 Peoples R China
Since the late 1980s, temporal difference (TD) learning has dominated the research area of policy evaluation algorithms. However, the demand for the avoidance of TD defects, such as low data-efficiency and divergence ...
Operational planning and optimal sizing of microgrid considering multi-scale wind uncertainty
APPLIED ENERGY, 2017, vol. 195 (Jun. 1), pp. 616-633
Authors: Shin, Joohyun; Lee, Jay H.; Realff, Matthew J.
Affiliations: Korea Adv Inst Sci & Technol Chem & Biomol Engn Dept Daejeon South Korea; Georgia Inst Technol Chem & Biomol Engn Dept Atlanta GA 30332 USA
Distributed and on-site energy generation and distribution systems employing renewable energy sources and energy storage devices (referred to as microgrids) have been proposed as a new design approach to meet our ener...
Generalized attention-weighted reinforcement learning
NEURAL NETWORKS, 2022, vol. 145, pp. 10-21
Authors: Bramlage, Lennart; Cortese, Aurelio
Affiliations: Bielefeld Univ Fac Technol D-33615 Bielefeld Germany; ATR Inst Int Computat Neurosci Labs Seika 6190288 Japan
In neuroscience, attention has been shown to bidirectionally interact with reinforcement learning (RL) to reduce the dimensionality of task representations, restricting computations to relevant features. In machine le...
Dynamic pricing for managed lanes with multiple entrances and exits
TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2018, vol. 96 (Nov.), pp. 304-320
Authors: Pandey, Venktesh; Boyles, Stephen D.
Affiliations: Univ Texas Austin Dept Civil Architectural & Environm Engn Austin TX 78712 USA
Priced managed lanes are increasingly being used to better utilize the existing capacity of the roadway to relieve congestion and offer reliable travel time to road users. In this paper, we investigate the optimizatio...
An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction
IEEE ACCESS, 2018, vol. 6, pp. 7515-7525
Authors: Li, Dazi; Wang, Yuting; Song, Tianheng; Jin, Qibing
Affiliations: Beijing Univ Chem Technol Coll Informat Sci & Technol Beijing 100029 Peoples R China
Reinforcement learning (RL) is an important machine learning paradigm that can be used for learning from the data obtained by the human-computer interface and the interaction in human-centered smart systems. One of th...
Network Effects and Multinetwork Sellers' Dynamic Pricing in the US Smartphone Market
MANAGEMENT SCIENCE, 2023, vol. 69, no. 6, pp. 3297-3318
Authors: Liu, Yue; Luo, Rong
Affiliations: Cent Univ Finance & Econ Sch Int Trade & Econ Beijing 102206 Peoples R China; Renmin Univ China Sch Econ Beijing 100872 Peoples R China
Although the literature on network effects has focused on single-network firms, many industries feature multinetwork firms that play more complex dynamic pricing games. In this paper, we estimate the network effect at...
Workforce Scheduling in the Era of Crowdsourced Delivery
TRANSPORTATION SCIENCE, 2020, vol. 54, no. 4, pp. 1113-1133
Authors: Ulmer, Marlin; Savelsbergh, Martin
Affiliations: Tech Univ Carolo Wilhelmina Braunschweig Carl Friedrich Gauss Fak D-38106 Braunschweig Germany; Georgia Inst Technol H Milton Stewart Sch Ind & Syst Engn Atlanta GA 30332 USA
Using crowdsourced delivery capacity, that is, individuals offering their vehicle and their time to perform deliveries, can allow companies to provide faster delivery options and more easily accommodate fluctuations i...