Search query: subject term = "Value function approximation"
111 records; showing 31-40

A Reinforcement Learning Approach to Autonomous Decision Making of Intelligent Vehicles on Highways
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, Vol. 50, No. 10, pp. 3884-3897
Authors: Xu, Xin; Zuo, Lei; Li, Xin; Qian, Lilin; Ren, Junkai; Sun, Zhenping
Affiliations: Natl Univ Def Technol Inst Unmanned Syst Coll Intelligence Sci & Technol Changsha 410073 Peoples R China
Autonomous decision making is a critical and difficult task for intelligent vehicles in dynamic transportation environments. In this paper, a reinforcement learning approach with value function approximation and featu...

Transfer learning via inter-task mappings for temporal difference learning
JOURNAL OF MACHINE LEARNING RESEARCH, 2007, Vol. 8, No. 9, pp. 2125-2167
Authors: Taylor, Matthew E.; Stone, Peter; Liu, Yaxin
Affiliations: Univ Texas Austin Dept Comp Sci Austin TX 78712 USA
Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situation...

Two-Stream Fused Fuzzy Deep Neural Network for Multiagent Learning
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, Vol. 31, No. 2, pp. 511-520
Authors: Fang, Baofu; Zheng, Caiming; Wang, Hao; Yu, Tingting
Affiliations: Hefei Univ Technol Sch Comp Sci & Informat Engn Hefei 230601 Peoples R China
In multiagent reinforcement learning (RL), a multilayer fully connected neural network is used for value function approximation, which solves large-scale or continuous space problems. However, it is easy to fall into a ...

Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
OPTIMAL CONTROL APPLICATIONS & METHODS, 2016, Vol. 37, No. 1, pp. 108-126
Authors: Tutsoy, Onder; Brown, Martin
Affiliations: Univ Manchester Control Syst Ctr Sch Elect & Elect Engn Manchester M13 9PL Lancs England; Adana Sci & Technol Univ Elect & Elect Engn Dept Seyhan Adana Turkey
Reinforcement learning is a powerful tool used to obtain optimal control solutions for complex and difficult sequential decision making problems where only a minimal amount of a priori knowledge exists about the syste...

Gaussian Process Approximate Dynamic Programming for Energy-Optimal Supervisory Control of Parallel Hybrid Electric Vehicles
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2022, Vol. 71, No. 8, pp. 8367-8380
Authors: Bae, Jin Woo; Kim, Kwang-Ki K.
Affiliations: Texas A&M Univ Dept Comp Sci & Engn College Stn TX 77843 USA; Inha Univ Dept Elect & Comp Engn Incheon 22212 South Korea
We propose an energy-efficient supervisory control method for the power management of parallel hybrid electric vehicles (HEVs) to improve the fuel economy and reduce exhaust gas emissions. Plug-in HEVs ((P)HEVs) have ...

Time-of-Day Pricing in Taxi Markets
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2017, Vol. 18, No. 6, pp. 1610-1622
Authors: Qian, Xinwu; Ukkusuri, Satish V.
Affiliations: Purdue Univ Dept Civil Engn W Lafayette IN 47906 USA
For a regular weekday in New York City, the number of taxi trips at 8 P.M. may be 10 times greater than that at 5 A.M., while passengers are charged under the same pricing scheme. Motivated by temporally non-station...

A Q-learning algorithm for task scheduling based on improved SVM in wireless sensor networks
COMPUTER NETWORKS, 2019, Vol. 161, pp. 138-149
Authors: Wei, Zhenchun; Liu, Fei; Zhang, Yan; Xu, Juan; Ji, Jianjun; Lyu, Zengwei
Affiliations: Hefei Univ Technol Sch Comp Sci & Informat Engn Hefei Anhui Peoples R China; Minist Educ Engn Res Ctr Safety Crit Ind Measurement & Contro Hefei Anhui Peoples R China; Key Lab Ind Safety & Emergency Technol Hefei Anhui Peoples R China
Application performance and energy consumption are strongly affected by the task scheduling of nodes in wireless sensor networks (WSNs). Unreasonable task scheduling of nodes leads to excessive network energy consumption. Thus, a ...

Optimal control of nonlinear discrete time-varying systems using a new neural network approximation structure
NEUROCOMPUTING, 2015, Vol. 156, pp. 157-165
Authors: Kiumarsi, Bahare; Lewis, Frank L.; Levine, Daniel S.
Affiliations: Univ Texas Arlington UTA Res Inst Ft Worth TX 76118 USA; Northeastern Univ State Key Lab Synthet Automat Proc Ind Shenyang Peoples R China; Univ Texas Arlington Arlington TX 76019 USA
In this paper, motivated by recently discovered neurocognitive models of mechanisms in the brain, a new reinforcement learning (RL) method is presented based on a novel critic neural network (NN) structure to solve the...

A DEEP Q-LEARNING NETWORK FOR SHIP STOWAGE PLANNING PROBLEM
POLISH MARITIME RESEARCH, 2017, Vol. 24, No. s3, pp. 102-109
Authors: Shen, Yifan; Zhao, Ning; Xia, Mengjue; Du, Xueqiang
Affiliations: Shanghai Maritime Univ Sci Res Acad Shanghai Peoples R China; Shanghai Maritime Univ Logist Engn Coll Shanghai 201306 Peoples R China
The ship stowage plan is the management link between quay crane scheduling and yard crane scheduling. The quality of the ship stowage plan greatly affects productivity. Previous studies mainly focus on solving stowage ...

Dynamic Lookahead Policies for Stochastic-Dynamic Inventory Routing in Bike Sharing Systems
COMPUTERS & OPERATIONS RESEARCH, 2019, Vol. 106, pp. 260-279
Authors: Brinkmann, Jan; Ulmer, Marlin W.; Mattfeld, Dirk C.
Affiliations: Tech Univ Carolo Wilhelmina Braunschweig D-38106 Braunschweig Germany
We present the stochastic-dynamic inventory routing problem for bike sharing systems (SDIRP). The objective of the SDIRP is to avoid unsatisfied demand by dynamically relocating bikes during the day. To anticipate pot...