Search query: subject term = "annealing-Pareto multiobjective multiarmed bandit algorithm"
1 record found
Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Yahyaa, Saba Q.; Drugan, Madalina M.; Manderick, Bernard (Vrije Univ Brussel, Dept Comp Sci, Pl Laan 2, B-1050 Brussels, Belgium)
In the stochastic multi-objective multi-armed bandit (MOMAB), arms generate a vector of stochastic rewards, one per objective, instead of a single scalar reward. As a result, there is not only one optimal arm, but ...