Search Results - Inner Mongolia University Library

Dynamic Learning and Decision Making via Basis Weight Vectors

Operations Research, 2022, Vol. 70, No. 3, pp. 1835-1853

Author: Zhang, Hao (Univ British Columbia, Sauder Sch Business, Vancouver, BC V6T 1Z2, Canada)

This paper presents a new methodology to solve a general model of dynamic decision making with a continuous unknown parameter or state. The methodology centers on the "continuation-value functions" (mappings from the parameter space to the continuation-value space), created by feasible continuation policies. When the model primitives can be described through a family of basis functions (e.g., polynomials), a continuation-value function retains that property and can be represented by a basis weight vector. The set of efficient basis weight vectors can be constructed through backward induction, which leads to a significant reduction of problem complexity and enables an exact solution for small-sized problems. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multidimensional (multiparameter) setting, which features the problem of contextual multiarmed bandits with linear expected rewards. The approximation algorithm developed in this paper outperforms three benchmark algorithms (epsilon-greedy, Thompson sampling, and LinUCB) in learning situations with many actions and short horizons.
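As a rough illustration of the basis-representation idea in the abstract (not the paper's algorithm), the sketch below assumes a monomial basis 1, θ, θ² on a scalar parameter θ. Each continuation policy is then a weight vector, and its expected value under a belief is the dot product of that vector with the belief's raw moments. The function names, the two weight vectors, and the beliefs are all hypothetical.

```python
def evaluate(weights, theta):
    """Value of the function sum_k weights[k] * theta**k at a known theta."""
    return sum(w * theta**k for k, w in enumerate(weights))


def expected_value(weights, moments):
    """Expected value under a belief summarized by raw moments E[theta**k].

    By linearity of expectation this is just a dot product, which is what
    makes the weight-vector representation convenient.
    """
    return sum(w * m for w, m in zip(weights, moments))


def best_policy(weight_vectors, moments):
    """Index and value of the weight vector with the highest expected value."""
    values = [expected_value(w, moments) for w in weight_vectors]
    idx = max(range(len(values)), key=values.__getitem__)
    return idx, values[idx]


# Two hypothetical continuation policies in the basis (1, theta, theta^2).
W = [
    [1.0, 0.0, 0.0],   # constant payoff 1
    [0.0, 3.0, -2.0],  # payoff 3*theta - 2*theta^2
]

# Belief 1: theta uniform on [0, 1], so E[theta^k] = 1 / (k + 1).
uniform_moments = [1.0, 1 / 2, 1 / 3]

# Belief 2: theta known to equal 0.75, so E[theta^k] = 0.75**k.
point_moments = [0.75**k for k in range(3)]

print(best_policy(W, uniform_moments)[0])  # -> 0
print(best_policy(W, point_moments)[0])    # -> 1
```

Which weight vector is best depends on the belief, so a set of non-dominated vectors has to be carried along rather than a single one. This toy example only compares two fixed vectors and does not attempt the backward-induction construction described in the abstract.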

Keywords: learning and doing; dynamic pricing with learning; linear contextual bandits; approximate dynamic programming; basis representation of functions
