
Precise Asymptotics and Refined Regret of Variance-Aware UCB

Authors: Fan, Yingying; Han, Yuxuan; Lv, Jinchi; Xu, Xiaocong; Zhou, Zhengyuan

Author affiliations: Data Sciences and Operations Department, University of Southern California, Los Angeles, CA 90089, United States; Stern School of Business, New York University, New York, NY 10003, United States

Publication: arXiv

Year: 2024

Subject: Consensus algorithm

Abstract: In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decision-making process. More precisely, we provide an asymptotic characterization of the arm-pulling rates of UCB-V, extending recent results for the canonical UCB in Kalvit and Zeevi (2021) and Khamaru and Zhang (2024). In interesting contrast to the canonical UCB, our analysis reveals that the behavior of UCB-V can exhibit instability, meaning that the arm-pulling rates may not always be asymptotically deterministic. Beyond the asymptotic characterization, we also provide non-asymptotic, high-probability bounds for the arm-pulling rates, offering insights into the regret analysis. As an application of this high-probability result, we establish that UCB-V can achieve a more refined regret bound, previously unknown even for more complicated and advanced variance-aware online decision-making algorithms. Copyright © 2024, The Authors. All rights reserved.
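
The abstract describes the variance-aware index rule only informally. For illustration, below is a minimal Python sketch of a UCB-V-style policy on Bernoulli arms, assuming the standard index mean + sqrt(2 * var * log(t) / n) + 3 * b * log(t) / n for rewards bounded in [0, b]; the helper names (ucb_v_index, run_ucb_v) and the simulation setup are hypothetical and not taken from the paper, whose contribution is the asymptotic and high-probability analysis of the resulting arm-pulling counts.

import math
import random


def ucb_v_index(mean, var, n, t, b=1.0):
    # Variance-aware upper confidence index for a single arm:
    # empirical mean plus a variance-dependent bonus plus a range-dependent term.
    return mean + math.sqrt(2.0 * var * math.log(t) / n) + 3.0 * b * math.log(t) / n


def run_ucb_v(arm_means, horizon, seed=0):
    # Simulate a UCB-V-style policy on Bernoulli arms and return the pull counts,
    # i.e., the arm-pulling numbers whose asymptotics the paper studies.
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # running sum of rewards per arm
    sq_sums = [0.0] * k   # running sum of squared rewards per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize the estimates
        else:
            def index(a):
                mean = sums[a] / counts[a]
                var = max(sq_sums[a] / counts[a] - mean * mean, 0.0)
                return ucb_v_index(mean, var, counts[a], t)
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return counts


if __name__ == "__main__":
    # Two-armed example with a small gap; prints how often each arm was pulled.
    print(run_ucb_v([0.50, 0.55], horizon=10000))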
