The classical Upper Confidence Bound (UCB) algorithm constructs optimistic overestimates of the true means of the reward distributions, based on the sample mean and the number of times each arm has been chosen, in order to decide the best arm. In this setting, the variances of the reward distributions, essential components of any distribution, are ignored. However, in real-world applications, arms with relatively high means and small variances are sometimes preferable to arms with the highest mean and a large variance. Such concerns are treated as risk aversion in the Multi-Armed Bandits problem. Moreover, since the combination of the variance and the mean helps estimate the range of a distribution, proper use of the sample variance can yield a tighter upper confidence bound for the UCB algorithm, leading to a smaller regret. In this paper, the author investigates and summarizes current algorithms for risk aversion based on estimating and using variances to construct the upper confidence bound in the UCB algorithm.
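As a sketch of the idea surveyed here (not any one paper's exact algorithm), the snippet below contrasts the classical UCB1 index with a variance-aware index in the spirit of UCB-V; the function names and the reward-range parameter `b` are illustrative assumptions.

```python
import math

def ucb1_index(mean, n_pulls, t):
    """Classical UCB1 index: sample mean plus an exploration bonus
    that depends only on pull counts, not on variance."""
    return mean + math.sqrt(2 * math.log(t) / n_pulls)

def ucb_v_index(mean, variance, n_pulls, t, b=1.0):
    """Variance-aware index in the spirit of UCB-V: the main bonus
    term shrinks when the empirical variance is small, so low-variance
    arms receive a tighter upper confidence bound.
    `b` is an assumed upper bound on the reward range."""
    return (mean
            + math.sqrt(2 * variance * math.log(t) / n_pulls)
            + 3 * b * math.log(t) / n_pulls)
```

With the same mean and pull count, the variance-aware index is smaller for a low-variance arm than for a high-variance one, which is exactly the tightening effect the abstract describes.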
We consider the problem of cumulative reward maximization in multi-armed bandits. We address the security concerns that occur when data and computations are outsourced to an honest-but-curious cloud, i.e., one that executes tasks dutifully but tries to gain as much information as possible. We consider situations where the data used in bandit algorithms is sensitive and has to be protected, e.g., commercial or personal data. We rely on cryptographic schemes and propose UCB-MS, a secure multi-party protocol based on the UCB algorithm. We prove that UCB-MS computes the same cumulative reward as UCB while satisfying desirable security properties. In particular, cloud nodes cannot learn the cumulative reward or the sum of rewards for more than one arm. Moreover, by analyzing messages exchanged among cloud nodes, an external observer cannot learn the cumulative reward or the sum of rewards produced by any arm. We show that the overhead due to cryptographic primitives is linear in the size of the input. Our implementation confirms the linear-time behavior and the practical feasibility of our protocol, on both synthetic and real-world data.
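The UCB-MS protocol itself is not reproduced here; as a hedged illustration of one cryptographic building block commonly used in such secure-sum settings, additive secret sharing lets each party hold only a random-looking share of a per-arm reward sum, so no single node learns the value. The modulus and function names are illustrative assumptions.

```python
import random

PRIME = 2**61 - 1  # illustrative modulus for additive sharing

def share(secret, n_nodes):
    """Split `secret` into n_nodes additive shares modulo PRIME;
    any proper subset of shares reveals nothing about the secret."""
    shares = [random.randrange(PRIME) for _ in range(n_nodes - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Only a party holding all shares recovers the true sum."""
    return sum(shares) % PRIME
```

Shares can be updated locally as rewards arrive (addition commutes with sharing), which is why secure-sum primitives fit cumulative-reward computations.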
ISBN: (Print) 9781728107356
In this paper, we discuss the signal integrity of the printed circuit board (PCB) and put forward a new method of designing PCB parameters based on Monte-Carlo tree search (MCTS). First, the effects and measurements of signal integrity are analysed and presented. Next, the paper introduces how the MCTS algorithm conducts parameter assignment and presents an improved upper confidence bound (UCB) algorithm adapted to PCB parameter design. We simulate and collect eye-diagram metrics in software and accelerate the process of designing a signal-integrity-compliant PCB by employing MCTS. Compared to randomly searching for an optimal parameter combination, the method produces a globally optimal result rather than a suboptimum one. The total time cost is within ten minutes, while the random search takes hours to get a result.
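Combining MCTS with a UCB selection rule, as described above, is commonly known as UCT. A minimal sketch of that selection step (the statistics layout and parameter names are assumptions, not the paper's code) might look like:

```python
import math

def uct_select(children, total_visits, c=math.sqrt(2)):
    """Pick the child maximizing the UCB score used in MCTS (UCT).
    `children` maps a candidate parameter assignment to a tuple
    (visits, total_value) accumulated from earlier simulations."""
    def score(stats):
        visits, value = stats
        if visits == 0:
            return float("inf")  # always try unvisited assignments first
        # average simulated quality + exploration bonus
        return value / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda k: score(children[k]))
```

In a PCB-design search, each child could stand for one parameter choice (e.g. a hypothetical trace width), with `total_value` aggregating simulated eye-metric scores.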
The principle of optimism in the face of uncertainty is known as a heuristic in sequential decision-making problems. The Overtaking method, based on this principle, is an effective algorithm for solving multi-armed bandit problems. In a previous study it was defined by a set of heuristic patterns of formulation. The objective of the present paper is to redefine the value functions of the Overtaking method and to unify their formulation. The unified Overtaking method is associated with statistical upper bounds of confidence intervals of expected rewards. The unification of the formulation enhances the universality of the Overtaking method. Consequently, we newly obtain an Overtaking method for exponentially distributed rewards, analyze it numerically, and show that it outperforms the UCB algorithm on average. The present study suggests that, in the context of multi-armed bandit problems, the principle of optimism in the face of uncertainty should be regarded not as a heuristic but as a statistics-based consequence of the law of large numbers for the sample mean of rewards and of the estimation of upper bounds of expected rewards. (C) 2017 Elsevier B.V. All rights reserved.
A multi-armed bandit problem is a search problem in which a learning agent must select the optimal arm among multiple slot machines generating random rewards. The UCB algorithm is one of the most popular methods for solving multi-armed bandit problems; it achieves logarithmic regret by balancing exploration and exploitation. Since the introduction of UCB algorithms, researchers have known empirically that optimistic value functions perform well in multi-armed bandit problems. The terms optimistic or optimism suggest that the value function is sufficiently larger than the sample mean of rewards. The original definition of the UCB algorithm focuses on the optimization of regret and is not directly based on the optimism of a value function, so we need to consider why optimism yields good performance in multi-armed bandit problems. In the present article, we propose a new method, called the Overtaking method, to solve multi-armed bandit problems. Its value function is defined as the upper bound of a confidence interval for an estimator of the expected reward: the value function asymptotically approaches the expected reward from above. If the value function exceeds the expected value under this asymptote, then the learning agent is almost sure to obtain the optimal arm. This structure is called the sand-sifter mechanism; it exhibits no regrowth of the value functions of suboptimal arms, meaning the learning agent plays only the current best arm at each time step. Consequently, the proposed method achieves a high accuracy rate and low regret, and some of its value functions outperform UCB algorithms. This study demonstrates the advantage of agent optimism in uncertain environments using one of the simplest frameworks. (C) 2015 The Authors. Published by Elsevier Ireland Ltd.
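For reference, the baseline that the Overtaking method is compared against can be sketched as standard UCB1: play each arm once, then always pull the arm maximizing sample mean plus exploration bonus. The function signature and the `pull` callback are assumptions for illustration.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` steps; `pull(arm)` returns a reward.
    Returns how often each arm was played."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts
```

On a toy two-arm instance with deterministic rewards 0.1 and 0.9, the better arm accumulates the vast majority of pulls, while the suboptimal arm's exploration bonus still triggers occasional revisits — the "regrowth" of suboptimal value functions that the sand-sifter mechanism avoids.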
ISBN: (Print) 9781479945528
Many real-world stochastic environments are inherently multi-objective environments with conflicting objectives. The multi-objective multi-armed bandits (MOMAB) are extensions of the classical, i.e. single-objective, multi-armed bandits to reward vectors, and multi-objective optimisation techniques are often required to design mechanisms with an efficient exploration/exploitation trade-off. In this paper, we propose the improved Pareto Upper Confidence Bound (iPUCB) algorithm, which straightforwardly extends the single-objective improved UCB algorithm to reward vectors by deleting the suboptimal arms. The goal of iPUCB is to identify the set of best arms, i.e. the Pareto front, within a fixed budget of arm pulls. We experimentally compare the performance of the proposed algorithm with the Pareto UCB1 algorithm and the Hoeffding race on a bi-objective example coming from an industrial control application, the engagement of wet clutches. We also propose a new regret metric based on the Kullback-Leibler divergence to measure the performance of a multi-objective multi-armed bandit algorithm. We show that iPUCB outperforms the other two tested algorithms in the given multi-objective environment.
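A minimal sketch of the Pareto-front notion used above (Pareto dominance over estimated mean reward vectors; this illustrates the concept, not the iPUCB algorithm itself):

```python
def dominates(u, v):
    """True iff reward vector u Pareto-dominates v: at least as good
    in every objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_front(mean_vectors):
    """Indices of arms whose estimated mean vectors are non-dominated;
    these form the Pareto front of best arms."""
    return [i for i, u in enumerate(mean_vectors)
            if not any(dominates(v, u)
                       for j, v in enumerate(mean_vectors) if j != i)]
```

An arm-deletion scheme like the one described above would repeatedly drop arms whose upper-confidence vectors become dominated, keeping only Pareto-front candidates.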