Search Results - Inner Mongolia University Library

Multi-Agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, Vol. 17, No. 1, pp. 1-18

Author: Ren, Jineng. Affiliations: Wenzhou Univ, Chashan Univ Town, Sch Comp Sci & Artificial Intelligence, Wenzhou 325035, Zhejiang, Peoples R China; Wenzhou Univ, Chashan Univ Town, Artificial Intelligence & Adv Mfg Inst, Yongjia, Wenzhou 325035, Zhejiang, Peoples R China

This paper proposes a gradient-based multi-agent actor-critic algorithm for off-policy reinforcement learning using importance sampling. Our algorithm is incremental with full gradients, and its complexity per iteration scales linearly with the size of approximation features. Previous multi-agent actor-critic algorithms are limited to the on-policy setting or off-policy emphatic temporal difference (TD) learning, and they do not take advantage of the advances in off-policy gradient temporal difference (GTD) learning. As a theoretical contribution, we establish that the critic step of the proposed algorithm converges to the TD solution of the projected Bellman equation and the actor step converges to the set of asymptotically stable fixed points. Numerical experiments on the multi-agent generalization of Boyan's chain problem show that the proposed approach provides improved performance in terms of stability and convergence rate compared with the state-of-the-art baseline algorithm.

Keywords: multi-agent actor-critic algorithm; Off policy; Gradient temporal difference; Distributed reinforcement learning


Dynamic Spectrum Sharing Based on Federated Learning and Multi-Agent Actor-Critic Reinforcement Learning


19th IEEE International Wireless Communications and Mobile Computing (IEEE IWCMC)

Authors: Yang, Tongtong; Zhang, Wensheng; Bo, Yulian; Sun, Jian; Wang, Cheng-Xiang. Affiliations: Shandong Univ, Shandong Prov Key Lab Wireless Commun, Sch Informat Sci & Engn, Qingdao 266237, Peoples R China; Southeast Univ, Sch Informat Sci & Engn, Natl Mobile Commun Res Lab, Nanjing 210096, Peoples R China; Purple Mt Labs, Nanjing 211111, Peoples R China

ISBN: (print) 9798350333398

To improve spectrum efficiency in emergency communications, a dynamic spectrum sharing (DSS) scheme based on federated learning (FL) and deep reinforcement learning (DRL) is proposed. The operation model follows the paradigm of cognitive radio networks (CRNs), in which multiple secondary users (SUs) with different bandwidth requirements, spectrum sensing and access capabilities randomly access idle frequency bands that primary users (PUs) do not occupy. Different users in emergency communications are considered as SUs or PUs according to their communication priorities. A maximum entropy based multi-agent actor-critic (ME-MAAC) algorithm is used to realize an optimal spectrum sharing strategy by updating varying rewards to SUs. During the learning process, the FL algorithm is used to assign appropriate weights to SUs. Simulation results show that the performance of the proposed scheme is better in terms of reward value, access rate, and convergence speed.

Keywords: Dynamic spectrum sharing; federated learning; deep reinforcement learning; multi-agent actor-critic algorithm; CRNs


Dynamics of market making algorithms in dealer markets: Learning and tacit collusion


MATHEMATICAL FINANCE, 2024, Vol. 34, No. 2, pp. 467-521

Authors: Cont, Rama; Xiong, Wei. Affiliation: Univ Oxford, Math Inst, Oxford, England

The widespread use of market-making algorithms in electronic over-the-counter markets may give rise to unexpected effects resulting from the autonomous learning dynamics of these algorithms. In particular, the possibility of "tacit collusion" among market makers has increasingly received regulatory scrutiny. We model the interaction of market makers in a dealer market as a stochastic differential game of intensity control with partial information and study the resulting dynamics of bid-ask spreads. Competition among dealers is modeled as a Nash equilibrium, while collusion is described in terms of Pareto optima. Using a decentralized multi-agent deep reinforcement learning algorithm to model how competing market makers learn to adjust their quotes, we show that the interaction of market-making algorithms via market prices, without any sharing of information, may give rise to tacit collusion, with spread levels strictly above the competitive equilibrium level.

Keywords: differential games decentralized learning intensity control learning Market microstructure market making multi-agent actor-critic algorithm Nash equilibrium reinforcement tacit collusion

