ISBN (print): 9781450375184
We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in multi-agent settings with continuous action and state spaces. To do so, we analyze gradient-play in N-player general-sum linear quadratic games, a classic game setting that has recently emerged as a benchmark in the field of multi-agent learning. In such games the state and action spaces are continuous, and global Nash equilibria can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy-gradient. We first show that, surprisingly, these games are not convex games. Despite this, we are still able to show that the only critical points of the gradient dynamics are global Nash equilibria. We then give sufficient conditions under which policy-gradient will avoid the Nash equilibria, and generate a large number of general-sum linear quadratic games that satisfy these conditions. The existence of such games indicates that one of the most popular approaches to solving problems in the classic reinforcement learning setting has no local guarantee of convergence in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such situations are not mere edge cases and are in fact quite common.
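A minimal numerical sketch (not the authors' code) of the gradient-play dynamics this abstract studies: two players with linear state-feedback policies u_i = -K_i x simultaneously follow the gradient of their own quadratic cost. The dynamics matrices, the finite horizon, and the finite-difference gradients are illustrative assumptions; the paper analyzes the infinite-horizon setting with exact gradients.

```python
# Sketch of simultaneous gradient-play in a two-player finite-horizon LQ game.
# All problem data below are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, T, lr = 2, 50, 1e-3                      # state dimension, horizon, step size
A = np.array([[0.9, 0.1], [0.0, 0.9]])      # shared dynamics x' = A x + B1 u1 + B2 u2
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q = [np.eye(n), np.eye(n)]                  # per-player state cost weights
R = [np.eye(1), np.eye(1)]                  # per-player control cost weights

def cost(i, K1, K2, x0):
    """Finite-horizon quadratic cost of player i under the joint policy (K1, K2)."""
    x, c = x0.copy(), 0.0
    for _ in range(T):
        u1, u2 = -K1 @ x, -K2 @ x
        u_i = u1 if i == 0 else u2
        c += x @ Q[i] @ x + u_i @ R[i] @ u_i
        x = A @ x + B1 @ u1 + B2 @ u2
    return c

def grad(i, K1, K2, x0, eps=1e-5):
    """Finite-difference gradient of player i's cost with respect to its own gain."""
    K = [K1.copy(), K2.copy()]
    g = np.zeros_like(K[i])
    for idx in np.ndindex(*K[i].shape):
        Kp = [k.copy() for k in K]; Kp[i][idx] += eps
        Km = [k.copy() for k in K]; Km[i][idx] -= eps
        g[idx] = (cost(i, Kp[0], Kp[1], x0) - cost(i, Km[0], Km[1], x0)) / (2 * eps)
    return g

K1 = 0.1 * rng.standard_normal((1, n))
K2 = 0.1 * rng.standard_normal((1, n))
x0 = rng.standard_normal(n)
for _ in range(2000):
    # Each player updates its own gain with its own gradient, simultaneously.
    g1, g2 = grad(0, K1, K2, x0), grad(1, K1, K2, x0)
    K1, K2 = K1 - lr * g1, K2 - lr * g2
```

The abstract's point is precisely that this kind of coupled update, although its only critical points are global Nash equilibria, can be repelled from them for suitably chosen game parameters.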
This research provides a method that accelerates learning and avoids local minima, improving the policy-gradient algorithm's learning process. Reinforcement learning has the advantage of not requiring a model; consequently, it can improve control performance precisely when a model is unavailable, such as when an error occurs. The proposed method explores the action space efficiently and quickly. First, it quantifies the similarity between the agent's actions and those of a traditional controller. Then, the principal reward function is modified to reflect this similarity. This reward-shaping mechanism guides the agent to maximize its return by exerting an attractive force during gradient ascent. To validate the concept, we build a satellite attitude control environment with a similarity subsystem. The results demonstrate the effectiveness and robustness of our method.
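A hypothetical sketch of the similarity-based reward shaping described above: the environment reward is augmented with a bonus that grows as the agent's action approaches a reference action from a conventional controller. The PD controller, the Gaussian similarity kernel, and all gains are assumptions for illustration, not the paper's exact formulation.

```python
# Reward shaping via similarity to a classical controller (illustrative sketch).
import numpy as np

def pd_controller(error, error_rate, kp=2.0, kd=0.5):
    """Reference torque command from an assumed PD attitude controller."""
    return -kp * error - kd * error_rate

def similarity(agent_action, reference_action, sigma=0.5):
    """Gaussian kernel in (0, 1]: 1 when actions coincide, decaying as they diverge."""
    dist2 = np.linalg.norm(agent_action - reference_action) ** 2
    return float(np.exp(-dist2 / (2 * sigma ** 2)))

def shaped_reward(env_reward, agent_action, error, error_rate, weight=0.3):
    """Principal reward plus a similarity bonus that acts as an attractive force
    toward controller-like behaviour during gradient ascent."""
    reference_action = pd_controller(error, error_rate)
    return env_reward + weight * similarity(agent_action, reference_action)

# Example: one shaped-reward evaluation for a 3-axis attitude error.
r = shaped_reward(env_reward=-0.8,
                  agent_action=np.array([0.10, -0.05, 0.00]),
                  error=np.array([0.20, -0.10, 0.05]),
                  error_rate=np.array([0.00, 0.01, -0.02]))
```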
Recommender systems have become increasingly popular due to the significant rise in digital information over the internet in recent years. They help provide personalized recommendations to the user by selecting a few items out of a large set of items. However, with the growing size of the item space and user base, scalability remains a key issue for recommender systems. Moreover, most existing policy-gradient approaches to recommendation suffer from high variance, leading to instability during the learning process. Policy-gradient algorithms such as PPO have proven effective in large action spaces (a large number of items) because they learn the optimal policy directly from samples. We use the PPO algorithm to train our reinforcement learning agent, modeling the collaborative filtering process as a Markov decision process. PPO utilizes the actor-critic framework and thus mitigates the high variance of policy-gradient algorithms. Further, we address the cold-start issue in collaborative filtering with autoencoder-based content filtering. Proximal Policy Optimization (PPO) methods are today considered among the most effective reinforcement learning methods, achieving state-of-the-art performance and even outperforming deep Q-learning methods. In this paper, we propose a switching hybrid recommender system that combines these two recommendation techniques. A switching hybrid system can switch between recommendation techniques depending on some criterion, tackling the shortfall of one constituent recommender with its counterpart in a particular situation. We show that our method outperforms various baseline methods on the popular MovieLens datasets for different evaluation metrics. On MovieLens 1M, our method outperforms the baseline by 9.19% in terms of R@10 and by 3.86% and 6.58% in terms of P@10 and P@20, respectively. For the MovieLens 100K dataset, our method improves on the baseline methods by 4.10% in terms of P@10 and 3.90% and 2.40% in t
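A hypothetical sketch of the switching logic described above: route a request to an autoencoder-based content recommender when the user is in a cold-start regime (too few interactions), and otherwise to the PPO-trained collaborative-filtering policy. Both recommenders are stand-in stubs, and the threshold and interfaces are assumptions for illustration.

```python
# Switching hybrid recommender (illustrative sketch; recommenders are stubs).
from typing import Callable, List

COLD_START_THRESHOLD = 5  # assumed minimum interaction count before trusting PPO

def switching_recommend(user_id: int,
                        interaction_count: int,
                        ppo_recommender: Callable[[int, int], List[int]],
                        content_recommender: Callable[[int, int], List[int]],
                        k: int = 10) -> List[int]:
    """Return top-k item ids from whichever recommender fits the situation."""
    if interaction_count < COLD_START_THRESHOLD:
        # Cold start: fall back on autoencoder-based content filtering.
        return content_recommender(user_id, k)
    # Warm user: the PPO policy selects items from the large action space.
    return ppo_recommender(user_id, k)

# Example with trivial stand-ins for the two recommenders.
ppo_stub = lambda uid, k: list(range(k))
content_stub = lambda uid, k: list(range(100, 100 + k))
print(switching_recommend(user_id=42, interaction_count=2,
                          ppo_recommender=ppo_stub,
                          content_recommender=content_stub))
```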
Real-world reinforcement learning (RL) problems often demand that agents behave safely by obeying a set of designed constraints. We address the challenge of safe RL by coupling a safety guide based on model predictive control (MPC) with a modified policy-gradient framework in a linear setting with continuous actions. The guide enforces safe operation of the system by embedding safety requirements as chance constraints in the MPC formulation. The policy-gradient training step then includes a safety penalty which trains the base policy to behave safely. We show theoretically that this penalty allows for a provably safe optimal base policy and illustrate our method with a simulated linearized quadrotor experiment.
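A hypothetical sketch of the penalized objective pattern this abstract describes: the base policy's proposed action is compared with the action returned by an MPC safety guide, and the squared deviation enters the policy-gradient objective as a safety penalty. The guide here is a trivial stub (a box projection); the chance-constrained MPC is the paper's machinery and is not reproduced.

```python
# Safety-penalized policy-gradient objective (illustrative sketch, not the paper's code).
import numpy as np

def mpc_safety_guide(state, proposed_action):
    """Stub guide: project the proposed action into an assumed safe box [-1, 1]^m."""
    return np.clip(proposed_action, -1.0, 1.0)

def penalized_pg_objective(log_prob, advantage, proposed_action, state,
                           safety_weight=10.0):
    """Score-function term plus a penalty on deviation from the guide's safe action."""
    safe_action = mpc_safety_guide(state, proposed_action)
    safety_penalty = np.sum((proposed_action - safe_action) ** 2)
    # Maximize the advantage-weighted log-likelihood, minimize unsafe deviation.
    return log_prob * advantage - safety_weight * safety_penalty
```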
ISBN (print): 9781450375849
We introduce a novel framework to account for sensitivity to reward uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the chaotic variation - which can rigorously be interpreted as the risk measure of the martingale component associated with the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy-gradient and value-function based, and illustrate its relevance on grid world and portfolio optimization problems.
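A brief restatement, in our own notation, of the decomposition the abstract invokes: the cumulative reward process is split by the Doob decomposition into a predictable part and a martingale part, and the "chaotic variation" applies a risk measure to the latter. The symbols (rewards r_s, filtration F_s, risk measure rho) and the exact form are assumptions paraphrased from the abstract, not quoted from the paper.

```latex
% Doob decomposition of the cumulative reward X_t = \sum_{s \le t} r_s (sketch):
\[
  X_t
  = \underbrace{\sum_{s=1}^{t} \mathbb{E}\!\left[r_s \mid \mathcal{F}_{s-1}\right]}_{A_t\ \text{(predictable)}}
  + \underbrace{\sum_{s=1}^{t} \bigl(r_s - \mathbb{E}\!\left[r_s \mid \mathcal{F}_{s-1}\right]\bigr)}_{M_t\ \text{(martingale)}},
  \qquad
  \text{chaotic variation} \;=\; \rho\!\left(M_T\right).
\]
```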
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy-gradient and actor-critic algorithms that (1) estimate this gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.
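A hypothetical sketch of the primal-dual update pattern the abstract describes: for a CVaR-constrained MDP, the policy parameters move in the descent direction of a sampled Lagrangian gradient while the Lagrange multiplier moves in the ascent direction of the constraint violation. The gradient estimators are passed in as stubs, CVaR is estimated empirically from sampled episode costs, and all names and step sizes are assumptions for illustration.

```python
# Primal-dual step for a CVaR-constrained MDP (illustrative sketch).
import numpy as np

def empirical_cvar(costs, alpha=0.95):
    """Empirical conditional value-at-risk: mean of the worst (1 - alpha) tail."""
    costs = np.sort(np.asarray(costs))
    tail = costs[int(np.ceil(alpha * len(costs))):]
    return tail.mean() if len(tail) else costs[-1]

def primal_dual_step(theta, lam, grad_expected_cost, grad_cvar,
                     episode_costs, budget, lr_theta=1e-3, lr_lam=1e-2):
    """One update of L(theta, lam) = E[cost] + lam * (CVaR(cost) - budget):
    descend in the policy parameters, ascend in the multiplier."""
    theta = theta - lr_theta * (grad_expected_cost + lam * grad_cvar)
    violation = empirical_cvar(episode_costs) - budget
    lam = max(0.0, lam + lr_lam * violation)   # keep the multiplier nonnegative
    return theta, lam
```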