Safe Reinforcement Learning via Probabilistic Logic Shields

Authors: Yang, Wen-Chi; Marra, Giuseppe; Rens, Gavin; De Raedt, Luc

Affiliations: Leuven.AI, KU Leuven, Belgium; Centre for Applied Autonomous Sensor Systems, Örebro University, Sweden

Published in: arXiv

Year: 2023

Subject: Reinforcement learning

Abstract: Safe Reinforcement Learning (Safe RL) aims to learn optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques. © 2023, CC BY.
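
The abstract's central claim is that a probabilistic shield can be expressed as a differentiable function, so that gradients flow through it during ordinary policy-gradient training. The following Python/PyTorch snippet is a minimal illustrative sketch of that idea, not the paper's implementation: the normalized product of the base policy with per-action safety probabilities is one standard way to realize a probabilistic shield, the safety probabilities here are hand-written stand-ins for the output of a probabilistic logic program, and the names shielded_policy and p_safe are hypothetical.

import torch

def shielded_policy(logits, p_safe):
    # Base policy from the network's action scores.
    pi = torch.softmax(logits, dim=-1)
    # Reweight each action by its safety probability and renormalize:
    # pi_plus(a|s) = pi(a|s) * P(safe|s,a) / sum_a' pi(a'|s) * P(safe|s,a')
    weighted = pi * p_safe
    return weighted / weighted.sum(dim=-1, keepdim=True)

# Toy 3-action state: logits are learnable; p_safe is an assumed stand-in
# for the safety probabilities a probabilistic logic program would compute.
logits = torch.zeros(3, requires_grad=True)
p_safe = torch.tensor([0.99, 0.60, 0.05])

pi_plus = shielded_policy(logits, p_safe)
action = torch.multinomial(pi_plus, 1).item()  # unsafe actions are rarely drawn

# One REINFORCE-style update through the shield: since the shield is plain
# differentiable tensor algebra, gradients reach the policy logits.
reward = 1.0  # placeholder return
loss = -reward * torch.log(pi_plus[action])
loss.backward()
print(pi_plus.detach(), logits.grad)

Because the shield only rescales and renormalizes the policy's own probabilities, the same construction can sit in front of any policy-gradient learner, which is the integration property the abstract emphasizes.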
