Optimization-based and sampling-based algorithms are two branches of methods in machine learning. While existing safe reinforcement learning (RL) algorithms are mainly based on optimization, it remains unclear whether sampling-based methods can achieve desirable performance with a safe policy. This paper formulates the Langevin policy for safe RL and proposes Langevin Actor-Critic (LAC) to accelerate policy inference. Concretely, instead of a parametric policy, the proposed Langevin policy provides a stochastic process that directly infers actions; it is the numerical solver of the Langevin dynamics of actions in continuous time. Furthermore, to make the Langevin policy practical on RL tasks, LAC accumulates the transitions induced by the Langevin policy and reproduces them with a generator. Finally, extensive empirical results show the effectiveness and superiority of LAC on MuJoCo-based and Safety Gym tasks. Our implementation is available at https://***/Lfh404/LAC. Copyright 2024 by the author(s)
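The core idea of the abstract above — inferring an action by running Langevin dynamics on a value function rather than evaluating a parametric policy — can be illustrated with a minimal one-dimensional sketch. The toy critic `q_value`, its gradient, and all step-size/iteration parameters below are assumptions for illustration, not the paper's actual setup; the discretized update `a ← a + (η/2)∇Q(a) + √η · ξ` is the standard Euler–Maruyama solver for Langevin dynamics, whose stationary distribution is proportional to exp(Q(a)).

```python
import math
import random

# Hypothetical 1-D critic peaking at a = 0.5 (an illustrative assumption,
# standing in for a learned Q-function).
def q_value(a):
    return -(a - 0.5) ** 2

def grad_q(a):
    # Analytic gradient of the toy critic above.
    return -2.0 * (a - 0.5)

def langevin_action(step_size=0.01, n_steps=500, seed=0):
    """Infer an action by simulating Langevin dynamics on Q(a).

    Each step drifts along the critic's gradient and injects Gaussian
    noise, so the iterates approximately sample from exp(Q(a)) instead of
    being produced by a parametric policy network.
    """
    rng = random.Random(seed)
    a = rng.uniform(-1.0, 1.0)  # arbitrary initial action
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        a = a + 0.5 * step_size * grad_q(a) + math.sqrt(step_size) * noise
    return a
```

Because each call simulates a full stochastic process, action inference is slow relative to a single network forward pass; this is the cost that the paper's generator (which "reproduces" Langevin-induced transitions) is meant to amortize.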
With the rapid development of the mobile Internet, users generate massive data in different forms on social networks every day, and different characteristics of users are reflected by these social media data. How to integrate multiple heterogeneous information sources and establish user profiles from multiple perspectives plays an important role in providing personalized services, marketing, and recommendation. In this paper, we propose Multi-source & Multi-task Learning for User Profiles in Social Network, which integrates multiple social data sources and contains a multi-task learning framework to simultaneously predict various attributes of a user. First, we design dedicated feature extraction models for the multiple heterogeneous data sources. Second, we design a shared layer to fuse the multiple heterogeneous data sources into a general shared representation for multi-task learning. Third, we design each task's own unique presentation layer for discriminant output of attributes. Finally, we design a weighted loss function to improve the learning efficiency and prediction accuracy of each task. Experimental results on more than 5000 Sina Weibo users demonstrate that our approach outperforms state-of-the-art baselines for inferring the gender, age, and region of social media users.
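The shared-layer-plus-task-heads architecture and the weighted loss described above can be sketched in a few lines. Everything here — the linear layers, the function names, the example weights — is a simplifying assumption for illustration (the paper's actual model and loss weighting are not specified in this abstract); the sketch only shows the structural pattern: one fused representation feeding several task-specific output layers, combined by a weighted sum of losses.

```python
def shared_layer(features, weights):
    """Fuse source features into one shared representation (a single
    linear map here, standing in for the paper's fusion layer)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def task_head(shared, head_weights):
    """Each task (e.g. gender, age, region) gets its own output layer
    on top of the shared representation."""
    return sum(w * h for w, h in zip(head_weights, shared))

def weighted_loss(task_losses, task_weights):
    """Combine per-task losses with task-specific weights, so easier or
    more important tasks can be emphasized during joint training."""
    return sum(w * l for w, l in zip(task_weights, task_losses))
```

In a real multi-task setup the shared layer's parameters receive gradients from every task head, which is what lets the heterogeneous sources regularize one another; the per-task weights then control how much each attribute-prediction task steers the shared representation.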