Policy evaluation (PE) is a critical sub-problem in reinforcement learning, which estimates the value function for a given policy and can be used for policy ***, there still exist some limitations in current PE methods, such as low sample efficiency and local convergence, especially on complex *** this study, a novel PE algorithm called Least-Squares Truncated Temporal-Difference learning (LST2D) is *** LST2D, an adaptive truncation mechanism is designed, which effectively takes advantage of the fast convergence property of Least-Squares Temporal Difference learning and the asymptotic convergence property of Temporal Difference learning (TD). Then, two feature pre-training methods are utilised to improve the approximation ability of ***, an Actor-Critic algorithm based on LST2D and pre-trained feature representations (ACLPF) is proposed, where LST2D is integrated into the critic network to improve learning-prediction *** simulation studies were conducted on four robotic tasks, and the corresponding results illustrate the effectiveness of *** proposed ACLPF algorithm outperformed DQN, ACER and PPO in terms of sample efficiency and stability, which demonstrated that LST2D can be applied to online learning control problems by incorporating it into the actor-critic architecture.
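The two building blocks named in this abstract, TD learning and Least-Squares Temporal Difference learning (LSTD), can be contrasted on a toy problem. The sketch below is not the LST2D algorithm: the abstract does not specify the adaptive truncation mechanism, so none is implemented. The 3-state cycle, the one-hot features, the step size `alpha`, the `ridge` term, and all function names are assumptions made for illustration only.

```python
GAMMA = 0.9  # discount factor (assumed value)


def step(s):
    """Hypothetical fixed-policy dynamics: 0 -> 1 -> 2 -> 0, reward 1 on leaving state 2."""
    return (s + 1) % 3, (1.0 if s == 2 else 0.0)


def phi(s):
    """One-hot feature vector for state s (assumed feature map)."""
    return [1.0 if i == s else 0.0 for i in range(3)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def collect(n):
    """Roll out n transitions (s, r, s_next) starting from state 0."""
    s, out = 0, []
    for _ in range(n):
        s_next, r = step(s)
        out.append((s, r, s_next))
        s = s_next
    return out


def td0(transitions, alpha=0.1):
    """TD(0) with linear function approximation: one incremental update per transition."""
    w = [0.0] * 3
    for s, r, s_next in transitions:
        f = phi(s)
        delta = r + GAMMA * dot(w, phi(s_next)) - dot(w, f)  # TD error
        w = [wi + alpha * delta * fi for wi, fi in zip(w, f)]
    return w


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd(transitions, ridge=1e-6):
    """LSTD: accumulate A = sum phi (phi - gamma phi')^T and b = sum phi r, then solve A w = b."""
    A = [[ridge if i == j else 0.0 for j in range(3)] for i in range(3)]
    b = [0.0] * 3
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        for i in range(3):
            b[i] += f[i] * r
            for j in range(3):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return solve(A, b)


if __name__ == "__main__":
    data = collect(30)
    print("TD(0) weights:", td0(data))
    print("LSTD weights: ", lstd(data))
```

On the same 30 transitions, LSTD solves for the value estimates in one batch, whereas TD(0) is still far from the fixed point. This is the sample-efficiency gap that the abstract says LST2D aims to exploit while retaining TD's asymptotic convergence.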
The vast majority of published event-triggered mechanisms (ETMs) are constructed based on measurement errors, which naturally introduces the problem that they are updated when the measurement errors exceed the threshold...
The TSK (Takagi-Sugeno-Kang) fuzzy system is widely used because of its good approximation performance. To effectively optimize the regression problem of the TSK fuzzy system, based on the recently proposed MBGD-RDA ...
Phase equilibrium plays a crucial role in various industries, such as chemicals, petroleum, and pharmaceuticals. However, conventional methods for predicting gas-liquid equilibrium have limited applicability and require ...
As important working equipment at sea, offshore cranes play a key role in a range of marine activities. An offshore crane is a kind of mechanical equipment specially used for lifting materials at sea. It is w...
In response to the need for intelligence in the development of nuclear power technology in recent years, this article explores the application of a nuclear power plant secondary control system based on digital ...
This paper focuses on the finite-time time-varying formation tracking (TVFT) problem for multi-agent systems. Input saturation is taken into account, which can severely limit the performance of TVFT systems. The propo...
The typical underactuated system two-dimensional translational oscillator with rotational actuator (2D TORA) that consists of two unactuated translational carts and an actuated rotational eccentric ball which acts as i...
This paper studies the finite-time consensus of discrete-time multi-agent systems (MASs). First, a region-conditional switching controller is developed to realize finite-time consensus of the discrete-time syste...
This paper is concerned with the stabilization of Takagi-Sugeno (T-S) fuzzy systems with unmatched disturbances. In this paper, an integral fuzzy switching surface function (IFSSF) containing state-dependent input mat...