To alleviate the extrapolation error and instability inherent in Q-functions learned directly by off-policy Q-learning (QL-style) on static datasets, this article utilizes on-policy state-action-reward-state-action (SARSA-style) learning to develop an offline reinforcement learning (RL) method termed robust offline Actor-Critic with on-policy regularized policy evaluation (OPRAC). With the help of SARSA-style bootstrap actions, a conservative on-policy Q-function and a penalty term for matching the on-policy and off-policy actions are jointly constructed to regularize the optimal Q-function of off-policy QL-style learning. This naturally equips the off-policy QL-style policy evaluation with the intrinsic pessimistic conservatism of the on-policy SARSA-style, thus facilitating the acquisition of stable Q-value estimates. Even with limited data sampling errors, the convergence of the Q-function learned by OPRAC and the controllability of the upper bound on the bias between the learned Q-function and its true Q-value can be theoretically guaranteed. In addition, the sub-optimality of the learned optimal policy merely stems from sampling errors. Experiments on the well-known D4RL Gym-MuJoCo benchmark demonstrate that OPRAC can rapidly learn robust and effective task-solving policies owing to the stable estimate of the Q-value, outperforming state-of-the-art offline RL methods by at least 15%.
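As a rough tabular illustration of the core idea, a QL-style bootstrap target can be mixed with a SARSA-style target computed from the logged next action, pulling the optimistic max toward in-dataset behavior. The toy MDP, the mixture weight eta, and the plain TD update below are illustrative assumptions; the actual OPRAC method is an actor-critic with function approximation and a penalty term matching on- and off-policy actions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))

# Toy static dataset of (s, a, r, s', a') tuples; SARSA-style bootstrapping
# needs the logged next action a'.
dataset = [(rng.integers(n_states), rng.integers(n_actions), rng.normal(),
            rng.integers(n_states), rng.integers(n_actions))
           for _ in range(500)]

gamma, alpha, eta = 0.9, 0.1, 0.5   # eta weighs the SARSA-style regularization

for _ in range(200):
    for s, a, r, s2, a2 in dataset:
        ql_target = r + gamma * Q[s2].max()    # off-policy QL-style target
        sarsa_target = r + gamma * Q[s2, a2]   # on-policy SARSA-style target
        # Pessimistic mixture: never more optimistic than the QL-style target.
        target = (1 - eta) * ql_target + eta * sarsa_target
        Q[s, a] += alpha * (target - Q[s, a])

print(Q.round(2))
```

Setting eta = 0 recovers plain QL-style evaluation, eta = 1 pure SARSA-style; intermediate values trade extrapolation error against conservatism.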
Dear Editor, This letter is concerned with stability analysis and stabilization design for sampled-data-based load frequency control (LFC) systems via a data-driven method. By describing the dynamic behavior of LFC systems based on a data-based representation, a stability criterion is derived to obtain the admissible maximum sampling interval (MSI) for a given controller, and a design condition of the PI-type controller is further developed to meet the required MSI. Finally, the effectiveness of the proposed methods is verified by a case study.
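The notion of an admissible MSI can be sketched numerically: discretize a closed loop under zero-order hold at interval h and grow h until the spectral radius reaches one. The first-order plant, PI gains, and model-based discretization below are illustrative assumptions, not the letter's data-driven representation or its LMI-type criterion.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential via truncated Taylor series (adequate for the
    small, well-scaled matrices swept here)."""
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

# Hypothetical one-area plant x' = -a*x + b*u with sampled PI feedback
# u_k = -Kp*x(t_k) - Ki*z(t_k), where z' = x integrates the deviation.
a, b, Kp, Ki = 1.0, 2.0, 1.5, 0.8
A = np.array([[-a, 0.0], [1.0, 0.0]])
B = np.array([[b], [0.0]])
K = np.array([[Kp, Ki]])

def stable(h):
    """Spectral-radius test of the ZOH-discretized closed loop at interval h."""
    M = np.zeros((3, 3))
    M[:2, :2], M[:2, 2:] = A, B
    E = expm(M * h)                     # Van Loan augmented-matrix trick
    Ad, Bd = E[:2, :2], E[:2, 2:]
    return max(abs(np.linalg.eigvals(Ad - Bd @ K))) < 1.0

# Grow h until stability is lost: the last stable h estimates the MSI.
h = 0.01
while stable(h + 0.01):
    h += 0.01
print(f"estimated MSI ~ {h:.2f} s")
```

The letter's contribution is to obtain such an MSI (and then design the PI gains) from measured data alone, without the explicit model used in this sketch.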
Offline-online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transition) in offline RL. However, due to the effect of distribution shift, the performance of policy ...
Dear Editor, This letter presents a class of saturated sliding mode control (SMC) strategies for linear systems subject to impulsive disturbance and input saturation. To ensure the feasibility of the proposed SMC under saturation, a relationship is established among the attraction domain, the saturation structure, and the control gain.
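The reaching-then-sliding behavior under a saturated control input can be sketched on a double integrator. The plant, sliding surface, gains, and saturation level below are illustrative assumptions (and the impulsive disturbance is not modeled); they are not the letter's actual design conditions.

```python
def sat(u, u_max=2.0):
    """Actuator saturation."""
    return max(-u_max, min(u_max, u))

def simulate(x1=1.0, x2=0.0, c=1.0, k=0.5, dt=1e-3, steps=10_000):
    """Double integrator x1' = x2, x2' = sat(u) under a saturated SMC law."""
    for _ in range(steps):
        s = c * x1 + x2                          # sliding surface
        sgn = 1.0 if s > 0 else (-1.0 if s < 0 else 0.0)
        u = sat(-c * x2 - k * sgn)               # equivalent + switching control
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return x1, x2

x1f, x2f = simulate()
print(f"final state: ({x1f:.4f}, {x2f:.4f})")
```

Feasibility here hinges on |c*x2| + k staying below u_max along the trajectory, which is a crude analogue of the attraction-domain/gain relationship the letter establishes rigorously.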
Over the past few decades, numerous adaptive Kalman filters (AKFs) have been proposed. However, achieving online estimation with both high estimation accuracy and fast convergence speed is challenging, especially when both the process noise and measurement noise covariance matrices are relatively inaccurate. Maximum likelihood estimation (MLE) possesses the potential to achieve this goal, since its theoretical accuracy is guaranteed by asymptotic optimality and its convergence speed is fast due to weak dependence on accurate state estimates. However, the maximum likelihood cost function is so intricate that the existing MLE methods can only simply ignore all historical measurement information to achieve online estimation, which cannot adequately realize the potential of MLE. In order to design online MLE-based AKFs with high estimation accuracy and fast convergence speed, an online exploratory MLE approach is proposed, based on which a mini-batch coordinate descent noise covariance matrix estimation framework is developed. In this framework, the maximum likelihood cost function is simplified for online estimation with fewer and simpler terms, which are selected in a mini-batch and calculated with a backtracking method. The intricate maximum likelihood cost function is thus sidestepped and solved by adaptively exploring possible estimated noise covariance matrices while the historical measurement information is adequately utilized. Furthermore, four specific algorithms are derived under this framework to meet different practical requirements in terms of convergence speed, estimation accuracy, and calculation load. Abundant simulations and experiments are carried out to verify the validity and superiority of the proposed algorithms as compared with existing state-of-the-art AKFs.
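The exploratory flavor of such an approach can be sketched in one dimension: run a Kalman filter while periodically picking, from a candidate grid, the measurement-noise variance that maximizes the Gaussian innovation likelihood over a recent mini-batch. The random-walk model, candidate grid, window length, and the reuse of the current prediction variance for the whole batch are all illustrative simplifications; the paper's framework estimates full covariance matrices by coordinate descent with backtracking.

```python
import numpy as np

rng = np.random.default_rng(1)
true_R, Qv = 4.0, 0.01
# A 1-D random walk x_k observed as z_k = x_k + v_k, v_k ~ N(0, true_R).
x_true = np.cumsum(rng.normal(0.0, np.sqrt(Qv), 400))
z = x_true + rng.normal(0.0, np.sqrt(true_R), 400)

candidates = [0.5, 1.0, 2.0, 4.0, 8.0]   # grid of R values to explore

def run_filter(R_grid, window=20):
    xh, P, R = 0.0, 1.0, 1.0
    innovations, R_track = [], []
    for k, zk in enumerate(z):
        P = P + Qv                            # predict (identity state transition)
        if k and k % window == 0:
            # Exploratory step: pick the R maximizing the Gaussian innovation
            # log-likelihood over the latest mini-batch (current P reused as
            # a crude stand-in for the per-step prediction variances).
            batch = np.array(innovations[-window:])
            ll = [-0.5 * np.sum(np.log(2 * np.pi * (P + r)) + batch**2 / (P + r))
                  for r in R_grid]
            R = R_grid[int(np.argmax(ll))]
        innovations.append(zk - xh)           # innovation w.r.t. predicted state
        K = P / (P + R)                       # Kalman gain
        xh = xh + K * (zk - xh)               # update
        P = (1 - K) * P
        R_track.append(R)
    return np.array(R_track)

R_track = run_filter(candidates)
print("final R estimate:", R_track[-1])
```

Because each mini-batch only re-scores a small grid, the per-step cost stays low while every stored innovation still informs the choice.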
Authors: Xiongbo Wan, Chaoling Zhang, Fan Wei, Chuan-Ke Zhang, Min Wu (IEEE), School of Automation, China University of Geosciences; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems; Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education; Wuhan 430074, China
This article focuses on dynamic event-triggered mechanism (DETM)-based model predictive control (MPC) for T-S fuzzy systems. A hybrid dynamic variables-dependent DETM is carefully devised, which includes a multiplicative dynamic variable and an additive dynamic variable. The addressed DETM-based fuzzy MPC issue is described as a "min-max" optimization problem (OP). To facilitate the co-design of the MPC controller and the weighting matrix of the DETM, an auxiliary OP is proposed based on a new Lyapunov function and a new robust positive invariant (RPI) set that contain the membership functions and the hybrid dynamic variables. A dynamic event-triggered fuzzy MPC algorithm is developed accordingly, whose recursive feasibility is analysed by employing the RPI set. With the designed controller, the involved fuzzy system is ensured to be asymptotically stable. Simulation examples show that the new DETM and the DETM-based MPC algorithm have the advantages of reducing resource consumption while yielding the anticipated performance.
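How a dynamic variable reduces triggering frequency can be sketched on a hypothetical scalar plant under event-triggered state feedback. Only an additive dynamic variable is modeled below, and the plant, gains, and thresholds are illustrative assumptions, not the article's T-S fuzzy "min-max" MPC design.

```python
# Hypothetical scalar plant x_{k+1} = a*x_k + b*u_k under state feedback
# u = -K*x_hat, where x_hat is the last transmitted state.
a, b, K = 1.1, 1.0, 0.6
sigma, lam, theta = 0.2, 0.8, 2.0   # threshold and dynamic-variable parameters

def simulate(dynamic, steps=80):
    x, x_hat, eta = 5.0, 5.0, 1.0
    events = 0
    for _ in range(steps):
        e = x_hat - x                       # error since the last transmission
        gap = sigma * x**2 - e**2
        # A static ETM triggers whenever gap < 0; the DETM spends the additive
        # dynamic variable eta as extra budget, so it triggers less often.
        if (theta * eta + gap < 0) if dynamic else (gap < 0):
            x_hat, events = x, events + 1
            gap = sigma * x**2              # error resets to zero on transmission
        eta = max(0.0, lam * eta + gap)     # additive dynamic variable update
        x = a * x + b * (-K * x_hat)
    return events, x

static_events, x_s = simulate(dynamic=False)
dynamic_events, x_d = simulate(dynamic=True)
print("static ETM events:", static_events, " dynamic ETM events:", dynamic_events)
```

Both schemes drive the state to the origin, but the dynamic variable skips transmissions the static rule would fire, which is the resource-saving effect the article quantifies.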
Traditional gyrocompasses, while capable of providing autonomous directional guidance and path correction, face limitations in widespread applications due to their large size, making them unsuitable for compact platforms. Micro-electro-mechanical system (MEMS) gyrocompasses offer a promising alternative for miniaturization. However, current MEMS gyrocompasses require the integration of motor rotation modulation technology to achieve high-precision north-finding, whereas the conventional motors used in previous research introduce large volume and residual magnetism, thus undermining their size advantages. Here, we innovatively propose a miniature MEMS gyrocompass based on a MEMS traveling-wave micromotor, featuring the first integration of a chip-scale rotational actuator combined with a precise multi-position braking control system, enabling high accuracy and fast north-finding. The proposed gyrocompass makes significant advancements, reducing its size to 50 × 42.5 × 24.5 mm^3 and achieving an azimuth accuracy of 0.199° within 2 min, which is half the volume of the smallest existing similar devices while offering twice the accuracy. These improvements indicate that the proposed gyrocompass is suitable for applications in indoor industrial robotics, autonomous driving, and other related fields requiring precise directional guidance.
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm, which limits the magnitude of each policy updat...
Real open network environments include the traffic generated by known applications or protocols, which have been previously identified and labeled, and unknown network traffic that cannot be identified based on existi...
This paper focuses on the challenge of fixed-time control for spatiotemporal neural networks (SNNs) with discontinuous activations and time-varying coefficients. A novel fixed-time convergence lemma is proposed, which facilitates the handling of the time-varying coefficients of SNNs and relaxes the restriction on the non-positive definiteness of the derivative of the Lyapunov function. Besides, a more flexible and economical aperiodically switching control technique is presented to stabilize SNNs within a fixed time, effectively reducing the amount of information transmission and the control costs. Under the newly established fixed-time convergence lemma and the aperiodically switching controller, more general algebraic conditions are deduced to ensure the fixed-time stabilization of SNNs. Numerical examples are provided to manifest the validity of the results.
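The defining property of fixed-time (as opposed to finite-time) convergence can be sketched with the textbook double power-law construction, which is not this paper's lemma or its switching controller: for x' = -k1|x|^q sgn(x) - k2|x|^p sgn(x) with 0 < q < 1 < p, the settling time is bounded by T* = 1/(k1(1-q)) + 1/(k2(p-1)) regardless of the initial condition.

```python
def fixed_time_settle(x0, k1=1.0, k2=1.0, dt=2e-5, tol=1e-3, t_max=10.0):
    """Euler-simulated time for x' = -k1*|x|**0.5*sgn(x) - k2*|x|**1.5*sgn(x)
    to enter |x| < tol."""
    x, t = x0, 0.0
    while abs(x) >= tol and t < t_max:
        sgn = 1.0 if x > 0 else -1.0
        x -= dt * (k1 * abs(x)**0.5 + k2 * abs(x)**1.5) * sgn
        t += dt
    return t

# With q = 1/2 and p = 3/2, the bound T* = 1/(k1*(1-q)) + 1/(k2*(p-1))
# equals 4 s for any initial condition -- the hallmark of fixed-time stability.
for x0 in (1.0, 1e3, 1e6):
    print(f"x0 = {x0:9g}: settled in {fixed_time_settle(x0):.3f} s")
```

Even a million-fold increase in the initial condition leaves the settling time under the same 4 s bound, because the high-power term dominates far from the origin and the low-power term near it.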