To alleviate the extrapolation error and instability inherent in a Q-function learned directly by off-policy Q-learning (QL-style) on static datasets, this article uses the on-policy state-action-reward-state-action (SARSA-style) update to develop an offline reinforcement learning (RL) method termed robust offline Actor-Critic with on-policy regularized policy evaluation (OPRAC). With the help of SARSA-style bootstrap actions, a conservative on-policy Q-function and a penalty term for matching the on-policy and off-policy actions are jointly constructed to regularize the optimal Q-function of off-policy *** naturally equips the off-policy QL-style policy evaluation with the intrinsic pessimistic conservatism of the on-policy SARSA-style, thus facilitating the acquisition of stable estimated *** with limited data sampling errors, the convergence of the Q-function learned by OPRAC and the controllability of the upper bound on the bias between the learned Q-function and its true Q-value can be theoretically *** addition, the sub-optimality of the learned optimal policy merely stems from sampling *** on the well-known D4RL Gym-MuJoCo benchmark demonstrate that OPRAC can rapidly learn robust and effective task-solving policies owing to the stable estimate of the Q-value, outperforming state-of-the-art offline RL methods by at least 15%.
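The difference between the two bootstrap styles named in this abstract can be illustrated with a minimal tabular sketch. This is not the OPRAC algorithm; the function names, the Q-table, and the discount value are assumptions chosen only for illustration. The QL-style target maximizes over all next actions, including ones the dataset never contains, while the SARSA-style target uses only the next action stored in the dataset.

```python
import numpy as np

def q_learning_target(q, reward, next_state, gamma=0.99):
    """Off-policy (QL-style) target: bootstrap from the greedy next action."""
    return reward + gamma * np.max(q[next_state])

def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    """On-policy (SARSA-style) target: bootstrap from the logged next action."""
    return reward + gamma * q[next_state, next_action]

# Hypothetical Q-table with 2 states and 2 actions.
# Suppose action 1 in state 1 never appears in the dataset and is overestimated.
q = np.array([[0.0, 0.0],
              [1.0, 5.0]])

# One logged transition: reward 1.0, next state 1, logged next action 0.
ql = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
sa = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
print(ql, sa)
```

Because a single entry of the table can never exceed the maximum over that row, the SARSA-style target is at most the QL-style target for the same transition, which is one way to read the "pessimistic conservatism" the abstract mentions.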
Dear Editor, This letter is concerned with stability analysis and stabilization design for sampled-data-based load frequency control (LFC) systems via a data-driven method. By describing the dynamic behavior of LFC systems with a data-based representation, a stability criterion is derived to obtain the admissible maximum sampling interval (MSI) for a given controller, and a design condition for the PI-type controller is further developed to meet the required MSI. Finally, the effectiveness of the proposed methods is verified by a case study.
Offline-online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transition) in offline RL. However, due to the effect of distribution shift, the performance of policy ...
Dear Editor, This letter presents a class of saturated sliding mode control (SMC) strategies for linear systems subject to impulsive disturbance and input saturation. To ensure the feasibility of the proposed SMC under saturation, a relationship is established among the attraction domain, the saturation structure, and the control gain.
Over the past few decades, numerous adaptive Kalman filters (AKFs) have been proposed. However, achieving online estimation with both high estimation accuracy and fast convergence speed is challenging, especially when both the process noise and measurement noise covariance matrices are relatively inaccurate. Maximum likelihood estimation (MLE) has the potential to achieve this goal, since its theoretical accuracy is guaranteed by asymptotic optimality and its convergence speed is fast due to weak dependence on accurate state ***, the maximum likelihood cost function is so intricate that the existing MLE methods can only ignore all historical measurement information to achieve online estimation, which cannot adequately realize the potential of MLE. In order to design online MLE-based AKFs with high estimation accuracy and fast convergence speed, an online exploratory MLE approach is proposed, based on which a mini-batch coordinate descent noise covariance matrix estimation framework is developed. In this framework, the maximum likelihood cost function is simplified for online estimation with fewer and simpler terms, which are selected in a mini-batch and calculated with a backtracking method. This maximum likelihood cost function is sidestepped and solved by adaptively exploring possible estimated noise covariance matrices while the historical measurement information is adequately utilized. Furthermore, four specific algorithms are derived under this framework to meet different practical requirements in terms of convergence speed, estimation accuracy, and calculation load. Extensive simulations and experiments are carried out to verify the validity and superiority of the proposed algorithms compared with existing state-of-the-art AKFs.
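To make the idea of a maximum likelihood cost over noise covariances concrete, the sketch below evaluates the Gaussian innovation negative log-likelihood of a scalar Kalman filter for candidate noise variances. It is not the mini-batch coordinate descent framework from the abstract; the scalar random-walk model, the function name `innovation_nll`, and all numeric values are assumptions made for illustration.

```python
import math

def innovation_nll(measurements, q, r, x0=0.0, p0=1.0):
    """Negative log-likelihood of the measurements under a scalar random-walk
    model x_k = x_{k-1} + w_k, z_k = x_k + v_k with Var(w) = q and Var(v) = r."""
    x, p = x0, p0
    nll = 0.0
    for z in measurements:
        p = p + q                  # predict: the state estimate is unchanged
        s = p + r                  # innovation variance
        e = z - x                  # innovation
        nll += 0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        k = p / s                  # Kalman gain
        x = x + k * e              # measurement update of the state estimate
        p = (1.0 - k) * p          # measurement update of the error variance
    return nll

# Hypothetical noisy measurements scattered around zero.
z = [1.0, -1.0, 1.0, -1.0]
cost_small_r = innovation_nll(z, q=0.0, r=1e-4)
cost_large_r = innovation_nll(z, q=0.0, r=1.0)
print(cost_small_r, cost_large_r)
```

A candidate measurement variance that is far too small makes the observed innovations look implausible, so its cost is much higher. An adaptive filter built on MLE searches for the covariances that minimize such a cost.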
Authors: Xiongbo Wan, Chaoling Zhang, Fan Wei, Chuan-Ke Zhang, Min Wu, IEEE
Affiliation: the School of Automation, China University of Geosciences, the Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, and the Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China
This article focuses on dynamic event-triggered mechanism (DETM)-based model predictive control (MPC) for T-S fuzzy systems. A hybrid dynamic variables-dependent DETM is carefully devised, which includes a multiplicative dynamic variable and an additive dynamic *** addressed DETM-based fuzzy MPC issue is described as a “min-max” optimization problem (OP). To facilitate the co-design of the MPC controller and the weighting matrix of the DETM, an auxiliary OP is proposed based on a new Lyapunov function and a new robust positive invariant (RPI) set that contain the membership functions and the hybrid dynamic variables. A dynamic event-triggered fuzzy MPC algorithm is developed accordingly, whose recursive feasibility is analysed by employing the RPI *** the designed controller, the involved fuzzy system is ensured to be asymptotically *** examples show that the new DETM and the DETM-based MPC algorithm reduce resource consumption while yielding the anticipated performance.
In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm, which limits the magnitude of each policy updat...
Real open network environments include the traffic generated by known applications or protocols, which have been previously identified and labeled, and unknown network traffic that cannot be identified based on existi...
This paper focuses on the challenge of fixed-time control for spatiotemporal neural networks (SNNs) with discontinuous activations and time-varying coefficients. A novel fixed-time convergence lemma is proposed, which facilitates the handling of time-varying coefficients of SNNs and relaxes the restriction on the non-positive definiteness of the derivative of the Lyapunov function. In addition, a more flexible and economical aperiodically switching control technique is presented to stabilize SNNs within a fixed time, effectively reducing the amount of information transmission and the control costs. Under the newly established fixed-time convergence lemma and the aperiodically switching controller, more general algebraic conditions are deduced to ensure the fixed-time stabilization of SNNs. Numerical examples are provided to demonstrate the validity of the results.
The slow phase transformation of microalloyed dual-phase steel makes the nonuniform stress and temperature fields during the post-rolling cooling process have a significant impact on the phase transformation *** the relatively slow phase transformation of DP780 steel within the microalloyed dual-phase steel series, the influence of stress on the phase transformation behavior of DP780 steel was *** quantify the nonuniform thermal and stress conditions in the steel coil, a thermo-mechanical coupled finite element model of the hot-rolled strip cooling process was *** on the simulation data, DP780 steel was chosen as the research material, and Gleeble 3500 thermal simulation equipment was used for experimental *** thermal expansion curves were analyzed through regression to establish the dynamic model of DP780 steel phase transformation under ***, metallographic analysis was conducted to determine the phase transformation type and grain size of DP780 *** results confirmed that the stress promotes the occurrence of semi-diffusion-type bainite ***, an appropriate level of stress facilitates the growth of bainitic grains, while the increased stress inhibits the growth of ferritic grains.