Search Results - Inner Mongolia University Library

Robust Offline Actor-Critic With On-policy Regularized Policy Evaluation

IEEE/CAA Journal of Automatica Sinica, 2024, Vol. 11, No. 12, pp. 2497-2511

Authors: Shuo Cao, Xuesong Wang, Yuhu Cheng (Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China)

To alleviate the extrapolation error and instability inherent in the Q-function learned directly by off-policy Q-learning (QL-style) on static datasets, this article uses the on-policy state-action-reward-state-action (SARSA-style) scheme to develop an offline reinforcement learning (RL) method termed robust offline Actor-Critic with on-policy regularized policy evaluation (OPRAC). With the help of SARSA-style bootstrap actions, a conservative on-policy Q-function and a penalty term for matching the on-policy and off-policy actions are jointly constructed to regularize the optimal Q-function of off-policy *** naturally equips the off-policy QL-style policy evaluation with the intrinsic pessimistic conservatism of on-policy SARSA-style, thus facilitating the acquisition of stable estimated *** with limited data sampling errors, the convergence of the Q-function learned by OPRAC and the controllability of the upper bound on the bias between the learned Q-function and its true Q-value can be theoretically *** addition, the sub-optimality of the learned optimal policy merely stems from sampling *** on the well-known D4RL Gym-MuJoCo benchmark demonstrate that OPRAC can rapidly learn robust and effective task-solving policies owing to the stable estimate of the Q-value, outperforming state-of-the-art offline RL methods by at least 15%.

Keywords: Offline reinforcement learning; off-policy QL-style; on-policy SARSA-style; policy evaluation (PE); Q-value estimation
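
The abstract above contrasts QL-style and SARSA-style bootstrapping. As a minimal sketch of that general distinction only (not the OPRAC algorithm itself), the two textbook one-step targets can be written for a tabular Q-function. The function names, the toy table, and the sample transition are assumptions made for illustration.

```python
def ql_target(q, reward, next_state, gamma=0.99):
    """Off-policy (QL-style) target: bootstrap from the greedy next action."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    """On-policy (SARSA-style) target: bootstrap from the next action
    that was actually recorded in the dataset."""
    return reward + gamma * q[next_state][next_action]


# Toy tabular Q-function: q[state][action] (illustrative values only).
q = [[0.0, 1.0],
     [2.0, 5.0]]

# Hypothetical logged transition: r = 1.0, s' = 1, a' = 0.
ql = ql_target(q, reward=1.0, next_state=1, gamma=0.9)
sarsa = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.9)
```

Because the greedy maximum can select an action the dataset never contains, the QL-style target is never smaller than the SARSA-style target on the same transition, which is one way to read the "pessimistic conservatism" mentioned in the abstract.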

Stability and Stabilization of Sampled-Data Based LFC for Power Systems: A Data-Driven Method

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 1, pp. 291-293

Authors: Yu-Long Fan, Chuan-Ke Zhang, Yong He (School of Automation, China University of Geosciences; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems; Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education)

Dear Editor, This letter is concerned with stability analysis and stabilization design for sampled-data based load frequency control (LFC) systems via a data-driven method. By describing the dynamic behavior of LFC systems based on a data-based representation, a stability criterion is derived to obtain the admissible maximum sampling interval (MSI) for a given controller, and a design condition of the PI-type controller is further developed to meet the required MSI. Finally, the effectiveness of the proposed methods is verified by a case study.

Keywords: Sampled data control systems

Offline-Online Actor-Critic

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 1, pp. 61-69

Authors: Wang, Xuesong; Hou, Diyuan; Huang, Longyang; Cheng, Yuhu (China University of Mining and Technology, Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, and School of Information and Control Engineering, Xuzhou 221116, China)

Offline-online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transitions) in offline RL. However, due to the effect of distribution shift, the performance of the policy may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline-online RL. Then, to alleviate these problems, we propose a novel RL algorithm, offline-online actor-critic (O2AC). In O2AC, a behavior clone constraint term is introduced into the policy objective function to address the distribution shift in the offline training phase. In addition, in the online training phase, the influence of the behavior clone constraint term is gradually reduced, which alleviates the policy performance degradation. Experiments show that O2AC outperforms existing offline-online RL algorithms. © 2020 IEEE.

Keywords: Reinforcement learning
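
The abstract above describes a behavior clone constraint term whose influence is gradually reduced during online training. The following is a minimal sketch of that idea under stated assumptions: the exponential decay schedule, the squared-error penalty, and every name and default value here are hypothetical, since the abstract does not specify the form used in O2AC.

```python
def bc_weight(step, offline_steps, initial_weight=1.0, decay=0.999):
    """Hypothetical schedule: constant weight while training offline,
    then exponential decay once online training starts."""
    if step < offline_steps:
        return initial_weight
    return initial_weight * decay ** (step - offline_steps)


def policy_loss(q_value, policy_action, dataset_action, weight):
    """Loss to minimise: prefer high Q-values, and penalise (squared error,
    an assumption) deviation from the action stored in the dataset."""
    bc_penalty = sum((p - d) ** 2 for p, d in zip(policy_action, dataset_action))
    return -q_value + weight * bc_penalty


# Offline phase: the constraint is applied at full strength.
offline_loss = policy_loss(2.0, [1.0, 0.0], [0.0, 0.0], bc_weight(10, 100))
# Online phase: the same deviation is penalised less and less.
online_loss = policy_loss(2.0, [1.0, 0.0], [0.0, 0.0], bc_weight(101, 100, decay=0.5))
```

With a weight that shrinks toward zero, the loss approaches the plain Q-maximising objective, so the policy is free to move away from the dataset actions once online data accumulate.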

Exponential Stability of Impulsive System via Saturated Sliding Mode Control

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 2, pp. 469-471

Authors: Miaomiao Yu, Xiaodi Li (Shandong Provincial Engineering Research Center of System Control and Intelligent Technology, Shandong Normal University; School of Mathematics and Statistics, Shandong Normal University)

Dear Editor, This letter presents a class of saturated sliding mode control (SMC) strategies for linear systems subject to impulsive disturbance and input saturation. To ensure the feasibility of the proposed SMC under saturation, a relationship is established among the attraction domain, the saturation structure, and the control gain.

An Online Exploratory Maximum Likelihood Estimation Approach to Adaptive Kalman Filtering

IEEE/CAA Journal of Automatica Sinica, 2025, Vol. 12, No. 1, pp. 228-254

Authors: Jiajun Cheng, Haonan Chen, Zhirui Xue, Yulong Huang, Yonggang Zhang (College of Intelligent Systems Science and Engineering, Harbin Engineering University; Engineering Research Center of Navigation Instruments, Ministry of Education; College of Intelligent Systems Science and Engineering and College of Future Technology, Harbin Engineering University)

Over the past few decades, numerous adaptive Kalman filters (AKFs) have been proposed. However, achieving online estimation with both high estimation accuracy and fast convergence speed is challenging, especially when both the process noise and measurement noise covariance matrices are relatively inaccurate. Maximum likelihood estimation (MLE) possesses the potential to achieve this goal, since its theoretical accuracy is guaranteed by asymptotic optimality and the convergence speed is fast due to weak dependence on accurate state ***, the maximum likelihood cost function is so intricate that the existing MLE methods can only simply ignore all historical measurement information to achieve online estimation, which cannot adequately realize the potential of MLE. In order to design online MLE-based AKFs with high estimation accuracy and fast convergence speed, an online exploratory MLE approach is proposed, based on which a mini-batch coordinate descent noise covariance matrix estimation framework is developed. In this framework, the maximum likelihood cost function is simplified for online estimation with fewer and simpler terms which are selected in a mini-batch and calculated with a backtracking method. This maximum likelihood cost function is sidestepped and solved by exploring possible estimated noise covariance matrices adaptively while the historical measurement information is adequately utilized. Furthermore, four specific algorithms are derived under this framework to meet different practical requirements in terms of convergence speed, estimation accuracy, and calculation load. Abundant simulations and experiments are carried out to verify the validity and superiority of the proposed algorithms as compared with existing state-of-the-art AKFs.

Keywords: Adaptive Kalman filtering; coordinate descent; maximum likelihood estimation; mini-batch optimization; unknown noise covariance matrix

Hybrid Dynamic Variables-Dependent Event-Triggered Fuzzy Model Predictive Control

IEEE/CAA Journal of Automatica Sinica, 2024, Vol. 11, No. 3, pp. 723-733

Authors: Xiongbo Wan, Chaoling Zhang, Fan Wei, Chuan-Ke Zhang, Min Wu (School of Automation, China University of Geosciences, the Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, and the Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China)

This article focuses on dynamic event-triggered mechanism (DETM)-based model predictive control (MPC) for T-S fuzzy systems. A hybrid dynamic variables-dependent DETM is carefully devised, which includes a multiplicative dynamic variable and an additive dynamic *** addressed DETM-based fuzzy MPC issue is described as a "min-max" optimization problem (OP). To facilitate the co-design of the MPC controller and the weighting matrix of the DETM, an auxiliary OP is proposed based on a new Lyapunov function and a new robust positive invariant (RPI) set that contain the membership functions and the hybrid dynamic variables. A dynamic event-triggered fuzzy MPC algorithm is developed accordingly, whose recursive feasibility is analysed by employing the RPI *** the designed controller, the involved fuzzy system is ensured to be asymptotically *** examples show that the new DETM and DETM-based MPC algorithm have the advantages of reducing resource consumption while yielding the anticipated performance.

Keywords: Dynamic event-triggered mechanism (DETM); hybrid dynamic variables; model predictive control (MPC); robust positive invariant (RPI) set; T-S fuzzy systems

Proximal Policy Optimization With Advantage Reuse Competition

IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 8, pp. 3915-3925

Authors: Cheng, Yuhu; Guo, Qingbang; Wang, Xuesong (Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, Xuzhou 221116, China; China University of Mining and Technology, School of Information and Control Engineering, Xuzhou 221116, China)

In recent years, reinforcement learning (RL) has made great achievements in artificial intelligence. Proximal policy optimization (PPO) is a representative RL algorithm, which limits the magnitude of each policy update to achieve monotonic policy improvement. However, as an on-policy algorithm, PPO suffers from sample inefficiency and poor policy exploration. To solve these problems, the off-policy advantage is proposed, which calculates the advantage function through the reuse of previous policies, and proximal policy optimization with advantage reuse (PPO-AR) is proposed. Furthermore, to improve the sampling efficiency of the policy update, proximal policy optimization with advantage reuse competition (PPO-ARC) is proposed, which introduces PPO-AR into the policy calculation and uses parallel competitive optimization, and it is shown to improve the performance of the policy. Moreover, to improve the exploration of the policy update, proximal policy optimization with generalized clipping (PPO-GC) is proposed, which relaxes the limits of the policy update by changing the policy flat clipping boundary. Experimental results on OpenAI Gym demonstrate the effectiveness of the proposed PPO-ARC and PPO-GC. © 2020 IEEE.

Keywords: Reinforcement learning
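
The abstract above notes that PPO limits the magnitude of each policy update. A minimal sketch of the standard PPO clipped surrogate for a single sample is given below for orientation; it shows the flat clipping boundary of ordinary PPO only, not the PPO-AR, PPO-ARC, or PPO-GC variants, and the function name and the default `epsilon=0.2` are illustrative choices.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Standard PPO per-sample objective (to be maximised).

    ratio     -- new-policy probability / old-policy probability of the action
    advantage -- estimated advantage of that action
    epsilon   -- half-width of the flat clipping interval [1 - eps, 1 + eps]
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio
    # outside the clipping interval.
    return min(ratio * advantage, clipped_ratio * advantage)


gain_capped = clipped_surrogate(1.5, 1.0)    # ratio clipped down to 1.2
loss_capped = clipped_surrogate(0.5, -1.0)   # ratio clipped up to 0.8
unclipped = clipped_surrogate(1.0, 2.0)      # inside the interval
```

The objective stops improving once the ratio leaves the interval in the direction favoured by the advantage, which is the flat boundary that the abstract says PPO-GC modifies.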

Deep-Learning-Based Uncertainty-Estimation Approach for Unknown Traffic Identification

IEEE Transactions on Artificial Intelligence, 2025, Vol. 6, No. 3, pp. 533-548

Authors: Le, Siqi; Lai, Yingxu; Wang, Yipeng; He, Huijie (Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China; Ministry of Education, Engineering Research Center of Intelligent Perception and Autonomous Control, Beijing 100124, China)

Real open network environments include the traffic generated by known applications or protocols, which have been previously identified and labeled, and unknown network traffic that cannot be identified based on existing knowledge. Accurately identifying unknown traffic is critical to network management and security, not only to help managers allocate bandwidth appropriately for all types of applications and ensure quality of service but also to prevent security breaches that may result from unknown applications or protocols. Notably, the unknown network traffic has been increasing with the emergence of new applications or protocols, which further increases the difficulty in identifying them. Existing unknown traffic classification methods based on Softmax output confidence values cause bias in the prediction probability due to overconfidence of the model during the training process, thus decreasing the identification accuracy. Thus, for unknown traffic identification, this study proposes a deep-learning-based uncertainty-estimation (EUE) approach. EUE introduces the theory of evidence to the task of identifying unknown traffic by inferring traffic uncertainty directly from traffic evidence without the need for a Softmax layer, thus avoiding overconfidence in the model. Thus, the EUE can accurately identify unknown traffic while classifying known traffic at the application level. We construct two experimental scenarios simulating the real network environments with different proportions of unknown traffic to evaluate EUE. The experimental results show that the proposed approach EUE exhibits excellent classification accuracy. © 2024 IEEE. All rights reserved.

Keywords: Internet protocols

Fixed-time stabilization of discontinuous spatiotemporal neural networks with time-varying coefficients via aperiodically switching control

Science China (Information Sciences), 2023, Vol. 66, No. 5, pp. 182-195

Authors: Xiaofang HU, Leimin WANG, Chuan-Ke ZHANG, Xiongbo WAN, Yong HE (School of Automation, China University of Geosciences; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems; Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education)

This paper focuses on the challenge of fixed-time control for spatiotemporal neural networks (SNNs) with discontinuous activations and time-varying coefficients. A novel fixed-time convergence lemma is proposed, which facilitates the handling of time-varying coefficients of SNNs and relaxes the restriction on the non-positive definiteness of the derivative of the Lyapunov function. In addition, a more flexible and economical aperiodically switching control technique is presented to stabilize SNNs within a fixed time, effectively reducing the amount of information transmission and control costs. Under the newly established fixed-time convergence lemma and aperiodically switching controller, more general algebraic conditions are deduced to ensure the fixed-time stabilization of SNNs. Numerical examples are provided to demonstrate the validity of the results.

Keywords: spatiotemporal neural networks; discontinuous activations; time-varying coefficients; fixed-time stabilization; aperiodically switching control

Effect of post rolling stress on phase transformation behavior of microalloyed dual phase steel

Journal of Iron and Steel Research International, 2024, Vol. 31, No. 3, pp. 688-699

Authors: Wen-quan Sun, Sheng-yi Yong, Tie-heng Yuan, Ting-song Yang, An-rui He, Chao Liu, Rui-chun Guo (National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing, University of Science and Technology Beijing, Beijing 100083, China)

The slow phase transformation of microalloyed dual phase steel makes the nonuniform stress and temperature fields during the post rolling cooling process have a significant impact on the phase transformation *** the relatively slow phase transformation of DP780 steel within the microalloyed dual phase steel series, the influence of stress on the phase transformation behavior of DP780 steel was *** quantify the nonuniform thermal and stress conditions in the steel coil, a thermo-mechanical coupled finite element model of the hot-rolled strip cooling process was *** on the simulation data, DP780 steel was chosen as the research material, and Gleeble 3500 thermal simulation equipment was used for experimental *** thermal expansion curves were analyzed through regression to establish the dynamic model of DP780 steel phase transformation under ***, metallographic analysis was conducted to determine phase transformation type and grain size of DP780 *** results confirmed that the stress promotes the occurrence of semi-diffusion-type bainite ***, an appropriate level of stress facilitates the growth of bainitic grains, while the increased stress inhibits the growth of ferritic grains.

Keywords: Microalloyed dual phase steel; Stress; Bainitic transformation; Grain; Finite element method
