Search Results - Inner Mongolia University Library

An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics

Journal of Systems Science & Complexity, 2024, Vol. 37, No. 5, pp. 1907-1922

Authors: ZHANG Bao-Qiang, WANG Bing-Chang, CAO Ying (School of Control Science and Engineering, Shandong University, Jinan 250000, China)

In this paper, the authors design a reinforcement learning algorithm to solve the adaptive linear-quadratic stochastic n-player nonzero-sum differential game with completely unknown *** each player, a critic network is used to estimate the Q-function, and an actor network is used to estimate the control input. A model-free online Q-learning algorithm is obtained for solving this kind of *** is proved that, under some mild conditions, the system state and the weight estimation errors are uniformly ultimately bounded. A simulation with five players is given to verify the effectiveness of the algorithm.

Keywords: actor-critic algorithm; model-free adaptive control; nonzero-sum stochastic game; reinforcement learning

Adaptive fault-tolerant control for affine non-linear systems based on approximate dynamic programming

IET CONTROL THEORY AND APPLICATIONS, 2016, Vol. 10, No. 6, pp. 655-663

Authors: Fan, Quan-Yong; Yang, Guang-Hong (Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China; Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China)

This study investigates the fault-tolerant control problem for affine nonlinear systems with time-varying actuator gain and bias faults. In order to handle the actuator faults and guarantee the approximate optimal performance of the nominal non-linear dynamics, the approximate dynamic programming method is used to design a sliding mode fault-tolerant control policy. First, the actuator faults are estimated using a disturbance observer and a novel adaptive scheme. Based on the fault estimations, an integral sliding function is constructed and the reachability condition is derived. Then, an actor-critic algorithm with new weight tuning laws is given to learn the bounded nearly optimal control policy for the nominal dynamics. The convergence of the neural network weights is presented based on a Lyapunov analysis method. Finally, the simulation results are given to verify the efficacy of the developed method.

Keywords: adaptive control; fault tolerant control; affine transforms; nonlinear control systems; dynamic programming; time-varying systems; actuators; nonlinear dynamical systems; variable structure systems; observers; reachability analysis; optimal control; neurocontrollers; Lyapunov methods; adaptive fault-tolerant control problem; affine nonlinear systems; time-varying actuator gain; bias faults; approximate optimal performance; nonlinear dynamics; approximate dynamic programming method; sliding mode fault-tolerant control policy; actuator faults; disturbance observer; integral sliding function; reachability condition; actor-critic algorithm; optimal control policy; neural network weights; Lyapunov analysis method

Adaptive TTL-Based Caching for Content Delivery

IEEE-ACM TRANSACTIONS ON NETWORKING, 2018, Vol. 26, No. 3, pp. 1063-1077

Authors: Basu, Soumya; Sundarrajan, Aditya; Ghaderi, Javad; Shakkottai, Sanjay; Sitaraman, Ramesh (Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712, USA; Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003, USA; CUNY, Dept Elect Engn, New York, NY 10027, USA)

Content delivery networks (CDNs) cache and serve a majority of the user-requested content on the Internet. Designing caching algorithms that automatically adapt to the heterogeneity, burstiness, and non-stationary nature of real-world content requests is a major challenge and is the focus of our work. While there is much work on caching algorithms for stationary request traffic, the work on non-stationary request traffic is very limited. Consequently, most prior models are inaccurate for non-stationary production CDN traffic. We propose two TTL-based caching algorithms that provide provable performance guarantees for request traffic that is bursty and non-stationary. The first algorithm, called d-TTL, dynamically adapts a TTL parameter using stochastic approximation. Given a feasible target hit rate, we show that d-TTL converges to its target value for a general class of bursty traffic that allows Markov dependence over time and non-stationary arrivals. The second algorithm, called f-TTL, uses two caches, each with its own TTL. The first-level cache adaptively filters out non-stationary traffic, while the second-level cache stores frequently accessed stationary traffic. Given feasible targets for both the hit rate and the expected cache size, f-TTL asymptotically achieves both targets. We evaluate both d-TTL and f-TTL using an extensive trace containing more than 500 million requests from a production CDN server. We show that both d-TTL and f-TTL converge to their hit rate targets with an error of about 1.3%. However, f-TTL requires a significantly smaller cache size than d-TTL to achieve the same hit rate, since it effectively filters out non-stationary content.

Keywords: TTL caches; content delivery network; adaptive caching; actor-critic algorithm
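
The d-TTL idea summarized above, adapting a single TTL by stochastic approximation until the observed hit rate matches a target, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' algorithm: the class and function names, the constant step size, the reset-on-request timer policy, and the synthetic Zipf/Poisson workload are all assumptions made here.

```python
import itertools
import random


class DynamicTTLCache:
    """Illustrative TTL cache whose single TTL is tuned online toward a target hit rate.

    Hypothetical sketch: the class name, parameters, and the reset-on-request
    timer policy are assumptions, not the d-TTL specification from the paper.
    """

    def __init__(self, target_hit_rate, step_size=0.5, initial_ttl=1.0, max_ttl=1e6):
        self.target = target_hit_rate
        self.step = step_size
        self.ttl = initial_ttl
        self.max_ttl = max_ttl
        self.expiry = {}  # object id -> absolute expiry time (expired entries are not purged)

    def request(self, obj, now):
        """Serve one request at time `now`; return True on a cache hit."""
        hit = self.expiry.get(obj, float("-inf")) >= now
        # Stochastic-approximation step: a miss nudges the TTL up, a hit nudges it down;
        # the update has zero mean exactly when the hit probability equals the target.
        self.ttl += self.step * (self.target - (1.0 if hit else 0.0))
        self.ttl = min(max(self.ttl, 0.0), self.max_ttl)
        # Assumption: the object's timer is (re)started on every request.
        self.expiry[obj] = now + self.ttl
        return hit


def simulate(num_requests=100_000, num_objects=1_000, target=0.5, seed=0):
    """Replay Zipf-popularity requests with Poisson arrivals; return (hit rate, final TTL)."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(num_objects)]  # Zipf(1) popularity
    cum = list(itertools.accumulate(weights))
    objs = rng.choices(range(num_objects), cum_weights=cum, k=num_requests)
    cache = DynamicTTLCache(target)
    now, hits = 0.0, 0
    for obj in objs:
        now += rng.expovariate(1.0)  # unit-rate Poisson arrivals
        hits += cache.request(obj, now)
    return hits / num_requests, cache.ttl


if __name__ == "__main__":
    rate, ttl = simulate()
    print(f"hit rate {rate:.3f}, final TTL {ttl:.1f}")
```

Because every request moves the TTL by `step_size * (target - hit)`, the total change in TTL over N requests equals `step_size * (N * target - hits)` whenever the clipping bounds are not reached, so the long-run hit rate is pulled toward the target whatever the traffic looks like. This is only the intuition behind the stochastic-approximation step, not the paper's convergence proof for bursty, non-stationary traffic.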

Adaptive Deep Reinforcement Learning for Efficient 3D Navigation of Autonomous Underwater Vehicles

IEEE ACCESS, 2024, Vol. 12, pp. 178209-178221

Authors: Politi, Elena; Stefanidou, Artemis; Chronis, Christos; Dimitrakopoulos, George; Varlamis, Iraklis (Harokopio Univ Athens, Dept Informat & Telemat, Athens 17779, Greece)

The exploration of underwater environments has recently accelerated with the development of Autonomous Underwater Vehicles (AUVs). Efficient path planning is one of the key elements for enhancing the autonomy of AUV navigation across various applications. Reinforcement Learning (RL) methods have been successfully introduced for AUV path planning, particularly in high-dimensional state spaces where prior knowledge of the environment is unavailable. In this work, we propose a Deep Reinforcement Learning (DRL) method for efficient AUV navigation in three-dimensional (3D) environments, using input from vision sensors to obtain information about the motion of the AUV and the surrounding space. We adopt a multi-tier approach to validate the performance of the proposed DRL approach with three different neural network architectures, with an emphasis on adaptation and accuracy, considering path length, execution time, and operation success as the optimization objectives. Finally, a simulation platform is built to evaluate the proposed method; the experimental results show enhanced decision-making capability for AUV navigation, which translates to a higher level of autonomy for the vehicle in unknown environments.

Keywords: AUV; underwater; actor-critic algorithm; AUV navigation; dynamic path planning; proximal policy optimization; reinforcement learning

A Maximum Divergence Approach to Optimal Policy in Deep Reinforcement Learning

IEEE TRANSACTIONS ON CYBERNETICS, 2023, Vol. 53, No. 3, pp. 1499-1510

Authors: Yang, Zhiyou; Qu, Hong; Fu, Mingsheng; Hu, Wang; Zhao, Yongze (Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Peoples R China)

Model-free reinforcement learning algorithms based on entropy regularization have achieved good performance in control tasks. These algorithms add an entropy-regularization term to the policy objective in order to learn a stochastic policy. This work provides a new perspective that aims to explicitly learn a representation of the intrinsic information in state transitions to obtain a multimodal stochastic policy, addressing the tradeoff between exploration and exploitation. We study a class of Markov decision processes (MDPs) with divergence maximization, called divergence MDPs. The goal of a divergence MDP is to find an optimal stochastic policy that maximizes the sum of the expected discounted total reward and a divergence term, where the divergence function learns the implicit information of state transitions. It can therefore provide better stochastic policies that improve both robustness and performance in high-dimensional continuous settings. Under this framework, the optimality equations are derived, and a divergence actor-critic algorithm is then developed from the divergence policy iteration method to address large-scale continuous problems. Compared with other methods, the experimental results show that our approach achieves better performance and robustness, particularly in complex environments. The code of DivAC can be found at https://***/yzyvl/DivAC.

Keywords: Entropy; Reinforcement learning; Task analysis; Robustness; Robots; Predictive models; Markov processes; actor-critic algorithm; continuous domains; divergence Markov decision processes (MDPs); optimality conditions; reinforcement learning (RL)

Dynamic Navigation in Unconstrained Environments Using Reinforcement Learning algorithms

IEEE ACCESS, 2023, Vol. 11, pp. 117984-118001

Authors: Chronis, Christos; Anagnostopoulos, Georgios; Politi, Elena; Dimitrakopoulos, George; Varlamis, Iraklis (Harokopio Univ Athens, Dept Informat & Telemat, Athens 17779, Greece)

The potential for the use of drones in logistics and transportation is continuously growing, with multiple applications in both urban and rural environments. The safe navigation of drones in such environments is a major challenge that requires sophisticated algorithms and systems that can quickly and efficiently assess the situation, find the shortest path to the target, and detect and avoid obstacles. Traditional path planning algorithms are unable to handle the dynamic and uncertain nature of real environments, while traditional machine learning models are insufficient due to the constantly changing conditions that affect the drone location and the location of obstacles. Reinforcement learning (RL) algorithms have been widely used for autonomous navigation problems; however, the computational complexity and energy demands of such methods can become a bottleneck to the execution of UAV flights. In this paper, we propose the use of a minimum set of sensors and RL algorithms for the safe and efficient navigation of drones in urban and rural environments. Our approach considers the complex and dynamic nature of such environments by incorporating real-time data from low-cost onboard sensors. After performing a thorough review of the existing solutions in drone path planning and navigation in 3-D environments, we experimentally evaluate our proposed approach in a simulated environment and in various scenarios. The test results demonstrate the effectiveness of the proposed RL-based approach in the navigation of drones in complex and unconstrained environments. The implemented approach can serve as a basis for the development of advanced and robust navigation systems for drones, which can improve safety and efficiency in transportation applications in the near future.

Keywords: actor-critic algorithm; drone navigation; dynamic path planning; proximal policy optimization; reinforcement learning

An experimental study on the application of reinforcement learning in injection molding in the spirit of Industry 4.0

APPLIED SOFT COMPUTING, 2024, Vol. 167

Authors: Parizs, Richard Dominik; Torok, Daniel (Budapest Univ Technol & Econ, Dept Polymer Engn, Fac Mech Engn, Muegyet Rkp 3, H-1111 Budapest, Hungary; MTA BME Lendulet Lightweight Polymer Composites Re, Muegyet Rkp 3, H-1111 Budapest, Hungary)

The use of reinforcement learning in the injection molding process is a little-researched area in the era of Industry 4.0. The use of a smart decision-making algorithm is necessary for such a complex production method. Therefore, our research aims to extend the knowledge of the practical use of reinforcement learning in injection molding. In our study, we examined the effect of the parameters of the actor-critic algorithm to give a broader picture of the learning process. In addition, we show how to use simulation data, as prior knowledge, to set up the injection molding process for the production of an unknown part.

Keywords: Injection molding; Reinforcement learning; actor-critic algorithm; Industry 4.0; Self-adjustment

Simultaneous locomotion and manipulation control of quadruped robots using reinforcement learning-based adaptive fractional-order sliding-mode control

TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2023, Vol. 45, No. 13, pp. 2459-2476

Author: Farid, Yousef (Tarbiat Modares Univ, Sch Elect & Comp Engn, POB 14115-111, Tehran, Iran)

This paper investigates a model-free reinforcement learning-based approach that enables a quadruped robot to manipulate objects while maintaining its balance and dynamic stability during walking. First, the dynamics of quadruped robots are developed in two sub-spaces: the position control space and the force control space. Then, a new long-term performance index is introduced, and a radial basis function neural network is presented as a critic network to estimate the unobtainable long-term performance index. Based on the resulting reinforcement signal, an actor neural network is introduced to generate a feedforward compensation term that copes with the nonlinear dynamics and the system uncertainties. The robustness of the actor-critic reinforcement learning algorithm is enhanced by using a fractional-order sliding-mode controller in the closed-loop system. Online adaptive laws for both the critic and actor network weights are obtained using Lyapunov stability theory. As a result, the uniform ultimate boundedness of the position and force tracking errors is proven. Finally, numerical simulations are conducted to illustrate the feasibility and effectiveness of the proposed adaptive actor-critic learning-based control scheme.

Keywords: Quadruped robots; reinforcement learning; actor-critic algorithm; fractional terminal sliding surface; radial basis networks

Multi-agent graphical games with input constraints: an online learning solution

Control Theory and Technology, 2020, Vol. 18, No. 2, pp. 148-159

Authors: Tianxiang WANG, Bingchang WANG, Yong LIANG (School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China)

This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input *** order to obtain the optimal strategy of each agent, it is necessary to solve a set of coupled Hamilton-Jacobi-Bellman (HJB) *** is very difficult to solve HJB equations by the traditional *** relevant game problem becomes more complex if the control input of each agent in the dynamic graphical game is *** this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without the need for the drift dynamics of ***, this algorithm is to find the optimal solution of Bellman equations *** solution employs a distributed policy iteration process, using only the local information available to each *** can be proved that, under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system will reach Nash *** the process of algorithm implementation, for each agent, two layers of neural networks are used to fit the value function and control strategy, ***, a simulation example is given to show the effectiveness of our method.

Keywords: actor-critic algorithm; differential games; input constraints; neural network (NN); reinforcement learning (RL)

An approximate dynamic programming method for the optimal control of Alkali-Surfactant-Polymer flooding

JOURNAL OF PROCESS CONTROL, 2018, Vol. 64, pp. 15-26

Authors: Ge, Yulei; Li, Shurong; Chan, Peng (China Univ Petr East China, Coll Informat & Control Engn, Qingdao 266580, Peoples R China; Beijing Univ Posts & Telecommun, Automat Sch, Beijing 100876, Peoples R China; Jiangsu Automat Res Inst, Lianyungang 222006, Peoples R China)

Because of the complexity, coupling, and distributed-parameter nature of alkali-surfactant-polymer (ASP) flooding, common optimization methods have difficulty finding optimal solutions. This paper presents an optimal control method for ASP flooding based on approximate dynamic programming (ADP). First, the net present value (NPV) is taken as the performance index. Then an actor-critic algorithm based on the gradient descent method is adopted to obtain the optimal injection strategy, in which the actor and critic approximate the control and the value function, respectively. To improve approximation performance, a linear approximation basis function based on the system characteristics is constructed. Furthermore, to train and predict the control and value function at the next step, a temporal difference (TD) learning algorithm is introduced to update the weight coefficients. The control in ADP is then generated according to a Gauss function, and its weight is updated according to the sigmoid function of the TD error, so that the optimal control can be searched. Finally, the proposed method is applied to an enhanced oil recovery problem of ASP flooding with four injection wells and nine production wells to test its effectiveness. (C) 2018 Elsevier Ltd. All rights reserved.

Keywords: ASP flooding; Approximate dynamic programming; actor-critic algorithm; Temporal difference learning algorithm; TD error; Linear basis function
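
The temporal difference (TD) weight update that drives the critic in actor-critic schemes like the one summarized above can be illustrated on a toy problem. The sketch covers only the critic half: TD(0) with a single linear basis function. The scalar dynamics `x' = a*x`, the reward `-x**2`, the basis `x**2`, the function name, and all parameter values are assumptions chosen because the true value function, V(x) = -x²/(1 - γa²), is known in closed form; the Gauss-function actor and the sigmoid-of-TD-error update described in the abstract are not reproduced.

```python
import random


def td0_linear_critic(a=0.9, gamma=0.95, alpha=0.05, episodes=3000, horizon=20, seed=0):
    """TD(0) policy evaluation with the single quadratic basis phi(x) = x**2.

    Hypothetical toy setting (not from the paper): dynamics x' = a*x,
    reward r = -x**2, value model V(x) = w * phi(x).
    """
    rng = random.Random(seed)
    w = 0.0  # critic weight
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)  # random initial state for each episode
        for _ in range(horizon):
            x_next = a * x
            r = -x * x
            # TD error: one-step bootstrapped target minus current estimate.
            td_error = r + gamma * w * x_next**2 - w * x**2
            # Semi-gradient step: the gradient of w * phi(x) with respect to w is phi(x).
            w += alpha * td_error * x**2
            x = x_next
    return w


if __name__ == "__main__":
    w = td0_linear_critic()
    w_star = -1.0 / (1.0 - 0.95 * 0.9**2)
    print(f"learned w = {w:.4f}, closed-form w* = {w_star:.4f}")
```

With the defaults, the closed-form weight is w* = -1/(1 - 0.95 · 0.81) ≈ -4.34, and the learned weight should settle close to it because each update contracts the error w - w* by a factor of 1 - α·x⁴·(1 - γa²).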
