Search results - Inner Mongolia University Library

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes

SYSTEMS & CONTROL LETTERS, 2010, vol. 59, no. 12, pp. 760-766

Author: Bhatnagar, Shalabh. Affiliation: Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.

Keywords: Constrained Markov decision processes; Infinite horizon discounted cost criterion; Function approximation; actor-critic algorithm; Simultaneous perturbation stochastic approximation
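
The abstract above mentions an actor that searches the policy-parameter space with SPSA gradient estimates. As a rough illustration only, the following Python sketch shows a generic two-sided SPSA estimate driving plain gradient descent. It is not the algorithm from the paper: the function names, the quadratic stand-in cost, the step size, and the perturbation size are all assumptions made for this example, and the TD critic, the Lagrange multipliers, and the multiple timescales are left out.

```python
import random


def spsa_gradient(f, theta, delta=1e-2, rng=random):
    """Two-sided SPSA estimate of the gradient of f at theta (illustrative)."""
    # Perturb every coordinate at once with independent +/-1 signs.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # Only two evaluations of f are needed, whatever the dimension of theta.
    diff = f(plus) - f(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def quadratic_cost(theta):
    # Hypothetical smooth cost standing in for a policy's discounted cost.
    return sum((t - 1.0) ** 2 for t in theta)


def spsa_descent(f, theta, steps=2000, step_size=0.05, seed=0):
    """Plain gradient descent that uses the SPSA estimate at each step."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(f, theta, rng=rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(spsa_descent(quadratic_cost, [0.0, 2.0, -1.0]))
```

The appeal of this kind of estimate is that its cost does not grow with the number of parameters: each step needs two cost evaluations, at the price of a noisier gradient.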

A novel semi-supervised generative adversarial network based on the actor-critic algorithm for compound fault recognition

NEURAL COMPUTING & APPLICATIONS, 2022, vol. 34, no. 13, pp. 10787-10805

Authors: Wang, Zisheng; Xuan, Jianping; Shi, Tielin. Affiliation: Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan 430074, Peoples R China

Vibration signals can be used to extract effective fault features for fault diagnosis. However, traditional supervised learning requires considerable manpower and time to label samples manually, and this process is difficult to apply to practical fault diagnosis. Deep reinforcement learning, which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, can independently extract hidden fault features and effectively improve the accuracy of fault diagnosis. Semi-supervised learning can reduce the proportion of labeled samples to decrease the learning cost while improving the recognition accuracy with unlabeled samples. In this study, we propose a novel semi-supervised deep reinforcement learning method. A semi-supervised generative adversarial network combined with the improved actor-critic algorithm is proposed to perform fault diagnosis when the labeled sample size is small. In the rolling bearing fault experiment and the engineering application, three-channel time-frequency graphs extracted from raw signals with the wavelet packet are compressed into single-channel gray graphs. Then, to simulate datasets with few labeled samples, 2%, 5%, 20%, 50% and 100% of the sample labels are kept by removing part of the labels from the processed samples. The results of the proposed method and other intelligent methods are listed to demonstrate that the proposed method provides better performance than the other methods in compound fault diagnosis, even when the number of labeled samples is small.

Keywords: Fault recognition; Semi-supervised learning; Deep reinforcement learning; actor-critic algorithm; Wavelet packet

Optimal fractional-order PID controller based on fractional-order actor-critic algorithm

NEURAL COMPUTING & APPLICATIONS, 2023, vol. 35, no. 3, pp. 2347-2380

Authors: Shalaby, Raafat; El-Hossainy, Mohammad; Abo-Zalam, Belal; Mahmoud, Tarek A. Affiliations: Menoufia Univ, Fac Elect Engn, Dept Ind Elect & Control Engn, Menoufia 32952, Egypt; Nile Univ, Sch Engn & Appl Sci, Dept Mechatron Engn, Giza 12588, Egypt; New Cairo Technol Univ, Fac Ind & Energy Technol, Dept New & Renewable Energy, Cairo 11853, Egypt

In this paper, an online optimization approach of a fractional-order PID controller based on a fractional-order actor-critic algorithm (FOPID-FOAC) is proposed. The proposed FOPID-FOAC scheme exploits the advantages of the FOPID controller and FOAC approaches to improve the performance of nonlinear systems. The proposed FOAC is built by developing a FO-based learning approach for the actor-critic neural network with adaptive learning rates. Moreover, a FO rectified linear unit (RLU) is introduced to enable the AC neural network to define and optimize its own activation function. By means of the Lyapunov theorem, the convergence and the stability of the proposed algorithm are analyzed. The FO operators for the FOAC learning algorithm are obtained using the gray wolf optimization (GWO) algorithm. The effectiveness of the proposed approach is demonstrated by extensive simulations based on the tracking problem of the two degrees of freedom (2-DOF) helicopter system and the stabilization problem of the inverted pendulum (IP) system. Moreover, the performance of the proposed algorithm is compared against optimized FOPID control approaches in different system conditions, namely when the system is subjected to parameter uncertainties and external disturbances. The comparison is conducted in terms of two types of performance indices: error performance indices and time response performance indices. The first type includes the integral absolute error (IAE) and the integral squared error (ISE), whereas the second type involves the rise time, the maximum overshoot (Max. OS), and the settling time. The simulation results indicate the high effectiveness of the proposed FOPID-FOAC controller in terms of both types of performance measurements under different scenarios compared with the other control algorithms.

Keywords: Fractional-order PID controller; Reinforcement learning; actor-critic algorithm; Gray wolf optimization; Lyapunov theorem
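
The two error indices named in the abstract above have standard definitions: IAE is the time integral of |e(t)| and ISE is the time integral of e(t)^2, where e(t) is the tracking error. A minimal Python sketch follows, assuming a uniformly sampled error signal and a simple rectangle-rule approximation of the integrals. The function names and the sample data are hypothetical, and the paper may discretize differently.

```python
def iae(errors, dt):
    """Integral absolute error, approximated by a rectangle rule."""
    return sum(abs(e) for e in errors) * dt


def ise(errors, dt):
    """Integral squared error, approximated by a rectangle rule."""
    return sum(e * e for e in errors) * dt


if __name__ == "__main__":
    # Hypothetical tracking-error samples taken every 0.5 s.
    samples = [1.0, -0.5, 0.25, 0.0]
    print(iae(samples, 0.5), ise(samples, 0.5))
```

ISE penalizes large errors more heavily than IAE because of the square, so the two indices can rank controllers differently.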

An Adaptive Threshold for the Canny Edge With Actor-Critic Algorithm

IEEE ACCESS, 2023, vol. 11, pp. 67058-67069

Authors: Choi, Keong-Hun; Ha, Jong-Eun. Affiliations: Seoul Natl Univ Sci & Technol, Grad Sch Automot Engn, Seoul 01811, South Korea; Seoul Natl Univ Sci & Technol, Dept Mech & Automot Engn, Seoul 01811, South Korea

We propose a method to automatically select proper values of the three thresholds in the Canny edge algorithm. Edge detection is widely used for object recognition, detection, and segmentation. Due to its good performance, the Canny edge algorithm is still widely used among many edge detection algorithms. However, it requires manually selecting three appropriate thresholds for the given image. Some approaches have been proposed for automatically setting thresholds in the Canny edge algorithm, but they either handle only some of the three thresholds or show their performance only in a limited range of variation. In natural scenes, images are acquired under various illumination, pose, and weather conditions. This paper proposes a method that can operate in various environments. We formulate the given problem as an actor-critic problem and propose an actor network and a critic network to solve it. We also suggest a reward configuration based on an edge evaluation network and a measure to prevent the reversal of the high and low thresholds. The edge evaluation network takes an original image and an edge image as input. We assign a negative reward when the high and low thresholds are reversed. The proposed algorithm can adapt to unseen environments using images without requiring ground truth labels. Experimental results using diverse datasets show the feasibility of the proposed algorithm.

Keywords: actor-critic algorithm; edge detection; deep reinforcement learning; deep learning

Convergence of Decentralized Actor-Critic Algorithm in General-Sum Markov Games

IEEE CONTROL SYSTEMS LETTERS, 2024, vol. 8, pp. 2643-2648

Authors: Maheshwari, Chinmay; Wu, Manxi; Sastry, Shankar. Affiliations: Univ Calif Berkeley, Dept EECS, Berkeley, CA 94709, USA; Univ Calif Berkeley, Dept Civil & Environm Engn, Berkeley, CA 94709, USA

Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for special cases, such as Markov zero-sum and potential games, which do not fully capture real-world interactions. In this letter, we address this gap by studying the asymptotic properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic with asynchronous step sizes. This decentralized approach enables agents to operate independently, without requiring knowledge of others' strategies or payoffs. We introduce the concept of a Markov Near-Potential Function (MNPF) and demonstrate that it serves as an approximate Lyapunov function for the policy updates in the decentralized learning dynamics, which allows us to characterize the convergent set of strategies. We further strengthen our result under specific regularity conditions and with finite Nash equilibria.

Keywords: Games; Convergence; Nash equilibrium; Heuristic algorithms; Approximation algorithms; Trajectory; Lyapunov methods; Vectors; Stochastic processes; Standards; Markov games; decentralized learning; Markov near-potential functions; actor-critic algorithm

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2012, vol. 153, no. 3, pp. 688-708

Authors: Bhatnagar, Shalabh; Lakshmanan, K. Affiliation: Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point.

Keywords: actor-critic algorithm; Constrained Markov decision processes; Long-run average cost criterion; Function approximation

The actor-critic algorithm as multi-time-scale stochastic approximation

SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1997, vol. 22, no. 4, pp. 525-543

Authors: Borkar, VS; Konda, VR. Affiliation: Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-time-scale stochastic approximation. Convergence analysis, approximation issues and an exa...

Keywords: actor-critic algorithm; stochastic approximation; Markov decision processes; simulation-based algorithms; policy iteration

Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm

13th International Conference on Ubiquitous and Future Networks (ICUFN)

Authors: Kim, Youngjae; Hussain, Manzoor; Suh, Jae-Won; Hong, Jang-Eui. Affiliation: Chungbuk Natl Univ, Coll Elect & Comp Engn, Cheongju, South Korea

ISBN: (print) 9781665485500

Deep learning is used for decision making and functional control in various fields, such as autonomous systems. However, rather than being developed by logical design, deep learning models train themselves on learning data. Moreover, only reward values are used to evaluate their performance, which does not provide enough information to confirm that the model learned properly. This paper proposes a new method to assess the correctness of reinforcement learning, considering other properties of the learning algorithm. The proposed method is applied to the evaluation of actor-critic algorithms, and correctness-related insights into the algorithm are confirmed through experiments.

Keywords: reinforcement learning; actor-critic algorithm; safety-critical system; quality evaluation; correctness

Intelligent fault recognition framework by using deep reinforcement learning with one dimension convolution and improved actor-critic algorithm

ADVANCED ENGINEERING INFORMATICS, 2021, vol. 49, p. 101315

Authors: Wang, Zisheng; Xuan, Jianping. Affiliation: Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan 430074, Peoples R China

The quality of the fault recognition component is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent achievements in deep learning (DL) have been realized recently as methods of fault recognition. However, DL models have inherent shortcomings. In particular, the phenomenon of over-fitting or degradation suggests that such an intelligent algorithm cannot fully use its feature perception ability. Researchers have mainly adapted the network architecture for fault diagnosis, but the above limitations are not taken into account. In this study, we propose a novel deep reinforcement learning method that combines the perception of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module so that it autonomously learns much more of the knowledge hidden in raw data. The proposed method, based on the convolutional neural network (CNN), also adopts an improved actor-critic algorithm for fault recognition. The important parts of the standard actor-critic algorithm, such as the environment, neural network, reward, and loss functions, have been fully considered in the improved actor-critic algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input into the model in the end-to-end training mode. The diagnostic results on the compound fault of the bearing and tool in the machine tool experimental system show that, compared with other methods, the proposed network structure produces more accurate results. These findings demonstrate that, under the guidance of the improved actor-critic algorithm and the processing method for multi-channel data, the proposed method has stronger exploration performance.

Keywords: Fault recognition; Deep reinforcement learning; actor-critic algorithm; 1D convolution

A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems

PEERJ COMPUTER SCIENCE, 2024, vol. 10, e2161

Authors: Sun, Yuezhongyi; Yang, Boyu. Affiliation: Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China

In the dynamic field of deep reinforcement learning, the self-attention mechanism has been increasingly recognized. Nevertheless, its application in discrete problem domains has been relatively limited, presenting complex optimization challenges. This article introduces a pioneering deep reinforcement learning algorithm, termed Attention-based actor-critic with Priority Experience Replay (A2CPER). A2CPER combines the strengths of self-attention mechanisms with the actor-critic framework and prioritized experience replay to enhance policy formulation for discrete problems. The algorithm's architecture features dual networks within the actor-critic model - the actor formulates action policies and the critic evaluates state values to judge the quality of policies. The incorporation of target networks aids in stabilizing network optimization. Moreover, the addition of self-attention mechanisms bolsters the policy network's capability to focus on critical information, while priority experience replay promotes training stability and reduces correlation among training samples. Empirical experiments on discrete action problems validate A2CPER's adeptness at policy optimization, marking significant performance improvements across tasks. In summary, A2CPER highlights the viability of self-attention mechanisms in reinforcement learning, presenting a robust framework for discrete problem-solving and potential applicability in complex decision-making scenarios.

Keywords: actor-critic algorithm; A2CPER; Priority experience replay; Self-attention mechanism; Deep reinforcement learning
