We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
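The SPSA step at the heart of this recipe is compact enough to sketch. In the toy Python fragment below, an invented quadratic simulator stands in for the sample-path costs, and fixed step sizes crudely mimic the paper's decreasing multi-timescale schedules (the multiplier moves much more slowly than the policy parameters). Two perturbed simulations give a gradient estimate of the Lagrangian for every coordinate of theta at once.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_discounted_costs(theta):
    """Placeholder rollout: returns noisy (objective, constraint) estimates
    for a policy parameterized by theta. Stands in for a real simulator."""
    noise = rng.normal(scale=0.1, size=2)
    j = float(theta @ theta) + noise[0]                   # objective J(theta)
    g = float((theta - 1.0) @ (theta - 1.0)) + noise[1]   # constraint G(theta)
    return j, g

def spsa_lagrangian_step(theta, lam, c=0.5, delta=0.05, step=0.01):
    """One SPSA step on the Lagrangian L = J + lam * (G - c)."""
    perturb = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher perturbation
    j_plus, g_plus = simulate_discounted_costs(theta + delta * perturb)
    j_minus, g_minus = simulate_discounted_costs(theta - delta * perturb)
    l_plus = j_plus + lam * (g_plus - c)
    l_minus = j_minus + lam * (g_minus - c)
    grad_hat = (l_plus - l_minus) / (2.0 * delta) / perturb
    theta_new = theta - step * grad_hat                   # actor: descend the Lagrangian
    _, g_now = simulate_discounted_costs(theta_new)
    lam_new = max(0.0, lam + 0.001 * (g_now - c))         # multiplier: slow ascent, kept >= 0
    return theta_new, lam_new

theta, lam = np.zeros(3), 0.0
for _ in range(500):
    theta, lam = spsa_lagrangian_step(theta, lam)
print(theta, lam)
```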
Vibration signals can be used to extract effective fault features for fault diagnosis. However, traditional supervised learning requires considerable manpower and time to label samples manually, and this process is difficult to apply to practical fault diagnosis. Deep reinforcement learning, which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, can independently extract hidden fault features and effectively improve the accuracy of fault diagnosis. Semi-supervised learning can reduce the proportion of labeled samples to decrease the learning cost while improving recognition accuracy with unlabeled samples. In this study, we propose a novel semi-supervised deep reinforcement learning method: a semi-supervised generative adversarial network combined with an improved actor-critic algorithm performs fault diagnosis when the labeled sample size is small. In experiments on rolling bearing faults and an engineering application, three-channel time-frequency graphs extracted from raw signals with the wavelet packet transform are compressed into single-channel gray images. Then, to simulate datasets with few labeled samples, 2%, 5%, 20%, 50% and 100% of the sample labels are retained by removing the labels of the remaining samples. The results of the proposed method and other intelligent methods demonstrate that the proposed method provides better performance than the other methods in compound fault diagnosis, even when the number of labeled samples is small.
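As a small illustration of the preprocessing and label-masking steps described above, the fragment below compresses a three-channel time-frequency image to one gray channel and simulates the 2%-100% label splits. The luminance weights and the -1 "unlabeled" marker are assumptions, not details given in the abstract.

```python
import numpy as np

def to_gray(tfr_rgb):
    """Compress a three-channel time-frequency image (H, W, 3) into a single
    gray channel. Standard luminance weights are assumed here."""
    weights = np.array([0.299, 0.587, 0.114])
    return (tfr_rgb @ weights).astype(tfr_rgb.dtype)

def mask_labels(labels, keep_fraction, seed=0):
    """Simulate a scarce-label regime: keep `keep_fraction` of labels and
    mark the rest as unlabeled (-1), mirroring the splits above."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    drop = rng.random(labels.shape[0]) >= keep_fraction
    labels[drop] = -1
    return labels

gray = to_gray(np.random.rand(64, 64, 3))
y = mask_labels(np.arange(100) % 4, keep_fraction=0.05)
```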
In this paper, an online optimization approach for a fractional-order PID controller based on a fractional-order actor-critic algorithm (FOPID-FOAC) is proposed. The proposed FOPID-FOAC scheme exploits the advantages of the FOPID controller and the FOAC approach to improve the performance of nonlinear systems. The proposed FOAC is built by developing a FO-based learning approach for the actor-critic neural network with adaptive learning rates. Moreover, a FO rectified linear unit (RLU) is introduced to enable the AC neural network to define and optimize its own activation function. By means of Lyapunov theory, the convergence and stability of the proposed algorithm are analyzed. The FO operators for the FOAC learning algorithm are obtained using the gray wolf optimization (GWO) algorithm. The effectiveness of the proposed approach is demonstrated by extensive simulations on the tracking problem of the two-degrees-of-freedom (2-DOF) helicopter system and the stabilization problem of the inverted pendulum (IP) system. Moreover, the performance of the proposed algorithm is compared against optimized FOPID control approaches under different system conditions, namely when the system is subjected to parameter uncertainties and external disturbances. The comparison is conducted in terms of two types of performance indices: error performance indices and time-response performance indices. The first type includes the integral absolute error (IAE) and the integral squared error (ISE), whereas the second involves the rise time, the maximum overshoot (Max. OS), and the settling time. The simulation results explicitly indicate the high effectiveness of the proposed FOPID-FOAC controller in terms of both types of performance measurements under different scenarios compared with the other control algorithms.
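Since the abstract leaves the GWO step abstract, here is a minimal generic gray wolf optimizer in Python that could play that role: candidate solutions are pulled toward the three best wolves (alpha, beta, delta) with a coefficient that decays from 2 to 0. The objective below is a stand-in; in the paper's setting it would score closed-loop performance for candidate FO operators.

```python
import numpy as np

def gwo(objective, dim, bounds, n_wolves=20, iters=100, seed=0):
    """Minimal gray wolf optimizer (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(iters):
        fitness = np.apply_along_axis(objective, 1, X)
        leaders = X[np.argsort(fitness)[:3]]   # alpha, beta, delta wolves
        a = 2.0 * (1.0 - t / iters)            # exploration coefficient: 2 -> 0
        moves = []
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            # Each wolf is pulled toward this leader's estimated position.
            moves.append(leader - A * np.abs(C * leader - X))
        X = np.clip(np.mean(moves, axis=0), lo, hi)
    return X[np.argmin(np.apply_along_axis(objective, 1, X))]

# Stand-in objective; a real run would evaluate a control performance index
# (e.g. IAE) for each candidate pair of fractional orders.
best_orders = gwo(lambda x: np.sum((x - 0.7) ** 2), dim=2, bounds=(0.0, 1.0))
print(best_orders)
```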
We propose a method to automatically select proper values of the three thresholds in the Canny edge algorithm. Edge detection is widely used for object recognition, detection, and segmentation. Due to its good performance, the Canny edge algorithm remains widely used among edge detection algorithms. However, it requires manually selecting three appropriate thresholds for a given image. Some approaches have been proposed for automatically setting thresholds in the Canny edge algorithm, but they either tune only a subset of the three thresholds or demonstrate their performance only over a limited range of variation. In natural scenes, images are acquired under various illumination, pose, and weather conditions, and this paper proposes a method that can operate in such varied environments. We formulate the problem with an actor-critic algorithm and design actor and critic networks to solve it. We also propose a reward configuration based on an edge evaluation network, together with a measure that prevents the high and low thresholds from being reversed. The edge evaluation network takes an original image and an edge image as input, and a negative reward is assigned whenever the high and low thresholds are reversed. The proposed algorithm can adapt to unseen environments using images without requiring ground-truth labels. Experimental results on diverse datasets show the feasibility of the proposed algorithm.
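A plausible environment step for such an agent is sketched below. It assumes the three tunable quantities are the Gaussian smoothing sigma plus the low and high hysteresis thresholds, and it replaces the learned edge evaluation network with a toy edge-density score; the reversal penalty value is likewise an assumption.

```python
import cv2
import numpy as np

def apply_canny(image_gray, sigma, low, high):
    """Run Canny after Gaussian smoothing; ksize (0, 0) lets OpenCV derive
    the kernel size from sigma."""
    blurred = cv2.GaussianBlur(image_gray, (0, 0), sigmaX=float(sigma))
    return cv2.Canny(blurred, float(low), float(high))

def step_reward(image_gray, action, edge_quality_fn):
    """One environment step for the threshold-selection agent. A fixed
    negative reward penalizes a reversed threshold pair; otherwise the
    reward comes from an edge-quality score (a stand-in here for the
    paper's edge evaluation network)."""
    sigma, low, high = action
    if low >= high:
        return None, -1.0                    # reversal penalty (assumed value)
    edges = apply_canny(image_gray, sigma, low, high)
    return edges, edge_quality_fn(image_gray, edges)

img = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
density_score = lambda im, e: -abs(float((e > 0).mean()) - 0.05)  # toy proxy
edges, r = step_reward(img, (1.5, 50, 150), density_score)
```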
Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for special cases, such as Markov zero-sum and potential games, which do not fully capture real-world interactions. In this letter, we address this gap by studying the asymptotic properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic with asynchronous step sizes. This decentralized approach enables agents to operate independently, without requiring knowledge of others' strategies or payoffs. We introduce the concept of a Markov Near-Potential Function (MNPF) and demonstrate that it serves as an approximate Lyapunov function for the policy updates in the decentralized learning dynamics, which allows us to characterize the convergent set of strategies. We further strengthen our result under specific regularity conditions and with finite Nash equilibria.
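To make the decentralized dynamic concrete, here is a toy two-agent sketch, not the letter's exact scheme: each agent keeps its own tabular critic and softmax actor, observes only its own reward, and runs its critic on a faster step-size schedule than its actor, with exponents that differ slightly across agents to model asynchrony. The payoffs and transition kernel are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_states, n_actions = 2, 3, 2

# Per-agent tables: critic q[i][s, a_i] and actor logits[i][s, a_i].
q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
logits = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]

def policy(i, s):
    p = np.exp(logits[i][s] - logits[i][s].max())
    return p / p.sum()

def rewards(s, a):
    """Stand-in general-sum payoffs; a real Markov game would supply these."""
    return [float(a[0] == a[1]) + 0.1 * s, float(a[0] != a[1])]

s = 0
for t in range(1, 5001):
    a = [rng.choice(n_actions, p=policy(i, s)) for i in range(n_agents)]
    r = rewards(s, a)
    s_next = rng.integers(n_states)          # stand-in transition kernel
    for i in range(n_agents):
        # Asynchronous two-timescale step sizes: each agent's critic decays
        # more slowly (runs faster) than its actor, with agent-specific exponents.
        alpha = 1.0 / t ** (0.6 + 0.05 * i)  # critic (fast)
        beta = 1.0 / t ** (0.85 + 0.05 * i)  # actor (slow)
        td = r[i] + 0.9 * q[i][s_next].max() - q[i][s, a[i]]
        q[i][s, a[i]] += alpha * td
        grad = -policy(i, s)
        grad[a[i]] += 1.0                    # d log pi / d logits
        logits[i][s] += beta * q[i][s, a[i]] * grad
    s = s_next
```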
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
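The two updates that distinguish this constrained average-cost setting are compact enough to sketch. The fragment below assumes a tabular differential value function and shows the fastest-timescale TD update for the average-cost critic together with the slowest-timescale projected ascent on the Lagrange multiplier; the step-size schedules and the stand-in trajectory are illustrative choices, not the paper's.

```python
import numpy as np

def average_cost_td_step(v, rho, s, c, s_next, alpha):
    """Fastest timescale: TD update for the differential value function v
    and the running estimate rho of the long-run average cost."""
    td = c - rho + v[s_next] - v[s]
    v[s] += alpha * td
    rho += alpha * td
    return rho

def lagrange_step(lam, g_sample, bound, eta):
    """Slowest timescale: projected ascent on the multiplier of the
    constraint (long-run average of g_sample) <= bound."""
    return max(0.0, lam + eta * (g_sample - bound))

# Illustrative timescale separation (assumed, not the paper's exact choices):
# critic alpha_t = 1/t**0.6, actor beta_t = 1/t**0.8, multiplier eta_t = 1/t.
v, rho, lam = np.zeros(5), 0.0, 0.0
for t in range(1, 1001):
    s, s_next = t % 5, (t + 1) % 5           # stand-in trajectory
    rho = average_cost_td_step(v, rho, s, 1.0 + 0.1 * s, s_next, 1.0 / t ** 0.6)
    lam = lagrange_step(lam, g_sample=0.2 * s, bound=0.5, eta=1.0 / t)
```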
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-timescale stochastic approximation. Convergence analysis, approximation issues, and an example are studied.
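A minimal tabular rendition of that two-timescale idea, with an invented stand-in MDP: the critic's step size decays more slowly than the actor's, so the critic tracks the current policy's values while the actor effectively sees an almost-converged critic, which is the separation the convergence analysis relies on.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 4, 2, 0.9
V = np.zeros(nS)                     # critic: state values
theta = np.zeros((nS, nA))           # actor: softmax logits

def pi(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def env_step(s, a):
    """Stand-in MDP: random transitions and a simple reward."""
    return rng.integers(nS), float(a == s % nA)

s = 0
for n in range(1, 20001):
    a = rng.choice(nA, p=pi(s))
    s2, r = env_step(s, a)
    td = r + gamma * V[s2] - V[s]
    V[s] += (1.0 / n ** 0.6) * td    # fast timescale: critic
    grad = -pi(s)
    grad[a] += 1.0                   # d log pi / d logits
    theta[s] += (1.0 / n ** 0.9) * td * grad   # slow timescale: actor
    s = s2
```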
ISBN (print): 9781665485500
Deep learning is used for decision making and functional control in various fields, such as autonomous systems. However, rather than being developed by logical design, deep learning models are trained on data. Moreover, only reward values are used to evaluate performance, which does not provide enough information to confirm that the model has learned properly. This paper proposes a new method to assess the correctness of reinforcement learning that considers other properties of the learning algorithm. The proposed method is applied to the evaluation of actor-critic algorithms, and correctness-related insights into the algorithm are confirmed through experiments.
The quality of the fault recognition component is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent results in deep learning (DL) have recently been achieved in fault recognition. However, DL models have inherent shortcomings: in particular, the phenomenon of over-fitting or degradation suggests that such an algorithm cannot fully use its feature perception ability. Researchers have mainly adapted network architectures for fault diagnosis without accounting for these limitations. In this study, we propose a novel deep reinforcement learning method that combines the perception ability of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module so that it autonomously learns much more of the knowledge hidden in raw data. The proposed method, based on a convolutional neural network (CNN), also adopts an improved actor-critic algorithm for fault recognition. The key components of the standard actor-critic algorithm, such as the environment, the neural network, the reward, and the loss functions, are fully reconsidered in the improved algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input to the model in an end-to-end training mode. Diagnostic results on compound faults of the bearing and tool in a machine tool experimental system show that the proposed network structure yields more accurate results than other methods. These findings demonstrate that, guided by the improved actor-critic algorithm and the processing method for multi-channel data, the proposed method has stronger exploration performance.
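As a sketch of the multi-channel, end-to-end input path described above, the PyTorch fragment below stacks one channel per sensor and feeds the result to a small 1-D CNN. All layer sizes, the signal length, and the class count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiChannelCNN(nn.Module):
    """Small CNN over synchronously stacked multi-channel signals
    (one input channel per sensor channel)."""
    def __init__(self, n_channels=3, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            # Wide first kernel to capture low-frequency fault signatures.
            nn.Conv1d(n_channels, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(4),
        )
        self.head = nn.Linear(32 * 4, n_classes)

    def forward(self, x):            # x: (batch, n_channels, signal_len)
        z = self.features(x)
        return self.head(z.flatten(1))

logits = MultiChannelCNN()(torch.randn(2, 3, 2048))   # -> (2, 8)
```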
In the dynamic field of deep reinforcement learning, the self-attention mechanism has been increasingly recognized. Nevertheless, its application in discrete problem domains has been relatively limited, presenting complex optimization challenges. This article introduces a pioneering deep reinforcement learning algorithm, termed Attention-based Actor-Critic with Priority Experience Replay (A2CPER). A2CPER combines the strengths of self-attention mechanisms with the actor-critic framework and prioritized experience replay to enhance policy formulation for discrete problems. The algorithm's architecture features dual networks within the actor-critic model: the actor formulates action policies and the critic evaluates state values to judge the quality of policies. The incorporation of target networks aids in stabilizing network optimization. Moreover, the addition of self-attention mechanisms bolsters the policy network's capability to focus on critical information, while priority experience replay promotes training stability and reduces correlation among training samples. Empirical experiments on discrete action problems validate A2CPER's adeptness at policy optimization, marking significant performance improvements across tasks. In summary, A2CPER highlights the viability of self-attention mechanisms in reinforcement learning, presenting a robust framework for discrete problem-solving and potential applicability in complex decision-making scenarios.
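Of the pieces listed above, prioritized experience replay is the most self-contained to sketch. The buffer below implements standard proportional prioritization, sampling transition i with probability proportional to p_i**alpha and correcting the bias with importance weights; the hyperparameter values and the flat priority array (in place of a sum-tree) are simplifying assumptions.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay, in the style used by A2CPER-like methods."""
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-5, seed=0):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.rng = np.random.default_rng(seed)
        self.data, self.prios, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        # New transitions enter at the current maximum priority so each is
        # sampled at least once before its TD error is known.
        max_p = self.prios.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.prios[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.prios[: len(self.data)] ** self.alpha
        p = p / p.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=p)
        # Importance weights correct the bias of non-uniform sampling.
        w = (len(self.data) * p[idx]) ** (-self.beta)
        return idx, [self.data[i] for i in idx], w / w.max()

    def update_priorities(self, idx, td_errors):
        self.prios[idx] = np.abs(td_errors) + self.eps

buf = PrioritizedReplay(capacity=1000)
for i in range(32):
    buf.add((i, 0, 0.0, i + 1))              # (s, a, r, s') placeholder tuples
idx, batch, w = buf.sample(8)
buf.update_priorities(idx, np.random.randn(8))
```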