In the dynamic field of deep reinforcement learning, the self-attention mechanism has been increasingly recognized. Nevertheless, its application in discrete problem domains has been relatively limited, presenting complex optimization challenges. This article introduces a pioneering deep reinforcement learning algorithm, termed Attention-based Actor-Critic with Priority Experience Replay (A2CPER). A2CPER combines the strengths of self-attention mechanisms with the actor-critic framework and prioritized experience replay to enhance policy formulation for discrete problems. The algorithm's architecture features dual networks within the actor-critic model: the actor formulates action policies and the critic evaluates state values to judge the quality of policies. The incorporation of target networks aids in stabilizing network optimization. Moreover, the addition of self-attention mechanisms bolsters the policy network's capability to focus on critical information, while priority experience replay promotes training stability and reduces correlation among training samples. Empirical experiments on discrete action problems validate A2CPER's adeptness at policy optimization, marking significant performance improvements across tasks. In summary, A2CPER highlights the viability of self-attention mechanisms in reinforcement learning, presenting a robust framework for discrete problem-solving and potential applicability in complex decision-making scenarios.
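As a concrete reference point, here is a minimal sketch of the prioritized-replay component named in the abstract, following the common proportional scheme (priority proportional to |TD error|^alpha, with importance-sampling correction). The abstract does not state A2CPER's exact priority metric or hyperparameters, so alpha and beta below are illustrative assumptions.

```python
# Minimal sketch of proportional prioritized experience replay; A2CPER's
# exact priority metric and hyperparameters are assumptions, not the paper's.
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.buffer, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition, td_error):
        # Higher TD error -> higher sampling priority.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.buffer)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (len(self.buffer) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights
```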
Markov games provide a powerful framework for modeling strategic multi-agent interactions in dynamic environments. Traditionally, convergence properties of decentralized learning algorithms in these settings have been established only for special cases, such as Markov zero-sum and potential games, which do not fully capture real-world interactions. In this letter, we address this gap by studying the asymptotic properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic with asynchronous step sizes. This decentralized approach enables agents to operate independently, without requiring knowledge of others' strategies or payoffs. We introduce the concept of a Markov Near-Potential Function (MNPF) and demonstrate that it serves as an approximate Lyapunov function for the policy updates in the decentralized learning dynamics, which allows us to characterize the convergent set of strategies. We further strengthen this result under specific regularity conditions and in settings with a finite number of Nash equilibria.
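For readers unfamiliar with near-potential ideas, the following is a plausible form of the MNPF condition, adapted from the near-potential game literature; the letter's precise definition may differ in detail.

```latex
% Sketch of a near-potential condition: \Phi is the MNPF, V_i the value of
% agent i under joint policy (\pi_i, \pi_{-i}), and \kappa \ge 0 the gap.
\[
\bigl|\,[V_i(\pi_i', \pi_{-i}) - V_i(\pi_i, \pi_{-i})]
      - [\Phi(\pi_i', \pi_{-i}) - \Phi(\pi_i, \pi_{-i})]\,\bigr| \le \kappa
\quad \forall i,\ \pi_i,\ \pi_i',\ \pi_{-i}.
\]
```

When kappa = 0 this reduces to an exact Markov potential game; a small kappa is what allows Phi to act as an approximate Lyapunov function for the policy updates.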
In this paper, an online optimization approach for a fractional-order PID controller based on a fractional-order actor-critic algorithm (FOPID-FOAC) is proposed. The proposed FOPID-FOAC scheme exploits the advantages of the FOPID controller and the FOAC approach to improve the performance of nonlinear systems. The proposed FOAC is built by developing a FO-based learning approach for the actor-critic neural network with adaptive learning rates. Moreover, a FO rectified linear unit (RLU) is introduced to enable the AC neural network to define and optimize its own activation function. By means of the Lyapunov theorem, the convergence and stability of the proposed algorithm are analyzed. The FO operators for the FOAC learning algorithm are obtained using the gray wolf optimization (GWO) algorithm. The effectiveness of the proposed approach is demonstrated through extensive simulations on the tracking problem of the two-degrees-of-freedom (2-DOF) helicopter system and the stabilization problem of the inverted pendulum (IP) system. Moreover, the performance of the proposed algorithm is compared against optimized FOPID control approaches under different system conditions, namely when the system is subjected to parameter uncertainties and external disturbances. The comparison is conducted in terms of two types of performance indices: error performance indices and time-response performance indices. The first type includes the integral absolute error (IAE) and the integral squared error (ISE), whereas the second includes the rise time, the maximum overshoot (Max. OS), and the settling time. The simulation results explicitly indicate the high effectiveness of the proposed FOPID-FOAC controller in terms of both types of performance measures under different scenarios compared with the other control algorithms.
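To make the FOPID structure concrete, here is a minimal sketch of one PI^lambda-D^mu control step using the Gruenwald-Letnikov approximation of the fractional operators. The gains and orders are placeholders, not the values FOPID-FOAC tunes online.

```python
# Discrete fractional-order PID step via the Gruenwald-Letnikov expansion;
# gains (Kp, Ki, Kd) and orders (lam, mu) are illustrative placeholders.
import numpy as np

def gl_weights(alpha, n):
    """Gruenwald-Letnikov binomial weights w_j for D^alpha, j = 0..n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def fopid_output(err_hist, h, Kp=1.0, Ki=0.5, Kd=0.1, lam=0.9, mu=0.8):
    """err_hist: error samples, newest first; h: sampling period in seconds."""
    e = np.asarray(err_hist, dtype=float)
    n = len(e)
    integral = h ** lam * (gl_weights(-lam, n) @ e)    # D^{-lambda} e(t)
    derivative = h ** -mu * (gl_weights(mu, n) @ e)    # D^{mu} e(t)
    return Kp * e[0] + Ki * integral + Kd * derivative
```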
We propose a method to automatically select proper values of the three thresholds in the Canny edge algorithm. Edge detection is widely used for object recognition, detection, and segmentation. Owing to its good performance, the Canny edge algorithm remains widely used among the many edge detection algorithms, but it requires manually selecting three appropriate thresholds for a given image. Some approaches have been proposed for automatically setting thresholds in the Canny edge algorithm, but they either handle only a subset of the three thresholds or demonstrate their performance only over a limited range of variation. In natural scenes, images are acquired under various illumination, pose, and weather conditions. This paper proposes a method that can operate in such diverse environments. We formulate the given problem with an actor-critic algorithm and propose actor and critic networks to solve it. We also suggest a reward configuration based on an edge evaluation network, together with a measure to prevent reversal of the high and low thresholds: the edge evaluation network takes an original image and an edge image as input, and a negative reward is assigned when a reversal of the high and low thresholds occurs. The proposed algorithm can adapt to unseen environments using images without requiring ground-truth labels. Experimental results on diverse datasets show the feasibility of the proposed algorithm.
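The reversal penalty can be made concrete with a short sketch. The specific threshold set (low, high, Gaussian sigma) and the evaluation-network scoring callable edge_quality are assumptions read from the abstract, not the authors' code.

```python
# Sketch of the reward logic: penalize threshold reversal, otherwise score
# the resulting edge map with an (assumed) learned edge-quality function.
import cv2

def apply_and_reward(image, low, high, sigma, edge_quality):
    """edge_quality: callable (image, edge_map) -> float score."""
    if high <= low:                       # threshold reversal -> penalty
        return None, -1.0
    blurred = cv2.GaussianBlur(image, (0, 0), sigma)  # ksize derived from sigma
    edges = cv2.Canny(blurred, low, high)
    return edges, edge_quality(image, edges)
```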
Vibration signals can be used to extract effective fault features for fault diagnosis. However, traditional supervised learning requires considerable manpower and time to label samples manually, and this process is difficult to apply to practical fault diagnosis. Deep reinforcement learning, which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, can independently extract hidden fault features and effectively improve the accuracy of fault diagnosis. Semi-supervised learning can reduce the proportion of labeled samples to decrease the learning cost while improving recognition accuracy with unlabeled samples. In this study, we propose a novel semi-supervised deep reinforcement learning method: a semi-supervised generative adversarial network combined with an improved actor-critic algorithm performs fault diagnosis when the labeled sample size is small. In the rolling-bearing fault experiment and engineering application, three-channel time-frequency graphs extracted from the raw signals with the wavelet packet transform are compressed into single-channel gray graphs. Then, to simulate datasets with few labeled samples, labeled ratios of 2%, 5%, 20%, 50%, and 100% are created by removing labels from a portion of the processed samples. The results of the proposed method and other intelligent methods demonstrate that the proposed method provides better performance than the others in compound fault diagnosis, even when the labeled sample size is small.
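A minimal sketch of the label-masking step used to simulate scarce labels at the stated ratios; the split logic is a straightforward reading of the abstract rather than the authors' implementation.

```python
# Simulate semi-supervised splits: keep labels for a random fraction of
# samples and mark the rest -1 (fed to the unsupervised GAN branch).
import numpy as np

def mask_labels(y, labeled_ratio, rng=None):
    rng = rng or np.random.default_rng(0)
    y_semi = np.full_like(y, -1)
    keep = rng.choice(len(y), size=int(labeled_ratio * len(y)), replace=False)
    y_semi[keep] = y[keep]
    return y_semi

y = np.arange(1000) % 4                     # toy 4-class labels
splits = {r: mask_labels(y, r) for r in (0.02, 0.05, 0.20, 0.50, 1.00)}
```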
Earth observation satellite (EOS) scheduling involves complex constraints and is difficult to solve. For the multi-orbit scheduling problem of a single EOS, this paper first describes the EOS scheduling process and then models it as a Markov decision process. The actor-critic (AC) algorithm is used to construct and train the scheduling decision-making model, and multi-step estimation is used to calculate the advantage function. The experimental results demonstrate the satisfactory efficiency, effectiveness, and generalization ability of the suggested approach.
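The multi-step advantage estimate mentioned above follows the standard n-step form A_t = sum_{k<n} gamma^k r_{t+k} + gamma^n V(s_{t+n}) - V(s_t); a minimal sketch follows, with horizon n and gamma as placeholders rather than the paper's settings.

```python
# n-step advantage estimation for an actor-critic update.
def n_step_advantage(rewards, values, gamma=0.99, n=5):
    """rewards[t] = r_t; values[t] = V(s_t), with one extra bootstrap entry
    so len(values) == len(rewards) + 1."""
    T = len(rewards)
    advantages = []
    for t in range(T):
        horizon = min(n, T - t)
        ret = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        ret += gamma ** horizon * values[t + horizon]   # bootstrap tail
        advantages.append(ret - values[t])
    return advantages
```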
Deep learning is used for decision making and functional control in various fields, such as autonomous systems. However, rather than being developed by logical design, deep learning models train themselves from learning data. Moreover, only reward values are used to evaluate performance, which does not provide enough information to confirm that the model has learned properly. This paper proposes a new method to assess the correctness of reinforcement learning by considering other properties of the learning algorithm. The proposed method is applied to the evaluation of actor-critic algorithms, and correctness-related insights into the algorithm are confirmed through experiments.
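The abstract does not name the additional properties used. Purely as an illustration, a correctness check might track value consistency (critic TD error) and policy entropy alongside the reward, as in this hypothetical sketch.

```python
# Hypothetical correctness signals for an actor-critic agent, logged
# alongside reward: TD-error magnitude and policy entropy.
import numpy as np

def correctness_metrics(rewards, values, next_values, action_probs, gamma=0.99):
    td = np.asarray(rewards) + gamma * np.asarray(next_values) - np.asarray(values)
    p = np.clip(np.asarray(action_probs), 1e-8, 1.0)     # (T, num_actions)
    entropy = -(p * np.log(p)).sum(axis=-1).mean()
    return {"mean_reward": float(np.mean(rewards)),
            "td_error_rms": float(np.sqrt((td ** 2).mean())),
            "policy_entropy": float(entropy)}
```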
The quality of the fault recognition stage is one of the key factors affecting the efficiency of intelligent manufacturing. Many excellent achievements in deep learning (DL) have recently been realized as methods of fault recognition. However, DL models have inherent shortcomings. In particular, the phenomenon of over-fitting or degradation suggests that such an intelligent algorithm cannot fully use its feature perception ability. Researchers have mainly adapted the network architecture for fault diagnosis, without taking the above limitations into account. In this study, we propose a novel deep reinforcement learning method that combines the perception of DL with the decision-making ability of reinforcement learning. This method enhances the classification accuracy of the DL module by autonomously learning much more of the knowledge hidden in raw data. The proposed method, based on a convolutional neural network (CNN), also adopts an improved actor-critic algorithm for fault recognition. The important parts of the standard actor-critic algorithm, such as the environment, neural network, reward, and loss functions, have been fully reconsidered in the improved actor-critic algorithm. Additionally, to fully distinguish compound faults under heavy background noise, multi-channel signals are first stacked synchronously and then input into the model in an end-to-end training mode. The diagnostic results on the compound fault of the bearing and tool in a machine-tool experimental system show that, compared with other methods, the proposed network structure achieves more accurate results. These findings demonstrate that, under the guidance of the improved actor-critic algorithm and the processing method for multi-channel data, the proposed method has stronger exploration performance.
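A minimal sketch of the synchronous multi-channel stacking step, assuming channels-first CNN input; the shapes and batching are assumptions, not the paper's pipeline.

```python
# Stack synchronized sensor channels along the channel axis so the CNN
# sees one multi-channel sample per time window.
import numpy as np

def stack_channels(signals):
    """signals: list of (T,) arrays from synchronized sensors -> (C, T)."""
    return np.stack(signals, axis=0)

sample = stack_channels([np.random.randn(2048) for _ in range(3)])  # (3, 2048)
```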
Existing deep reinforcement learning (DRL) algorithms suffer from low sample efficiency. Episodic memory allows DRL algorithms to remember and reuse past experiences with high return, thereby improving sample efficiency. However, due to the high dimensionality of the state-action space in continuous action tasks, previous methods for such tasks often only utilize the information stored in episodic memory, rather than directly employing episodic memory for action selection as is done in discrete action tasks. We suppose that episodic memory retains the potential to guide action selection in continuous control tasks. Our objective is to enhance sample efficiency by leveraging episodic memory for action selection in such tasks: either reducing the number of training steps required to achieve comparable performance, or enabling the agent to obtain higher rewards within the same number of training steps. To this end, we propose an "Episodic Memory-Double Actor-Critic (EMDAC)" framework, which can use episodic memory for action selection in continuous action tasks. The critics and the episodic memory evaluate the value of state-action pairs selected by the two actors to determine the final action. Meanwhile, we design an episodic memory based on a Kalman filter optimizer, which updates using the episodic rewards of collected state-action pairs. The Kalman filter optimizer assigns different weights to experiences collected at different time periods during the memory update process. In our episodic memory, state-action pair clusters are used as indices, recording both the occurrence frequency of these clusters and the value estimates for the corresponding state-action pairs. This enables the estimation of the value of state-action pair clusters by querying the episodic memory. After that, we design an intrinsic reward based on the novelty of state-action pairs with respect to the episodic memory, defined by the occurrence frequency of state-action pair clusters, to enhance the…
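A minimal sketch of a cluster-indexed episodic memory with a count-based novelty bonus; the grid clustering rule, the bonus form beta/sqrt(N), and the filter-style value update are assumptions consistent with the abstract, not the authors' code.

```python
# Cluster-indexed episodic memory: counts give a novelty-based intrinsic
# reward; values are blended with new episodic returns (filter-style update).
import numpy as np
from collections import defaultdict

class EpisodicMemory:
    def __init__(self, bin_width=0.5, beta=0.1):
        self.bin_width, self.beta = bin_width, beta
        self.count = defaultdict(int)     # cluster -> occurrence frequency
        self.value = defaultdict(float)   # cluster -> value estimate

    def _cluster(self, state, action):
        sa = np.concatenate([state, action]) / self.bin_width
        return tuple(np.floor(sa).astype(int))   # coarse grid "cluster" index

    def update(self, state, action, episodic_return, gain=0.1):
        c = self._cluster(state, action)
        self.count[c] += 1
        # Blend the old estimate with the new return (assumed update rule).
        self.value[c] += gain * (episodic_return - self.value[c])

    def intrinsic_reward(self, state, action):
        c = self._cluster(state, action)
        return self.beta / np.sqrt(self.count[c] + 1)  # rarer cluster -> bigger bonus
```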
As one of the important complementary technologies of fifth-generation (5G) wireless communication and beyond, mobile device-to-device (D2D) edge caching and computing can effectively reduce the pressure on backbone networks and improve the user experience. Specific content can be pre-cached on user devices based on personalized content placement strategies, and the cached content can be fetched by neighboring devices in the same D2D network. However, when multiple devices simultaneously fetch content from the same device, collisions occur and reduce communication efficiency. In this paper, we design content fetching strategies based on an actor-critic deep reinforcement learning (DRL) architecture, which can adjust the content fetching collision rate to adapt to different application scenarios. First, the optimization problem is formulated with the goal of minimizing the collision rate to improve throughput, and a general actor-critic DRL algorithm is used to improve the content fetching strategy. Second, by optimizing the network architecture and the reward function, an improved two-level actor-critic algorithm is developed to effectively manage the collision rate and transmission power. Furthermore, to balance the conflict between the collision rate and device energy consumption, the related reward terms are weighted in the reward function to optimize energy efficiency. The simulation results show that the content fetching collision rate of the improved two-level actor-critic algorithm decreases significantly compared with that of the baseline algorithms, and the network energy consumption can be optimized by adjusting the weight factors.
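The weighted trade-off in the reward function can be sketched as follows; the reward terms and weight values are illustrative assumptions rather than the paper's formulation.

```python
# Weighted reward balancing throughput against collisions and energy use;
# w_c and w_e are the tunable weight factors mentioned in the abstract.
def fetching_reward(throughput, collided, energy, w_c=1.0, w_e=0.2):
    """Reward successful fetching, penalize collisions and energy cost."""
    return throughput - w_c * float(collided) - w_e * energy
```

Raising w_e relative to w_c shifts the learned strategy toward energy efficiency at the cost of a higher tolerated collision rate, which matches the abstract's claim that energy consumption is optimized by adjusting the weight factors.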