Existing deep reinforcement learning (DRL) algorithms suffer from low sample efficiency. Episodic memory allows DRL algorithms to remember and reuse past experiences with high return, thereby improving sample efficiency. However, due to the high dimensionality of the state-action space in continuous action tasks, previous methods for such tasks typically only utilize the information stored in episodic memory, rather than directly employing episodic memory for action selection as is done in discrete action tasks. We posit that episodic memory retains the potential to guide action selection in continuous control tasks. Our objective is to enhance sample efficiency by leveraging episodic memory for action selection in such tasks: either reducing the number of training steps required to achieve comparable performance, or enabling the agent to obtain higher rewards within the same number of training steps. To this end, we propose an "Episodic Memory-Double Actor-Critic (EMDAC)" framework, which can use episodic memory for action selection in continuous action tasks. The critics and the episodic memory evaluate the value of the state-action pairs selected by the two actors to determine the final action. Meanwhile, we design an episodic memory based on a Kalman filter optimizer, which is updated using the episodic rewards of collected state-action pairs; the Kalman filter optimizer assigns different weights to experiences collected at different time periods during the memory update. In our episodic memory, state-action pair clusters serve as indices, recording both the occurrence frequency of these clusters and the value estimates of the corresponding state-action pairs, so that the value of a state-action pair cluster can be estimated by querying the episodic memory. After that, we design an intrinsic reward based on the novelty of state-action pairs with respect to the episodic memory, defined by the occurrence frequency of state-action pair clusters, to enhance the
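The cluster-indexed memory described above can be sketched as follows. This is a minimal illustration: a simple running-average update stands in for the paper's Kalman filter optimizer, and all names (`EpisodicMemory`, `intrinsic_reward`, `beta`) are illustrative, not the authors' implementation.

```python
from collections import defaultdict

class EpisodicMemory:
    """Sketch of a cluster-indexed episodic memory (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(int)    # occurrence frequency per cluster
        self.values = defaultdict(float)  # value estimate per cluster

    def update(self, cluster_id, episodic_return, lr=0.1):
        # Running-average update toward the observed episodic return
        # (the paper instead weights updates via a Kalman filter optimizer).
        self.counts[cluster_id] += 1
        self.values[cluster_id] += lr * (episodic_return - self.values[cluster_id])

    def value(self, cluster_id):
        # Query the memory for the cluster's current value estimate.
        return self.values[cluster_id]

    def intrinsic_reward(self, cluster_id, beta=1.0):
        # Novelty bonus: rarely visited clusters yield a larger bonus.
        return beta / (1 + self.counts[cluster_id]) ** 0.5

mem = EpisodicMemory()
mem.update("c0", 10.0)
mem.update("c0", 10.0)
```

Querying `intrinsic_reward` on an unseen cluster returns the maximum bonus, so novel regions of the state-action space are encouraged over frequently visited ones.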
The Flexible Pickup and Delivery Services Problem (FPDSP) arises from the practical needs of multi-warehouse management strategies and is one of the key challenges in the current urban distribution logistics industry. The problem aims to quickly compute route plans in complex scenarios so that the total traveling time of the vehicle is minimized while the time window requirements are met. To address this problem, we propose a deep reinforcement learning method based on the actor-critic algorithm to quickly compute approximate optimal solutions of the FPDSP. Specifically, we propose a Transformer Model with Parallel Encoders (TMPE). The model efficiently extracts order features through parallel encoders and then uses serial decoders to fuse the feature information and optimize the order selection process. In addition, we design a reward function that reduces the number of repeated pickups made by the vehicle at the same consignor's location across different orders, thereby effectively reducing the vehicle's total travel time. Experimental results show that, compared with heuristic methods on seven different datasets, our method can quickly find feasible solutions to the problem. Moreover, compared with all baseline methods, our method reaches the optimal solution in 14 cases, which significantly improves its problem-solving ability. This result provides a new approach for optimizing multi-warehouse pickup and delivery logistics in cities in the future.
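The repeated-pickup penalty in such a reward can be sketched with a minimal function; this is not the paper's exact formulation, and `travel_time` and `repeat_penalty` are assumed names.

```python
def route_reward(route, travel_time, repeat_penalty=5.0):
    """Illustrative reward: negative total travel time, minus a penalty
    for revisiting an already-visited pickup location.

    route: sequence of location ids visited in order.
    travel_time: dict mapping (a, b) -> travel minutes between locations.
    """
    total = sum(travel_time[(a, b)] for a, b in zip(route, route[1:]))
    repeats = len(route) - len(set(route))  # extra visits to seen locations
    return -total - repeat_penalty * repeats

# Toy instance: depot "D", consignor locations "A", "B", "C".
tt = {("D", "A"): 10, ("A", "B"): 5, ("B", "A"): 5, ("A", "C"): 7}
```

Under this shaping, a route that returns to consignor "A" between two orders pays both the extra travel time and the repeat penalty, steering the policy toward consolidated pickups.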
In this article, we explore an event-triggered optimal control problem for nonlinear networked control systems (NCSs) with input saturation and aperiodic intermittent control. First, a non-quadratic cost function with the property of intermittent control is formulated, and a Hamilton-Jacobi-Bellman (HJB) equation is derived from the given cost function to acquire the optimal control inputs. To avoid continuous-time communication over the network, a novel aperiodically intermittent dynamic event-triggered (AIDET) control scheme, integrating a dynamic event-triggered control scheme with an aperiodic intermittent control scheme, is proposed in this article. A piecewise-continuous internal dynamic variable is introduced into the event-triggering condition, which is more conducive to increasing inter-event times than static event-triggering schemes. Furthermore, the event-triggering condition designed in this article is rigorously proven to exclude Zeno behavior. Moreover, because the HJB equation is difficult to solve directly, an actor-critic algorithm within the AIDET scheme is proposed to approximate the optimal control inputs. The approximation errors of the weight vectors are proven to be uniformly ultimately bounded. The stability of the considered systems under the proposed AIDET control scheme is analyzed using Lyapunov theory. Finally, simulation examples are given to illustrate the effectiveness of the proposed actor-critic-based AIDET control scheme.
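A dynamic event-triggering test with an internal variable can be sketched as below. The thresholds `sigma`, `theta`, and `lam`, and the exact form of the condition, are illustrative assumptions rather than the article's formulation; the point is that the nonnegative internal variable `eta` adds slack beyond a static condition, lengthening inter-event times.

```python
import numpy as np

def dynamic_trigger(e, x, eta, sigma=0.5, theta=1.0):
    """Sketch of a dynamic event-triggering test.

    e: measurement error accumulated since the last trigger,
    x: current state, eta: internal dynamic variable.
    Trigger (return True) when the static margin plus the dynamic
    slack eta/theta is exhausted.
    """
    margin = sigma * np.dot(x, x) - np.dot(e, e)
    return margin + eta / theta < 0.0  # True => transmit and reset e

def eta_update(eta, e, x, dt=0.01, lam=1.0, sigma=0.5):
    # The internal variable evolves with the unused static margin;
    # clipping at zero keeps it nonnegative between events.
    d_eta = -lam * eta + sigma * np.dot(x, x) - np.dot(e, e)
    return max(0.0, eta + dt * d_eta)
```

With `eta = 0` this reduces to a static condition; a positive `eta` delays the next event, which is the mechanism the article credits for longer inter-event times.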
In this paper, a novel event-triggered control strategy is proposed for uncertain nonlinear systems by developing a fractional-order fuzzy sliding mode controller based on a fractional-order actor-critic network. The proposed approach offers several key features. First, a sigma-point Kalman filter is employed to accurately estimate unmeasured states. Second, a fractional-order sliding mode controller with an event-triggered mechanism is designed to achieve practical sliding mode control while preventing the Zeno phenomenon. Third, to reduce chattering in sliding mode control, a fractional-order actor-critic recurrent neural network is proposed, effectively approximating the switching control stage and enhancing system performance while reducing event triggers. The fractional-order actor-critic network incorporates fuzzy rules defined by a generalized Gaussian function with the Mittag-Leffler function, and a critic network approximates the value function, further enhancing performance. Parameter learning is guided by a fractional-order Gauss-Newton method. Stability analysis is performed using the Lyapunov method. Finally, the efficacy of the proposed method is demonstrated via experimental validation on a real inverted pendulum system.
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application. (C) 2016 Elsevier B.V. All rights reserved.
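A generic tabular one-step actor-critic on a toy deterministic MDP illustrates the actor/critic interplay the abstract refers to; the paper's specific descent direction is not reproduced here, and the toy MDP and all names are assumptions for illustration.

```python
import numpy as np

def actor_critic(P, R, gamma=0.9, steps=2000, alpha=0.1, beta=0.05, seed=0):
    """Tabular one-step actor-critic for a discounted, deterministic MDP.

    P[s, a] -> next state, R[s, a] -> reward.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                   # critic: state values
    theta = np.zeros((n_states, n_actions))  # actor: softmax preferences
    s = 0
    for _ in range(steps):
        logits = theta[s] - theta[s].max()
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(n_actions, p=probs)
        s_next, r = P[s, a], R[s, a]
        delta = r + gamma * V[s_next] - V[s]  # TD error
        V[s] += alpha * delta                 # critic update
        grad = -probs                         # d log pi(a|s) / d theta[s]
        grad[a] += 1.0
        theta[s] += beta * delta * grad       # actor update
        s = s_next
    return V, theta

# Toy MDP: action 1 always moves to state 1 and pays reward 1.
P = np.array([[0, 1], [0, 1]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
```

After training on this toy MDP, the actor's preferences favor the rewarding action and the critic's value for state 1 approaches its discounted return.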
Obtaining useful information accurately and quickly from massive amounts of text is one of the most pressing needs today. Automatic text summarization technology summarizes and condenses the given source ...
In recent years, deep graph neural networks (GNNs) have been used as solvers or helper functions for the traveling salesman problem (TSP), but they are usually employed as encoders that generate static node representations for downstream tasks and cannot capture the dynamic permutational information that arises as solutions are continually updated. To address this problem, we propose a permutational encoding graph attention encoder and attention-based decoder (PEG2A) model for the TSP, trained by the advantage actor-critic algorithm. In this work, the permutational encoding graph attention (PEGAT) network is designed to encode node embeddings, gathering information from neighbors while simultaneously obtaining the dynamic graph permutational information. The attention-based decoder is tailored to compute probability distributions over pairs of nodes picked for 2-opt moves. The experimental results show that our method outperforms the compared learning-based algorithms and traditional heuristic methods.
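The 2-opt move itself, which the decoder's chosen node pair parameterizes, can be sketched directly; the helper names are illustrative, and in the learned model the pair `(i, j)` would come from the attention decoder rather than be given explicitly.

```python
def two_opt_move(tour, i, j):
    """Apply a 2-opt move: reverse the segment tour[i:j+1].

    Reversing the segment replaces edges (tour[i-1], tour[i]) and
    (tour[j], tour[j+1]) with (tour[i-1], tour[j]) and (tour[i], tour[j+1]).
    """
    assert 0 < i < j < len(tour)  # keep the starting node fixed
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    # Length of the closed tour under a distance matrix.
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))
```

Iterating such moves, with the learned policy scoring candidate pairs, is the solution-improvement loop the abstract describes.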
In the pursuit of ubiquitous broadband connectivity, there has been a significant shift towards the vertical expansion of communication networks into space, particularly through the exploitation of low Earth orbit (LEO) satellite constellations, which are favored for their relatively low latency. However, this approach faces many challenges that need to be addressed, including atmospheric turbulence, high path loss, and dynamic cloud formations. High-altitude pseudo-satellites (HAPS) have emerged as promising relaying layers between LEO satellites and ground stations, enhancing coverage, latency, and direct terrestrial user connectivity. While radio frequency (RF) bands suffer from congestion and limited bandwidth, free space optical (FSO) communications offer higher data rates but are susceptible to misalignment and weather-induced signal degradation. To address these challenges, a hybrid RF/FSO approach has been proposed to take advantage of both technologies by dynamically switching between RF and FSO based on propagation channel conditions. This paper introduces a reinforcement learning-based algorithm designed to optimize the trajectory of the HAPS, maneuvering around cloudy areas and seamlessly switching between the RF and FSO communication modes to maximize the achievable capacity. The proposed approach aims to maximize system performance by intelligently adapting to environmental conditions, offering a promising solution for next-generation space communication networks.
As one of the important complementary technologies of the fifth-generation (5G) wireless communication and beyond, mobile device-to-device (D2D) edge caching and computing can effectively reduce the pressure on backbone networks and improve the user experience. Specific content can be pre-cached on the user devices based on personalized content placement strategies, and the cached content can be fetched by neighboring devices in the same D2D network. However, when multiple devices simultaneously fetch content from the same device, collisions will occur and reduce communication efficiency. In this paper, we design the content fetching strategies based on an actor-critic deep reinforcement learning (DRL) architecture, which can adjust the content fetching collision rate to adapt to different application scenarios. First, the optimization problem is formulated with the goal of minimizing the collision rate to improve the throughput, and a general actor-critic DRL algorithm is used to improve the content fetching strategy. Second, by optimizing the network architecture and reward function, the two-level actor-critic algorithm is improved to effectively manage the collision rate and transmission power. Furthermore, to balance the conflict between the collision rate and device energy consumption, the related reward values are weighted in the reward function to optimize the energy efficiency. The simulation results show that the content fetching collision rate based on the improved two-level actor-critic algorithm decreases significantly compared with that of the baseline algorithms, and the network energy consumption can be optimized by adjusting the weight factors.
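The weighted trade-off between collision rate and energy consumption in the reward can be sketched as follows; the weights and the reward shape are illustrative assumptions, not the paper's exact formulation.

```python
def fetch_reward(collided, tx_power, w_collision=1.0, w_energy=0.2):
    """Illustrative per-step reward for a D2D content-fetching agent.

    collided: whether this fetch collided with another device's fetch.
    tx_power: normalized transmit power spent on the fetch.
    Raising w_collision favors throughput; raising w_energy favors
    energy efficiency, mirroring the weighted reward in the abstract.
    """
    r = 0.0 if collided else 1.0   # reward a successful fetch
    if collided:
        r -= w_collision           # penalize the collision
    r -= w_energy * tx_power       # penalize energy spent
    return r
```

Tuning `w_collision` against `w_energy` is the knob the abstract describes for adapting the collision rate to different application scenarios.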
This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function gives both feedback and feedforward parts of the control input simultaneously. This enables us to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and tracking Hamilton-Jacobi-Bellman (HJB) are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely, actor NN and critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.
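The nonquadratic performance function commonly used to encode a symmetric input constraint |u| < lam is U(u) = 2 * integral from 0 to u of lam * atanh(v/lam) dv, which has the closed form coded below. This is a sketch of the standard penalty from the constrained-input optimal control literature, not necessarily the paper's exact choice.

```python
import numpy as np

def nonquadratic_cost(u, lam=1.0):
    """Closed form of the standard constrained-input penalty:
    U(u) = 2*lam*u*atanh(u/lam) + lam**2 * ln(1 - (u/lam)**2).

    U(0) = 0, U is nonnegative and increasing in |u|, and its derivative
    2*lam*atanh(u/lam) diverges as |u| -> lam, which is what keeps the
    resulting optimal policy (a tanh of the value gradient) inside the bound.
    """
    v = u / lam
    return 2 * lam * u * np.arctanh(v) + lam**2 * np.log(1 - v**2)
```

Because the minimizing control takes the form u = -lam * tanh(.), the saturation bound is respected by construction rather than enforced by clipping.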