Search Results - Inner Mongolia University Library

Proactive Content Caching Based on Actor-Critic Reinforcement Learning for Mobile Edge Networks

IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2022, Vol. 8, No. 2, pp. 1239-1252

Authors: Jiang, Wei; Feng, Daquan; Sun, Yao; Feng, Gang; Wang, Zhenzhong; Xia, Xiang-Gen. Affiliations: Shenzhen Univ Guangdong Prov Engn Lab Digital Creat Technol Shenzhen Key Lab Digital Creat Technol Coll Elect & Informat Engn Guangdong Key Lab Inte Shenzhen 518060 Peoples R China; Univ Glasgow James Watt Sch Engn Glasgow G12 8QQ Lanark Scotland; Univ Elect Sci & Technol China Yangtze Delta Reg Inst Huzhou Huzhou 313001 Peoples R China; Univ Elect Sci & Technol China Natl Key Lab Sci & Technol Commun Chengdu 611731 Peoples R China; Tech Management Ctr China Media Grp Beijing 100020 Peoples R China; Univ Delaware Dept Elect & Comp Engn Newark DE 19716 USA

Mobile edge caching/computing (MEC) has emerged as a promising approach for addressing the drastically increasing mobile data traffic by bringing high caching and computing capabilities to the edge of networks. Under the MEC architecture, content providers (CPs) are allowed to lease virtual machines (VMs) at MEC servers to proactively cache popular contents and improve users' quality of experience. The scalable cache resource model raises the challenge of determining the ideal number of leased VMs for CPs to obtain the minimum expected downloading delay of users at the lowest caching cost. To address this challenge, we propose an actor-critic (AC) reinforcement learning based proactive caching policy for mobile edge networks that requires no prior knowledge of users' content demand. Specifically, we formulate the proactive caching problem under dynamic user content demand as a Markov decision process and propose an AC-based caching algorithm to minimize the caching cost and the expected downloading delay. In particular, to reduce the computational complexity, a branching neural network is employed to approximate the policy function in the actor part. Numerical results show that the proposed caching algorithm can significantly reduce the total cost and the average downloading delay compared with other popular algorithms.

Keywords: actor-critic algorithm; branching neural network; reinforcement learning; mobile edge caching

Online Actor-Critic Reinforcement Learning Control for Uncertain Surface Vessel Systems with External Disturbances

INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2022, Vol. 20, No. 3, pp. 1029-1040

Authors: Vu, Van Tu; Tran, Quang Huy; Pham, Thanh Loc; Dao, Phuong Nam. Affiliations: Hai Phong Univ Hai Phong Vietnam; Natl Cheng Kung Univ NCKU Dept Mech Engn Tainan Taiwan; Hanoi Univ Sci & Univ Sch Elect Engn 01 Dai Co Viet Hanoi Vietnam

This article presents a trajectory tracking control approach for an uncertain surface vessel using a new cascade structure that combines an adaptive reinforcement learning (ARL) algorithm with a kinematic controller and a feed-forward term. Since a surface vessel can be decoupled into a kinematic sub-system and a dynamic sub-system, a cascade control system is well suited to the tracking problem. In the proposed control structure, the dynamic control loop is designed as an optimized controller for the corresponding dynamic sub-system, and the kinematic control loop is implemented by a nonlinear controller combined with a feed-forward term. The online actor-critic architecture is used in the ARL algorithm to overcome the challenge of solving the Hamilton-Jacobi-Bellman (HJB) equation. Additionally, the proposed controller handles the difficulty of the non-autonomous optimal control problem by designing the ARL technique for the corresponding system with a small number of state variables. Based on theoretical analysis, the ARL-based control design guarantees uniformly ultimately bounded (UUB) stability of the closed-loop system. Finally, simulation results are presented to verify the effectiveness of the proposed control scheme.

Keywords: actor-critic algorithm; adaptive reinforcement learning (ARL) control; cascade control systems; surface vessel (SV) systems; trajectory tracking control

Improving Exploration in Actor-Critic With Weakly Pessimistic Value Estimation and Optimistic Policy Optimization

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, Vol. 35, No. 7, pp. 8783-8796

Authors: Li, Fan; Fu, Mingsheng; Chen, Wenyu; Zhang, Fan; Zhang, Haixian; Qu, Hong; Yi, Zhang. Affiliations: Univ Elect Sci & Technol China Sch Comp Sci & Engn Chengdu 611731 Peoples R China; Sichuan Univ Sch Comp Sci Chengdu 610065 Peoples R China

Deep off-policy actor-critic algorithms have been successfully applied to challenging tasks in continuous control. However, these methods typically suffer from poor sample efficiency, limiting their widespread adoption in real-world domains. To mitigate this issue, we propose a novel actor-critic algorithm with weakly pessimistic value estimation and optimistic policy optimization (WPVOP) for continuous control. WPVOP integrates two key ingredients: 1) a weakly pessimistic value estimation, which compensates for the pessimism of the lower confidence bound in the conventional value function (i.e., clipped double Q-learning) to trigger exploration in low-value state-action regions, and 2) an optimistic policy optimization algorithm that samples the actions that could benefit policy learning most toward optimal Q-values for efficient exploration. We show theoretically that the proposed weakly pessimistic value estimate is bounded both below and above, and show empirically that it can avoid extremely over-optimistic value estimates. We show that these two ideas are largely complementary and can be fruitfully integrated to improve performance and the sample efficiency of exploration. We evaluate WPVOP on the suite of continuous control tasks from MuJoCo, achieving state-of-the-art sample efficiency and performance.
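
The abstract names clipped double Q-learning as the conventional pessimistic estimate that WPVOP relaxes. The sketch below illustrates only that baseline target, not WPVOP itself, since the abstract does not specify its compensation term. The function name, arguments, and numbers are illustrative assumptions.

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD target that uses the smaller of two critic estimates.

    Taking the minimum is the pessimistic (lower-bound style) choice
    associated with clipped double Q-learning. All names are hypothetical.
    """
    pessimistic_next = min(q1_next, q2_next)
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * pessimistic_next
    return reward + bootstrap


# Example: the two critics disagree, so the target follows the lower one.
target = clipped_double_q_target(reward=1.0, q1_next=10.0, q2_next=8.0, gamma=0.9)
terminal_target = clipped_double_q_target(reward=1.0, q1_next=10.0, q2_next=8.0, done=True)
```

Because the minimum systematically discounts the more optimistic critic, low-value regions can stay under-explored, which is the pessimism the abstract says WPVOP compensates for.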

Keywords: actor-critic algorithm; continuous control; reinforcement learning (RL); sample efficiency

An actor-critic reinforcement learning-based resource management in mobile edge computing systems

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, Vol. 11, No. 8, pp. 1875-1889

Authors: Fu, Fang; Zhang, Zhicai; Yu, Fei Richard; Yan, Qiao. Affiliations: Shanxi Univ Sch Phys & Elect Taiyuan Peoples R China; Carleton Univ Coll Syst & Comp Engn Ottawa ON Canada; Shenzhen Univ Coll Comp Sci & Software Engn Shenzhen Peoples R China

Reinforcement learning (RL) has attracted great attention as an effective tool in the wireless communication field. In this paper, we investigate the offloading decision and resource allocation problem in mobile edge computing (MEC) systems based on RL methods. Different from the existing literature, our research focuses on improving mobile operators' revenue by maximizing the amount of offloaded tasks while decreasing the energy expenditure and time delays. Considering the dynamic characteristics of the wireless environment, the above problem is modeled as a Markov decision process (MDP). Since the action space of the MDP consists of multidimensional continuous variables mixed with discrete variables, traditional RL algorithms are ill-suited to it. Therefore, an actor-critic (AC) algorithm with eligibility traces is proposed to solve the problem. The actor part introduces a parameterized normal distribution to generate the probabilities of continuous stochastic actions, and the critic part employs a linear approximator to estimate the value of states, based on which the actor part updates policy parameters in the direction of performance improvement. Furthermore, an advantage function is designed to reduce the variance of the learning process. Simulation results indicate that the proposed algorithm can find the best strategy to maximize the amount of tasks executed by the MEC server while decreasing the energy consumption and time delays.
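
The abstract describes the ingredients (a normal-distribution actor, a linear critic, eligibility traces) without giving the update equations. The sketch below shows one update step of a textbook actor-critic with eligibility traces. It is not the authors' algorithm: it assumes a single continuous action, a fixed standard deviation `sigma`, and the TD error standing in for the advantage, and every name and step size is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def actor_critic_step(theta, w, z_theta, z_w, phi, action, reward, phi_next,
                      done, sigma=1.0, gamma=0.9, lam=0.8,
                      alpha_theta=0.01, alpha_w=0.1):
    """One update of a linear actor-critic with eligibility traces.

    theta: actor weights, policy mean is dot(theta, phi) (assumed form)
    w:     critic weights, state value is dot(w, phi)
    z_*:   eligibility traces for actor and critic
    """
    # Critic: one-step TD error, used here as the advantage estimate.
    v = dot(w, phi)
    v_next = 0.0 if done else dot(w, phi_next)
    delta = reward + gamma * v_next - v

    # Actor: gradient of log N(action; mu, sigma^2) w.r.t. theta is score * phi.
    mu = dot(theta, phi)
    score = (action - mu) / sigma ** 2

    # Decay the traces, then accumulate the current gradients.
    z_w = [gamma * lam * z + f for z, f in zip(z_w, phi)]
    z_theta = [gamma * lam * z + score * f for z, f in zip(z_theta, phi)]

    # Move both parameter vectors along their traces, scaled by the TD error.
    w = [wi + alpha_w * delta * z for wi, z in zip(w, z_w)]
    theta = [ti + alpha_theta * delta * z for ti, z in zip(theta, z_theta)]
    return theta, w, z_theta, z_w, delta


# A single transition with two state features, starting from zero parameters.
theta, w, z_theta, z_w, delta = actor_critic_step(
    theta=[0.0, 0.0], w=[0.0, 0.0], z_theta=[0.0, 0.0], z_w=[0.0, 0.0],
    phi=[1.0, 2.0], action=1.0, reward=1.0, phi_next=[1.0, 3.0], done=False)
```

The traces let a single TD error adjust the parameters for recently visited states as well as the current one. Handling the mixed discrete and continuous action space from the abstract would need an additional policy head, which this sketch omits.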

Keywords: Reinforcement learning; actor-critic algorithm; Eligibility traces; Mobile edge computing; Resource allocation

QoS Aware Transcoding for Live Streaming in Edge-Clouds Aided HetNets: An Enhanced Actor-Critic Approach

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, Vol. 68, No. 11, pp. 11295-11308

Authors: Zhang, Zhicai; Wang, Ru; Yu, F. Richard; Fu, Fang; Yan, Qiao. Affiliations: Shanxi Univ Sch Phys & Elect Engn Taiyuan 030006 Shanxi Peoples R China; Carleton Univ Dept Syst & Comp Engn Ottawa ON K1S 5B6 Canada; Shenzhen Univ Coll Comp Sci & Software Engn Shenzhen 518060 Guangdong Peoples R China

With the advances in hand-held devices (smartphones, tablets, etc.) and high-speed wireless networks, user demand for live streaming services has grown explosively. Due to the diversity of user equipment (UEs), a live stream has to be transcoded into different versions. However, transcoding is a computationally expensive and time-consuming process. Due to the dynamic characteristics of the wireless network environment, providing high-quality, low-latency live video for UEs is a big challenge. This study investigates the joint problem of user scheduling, transcoding decisions, and allocation of computational and wireless spectrum resources in edge-clouds aided heterogeneous networks (HetNets). The research focuses on improving UEs' quality of service (QoS) for live-streaming services, which includes video quality and latency requirements. Different from the existing literature, to better approximate the real wireless environment, the available computational and wireless spectrum resources are modeled as random processes. Considering the dynamic characteristics of wireless networks and available resources, the above problem is modeled as a Markov decision process (MDP). Since the action space of the MDP consists of multi-dimensional continuous variables mixed with discrete variables, it is difficult to solve this problem with traditional learning algorithms. Therefore, an enhanced actor-critic algorithm is proposed to solve the problem, in which both the actor part and the critic part employ eligibility traces. Extensive simulation results with different system parameters show the effectiveness of the proposed algorithm.

Keywords: Live-streaming; quality of service; edge computing; resource allocation; actor-critic algorithm

Fault Diagnosis for Gas Turbine Rotor Using Actor-Critic Network

International Conference of The Efficiency and Performance Engineering Network (TEPEN)

Authors: Cui, Yingjie; Wang, Hongjun. Affiliations: Beijing Informat Sci & Technol Univ Sch Mech & Elect Engn Beijing 100192 Peoples R China; Beijing Int Sci Cooperat Base High End Equipment Beijing 100192 Peoples R China; Minist Educ Key Lab Modern Measurement & Control Technol Beijing 100192 Peoples R China

ISBN: (print) 9783031261954; 9783031261930; 9783031261923

As a key component of a gas turbine, the rotor often operates at high speed and under variable working conditions, which makes it extremely prone to failure. To address the low accuracy of gas turbine rotor fault diagnosis under variable working conditions, a deep reinforcement learning method based on an actor and a critic is adopted. The actor network takes as input the fault state data, the fault type action, and the updated error signal. The critic network takes as input the current fault state and the reward for the next fault state. The temporal difference method is used to update the network parameters and to output reward information and diagnosis results. The diagnostic ability is tested on three kinds of gas turbine rotor test bench data. Compared with other methods, the actor-critic model shows good fault diagnosis and domain adaptation ability in the variable-condition experiment, and can to a certain extent address practical fault diagnosis of gas turbines under variable conditions.

Keywords: Gas turbine rotor; Deep reinforcement learning; actor-critic algorithm; Fault diagnosis

Manipulator Motion Planning based on Actor-Critic Reinforcement Learning

40th Chinese Control Conference (CCC)

Authors: Li, Qiang; Nie, Jun; Wang, Haixia; Lu, Xiao; Song, Shibin. Affiliations: Shandong Univ Sci & Technol Coll Elect Engn & Automat Qingdao 266590 Peoples R China

ISBN: (print) 9789881563804

The manipulator control model is high-order, nonlinear, multivariable, and strongly coupled, which makes it difficult for the manipulator to achieve good adaptability and autonomy. To address the poor reusability and poor autonomy of manipulator applications, a motion planning algorithm based on reinforcement learning is proposed. In this paper, the continuous-control reinforcement learning algorithm actor-critic is applied to the motion planning of the manipulator to increase its environmental applicability and autonomy, and to realize intelligent control of the manipulator under simple kinematics modeling. First, the simulation environment of the manipulator's hand-eye system is constructed; then the reinforcement learning algorithm model is established according to the simulation environment; finally, the motion planning training of the manipulator is completed in the simulation environment. Experimental results demonstrate that the proposed manipulator motion planning algorithm based on actor-critic reinforcement learning has good environmental adaptability and stability.

Keywords: Reinforcement Learning; actor-critic algorithm; Manipulator Motion Planning; Value evaluation; Policy evaluation

A Fast Decentralized Scheduling Method of Cooperative Localization Based on Actor-Critic Deep Reinforcement

3rd International Conference on Information Communication and Software Engineering (ICICSE)

Authors: Di, Xinyue; Guan, Yalin; Yu, Weijia; Lin, Heyun. Affiliations: Commun Univ China Beijing Peoples R China; Guangxi Power Grid Dispatching Control Ctr Nanning Peoples R China

ISBN: (print) 9798350313048

With the emergence of more and more automated vehicles, vehicle localization has attracted a lot of attention. Among the many localization methods, cooperative localization is very attractive due to its high coverage and accuracy. However, exhaustively measuring and exchanging information between all adjacent vehicles is costly and leads to larger delays. Thus, scheduling the transmissions for cooperative localization is a challenge. In this paper, we describe cooperative localization as a partially observable Markov process and propose an actor-critic deep reinforcement learning algorithm to bring the vehicles to a given localization accuracy threshold as quickly as possible. The proposed algorithm allows the transmissions to be optimally scheduled in a distributed manner. Simulation results show that, compared with random, greedy, and two existing deep reinforcement learning algorithms, the proposed algorithm performs better and is more adaptable to large-scale complex networks.

Keywords: vehicular localization; cooperative localization; scheduling problem; deep reinforcement learning; actor-critic algorithm

Actor-Critic Based Graphical Games for Discrete-time Linear Systems with Input Constraints

39th Chinese Control Conference (CCC)

Authors: Wang, Tian-Xiang; Liang, Yong; Wang, Bing-Chang. Affiliations: Shandong Univ Sch Control Sci & Engn Jinan Shandong Peoples R China

ISBN: (print) 9789881563903

In dynamic graphical games, the traditional way to obtain the optimal strategy for each agent is to solve a set of coupled HJB equations. Such problems are very difficult to solve by traditional methods, especially when the input of each agent is constrained. Actor-critic is a reinforcement learning method that can solve such problems through online iteration. This paper proposes an online iterative algorithm for solving graphical games of linear discrete-time systems with input constraints; the algorithm does not require the drift dynamics of the agents. Each agent needs two neural networks to fit its value function and control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.

Keywords: actor-critic algorithm; Differential Games; Input Constraints; Neural Network (NN); Reinforcement Learning (RL)

Temporal Detection of Anomalies via Actor-Critic Based Controlled Sensing

IEEE Global Communications Conference (GLOBECOM)

Authors: Joseph, Geethu; Gursoy, M. Cenk; Varshney, Pramod K. Affiliations: Syracuse Univ Dept Elect Engn & Comp Sci Syracuse NY 13244 USA

ISBN: (print) 9781728181042

We address the problem of monitoring a set of binary stochastic processes and generating an alert when the number of anomalies among them exceeds a threshold. For this, the decision-maker selects and probes a subset of the processes to obtain noisy estimates of their states (normal or anomalous). Based on the received observations, the decision-maker first determines whether to declare that the number of anomalies has exceeded the threshold or to continue taking observations. When the decision is to continue, it then decides whether to collect observations at the next time instant or defer it to a later time. If it chooses to collect observations, it further determines the subset of processes to be probed. To devise this three-step sequential decision-making process, we use a Bayesian formulation wherein we learn the posterior probability on the states of the processes. Using the posterior probability, we construct a Markov decision process and solve it using deep actor-critic reinforcement learning. Via numerical experiments, we demonstrate the superior performance of our algorithm compared to the traditional model-based algorithms.
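
The abstract builds its decision process on a posterior over the binary state of each process, updated from noisy probes. The sketch below shows one such Bayesian update under simplifying assumptions that are not taken from the paper: a single static process and a symmetric observation noise `flip_prob`. The function and parameter names are hypothetical.

```python
def update_posterior(prior_anomalous, observed_anomalous, flip_prob=0.2):
    """Bayes update of P(process is anomalous) after one noisy probe.

    flip_prob is the assumed probability that the probe reports the
    wrong state; it is an illustrative parameter, not from the paper.
    """
    if observed_anomalous:
        like_if_anomalous, like_if_normal = 1.0 - flip_prob, flip_prob
    else:
        like_if_anomalous, like_if_normal = flip_prob, 1.0 - flip_prob
    numerator = like_if_anomalous * prior_anomalous
    evidence = numerator + like_if_normal * (1.0 - prior_anomalous)
    return numerator / evidence


# Two consecutive "anomalous" readings sharpen an uninformative prior.
belief = 0.5
belief_after_one = update_posterior(belief, observed_anomalous=True)
belief_after_two = update_posterior(belief_after_one, observed_anomalous=True)
```

A vector of such beliefs, one per process, is the kind of quantity that could serve as the state of the Markov decision process the abstract mentions; the actor-critic learner that chooses which processes to probe is not shown here.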

Keywords: Active hypothesis testing; anomaly detection; change-point detection; deep reinforcement learning; actor-critic algorithm; dynamic decision-making; sequential sensing
