Small base stations (SBs) of fifth-generation (5G) cellular networks are envisioned to have storage devices that locally serve requests for reusable and popular contents by caching them at the edge of the network, close to the end users. The ultimate goal is to smartly utilize a limited storage capacity to serve frequently requested contents locally instead of fetching them from the cloud, contributing to better overall network performance and service experience. To equip the SBs with efficient fetch-cache decision-making schemes operating in dynamic settings, this paper introduces simple but flexible generic time-varying fetching and caching costs, which are then used to formulate a constrained minimization of the aggregate cost across files and time. Since caching decisions per time slot influence the content availability in future slots, the novel formulation for optimal fetch-cache decisions falls into the class of dynamic programming. Under this generic formulation, first by considering stationary distributions for the costs as well as file popularities, an efficient reinforcement-learning-based solver known as the value iteration algorithm can be used to solve the emerging optimization problem. It is then shown that practical limitations on cache capacity can be handled using a particular instance of this generic dynamic pricing formulation. In this setting, to provide a lightweight online solver for the corresponding optimization, the well-known reinforcement learning algorithm Q-learning is employed to find optimal fetch-cache decisions. Numerical tests corroborating the merits of the proposed approach wrap up the paper.
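Although the abstract describes a full multi-file constrained formulation, the value iteration step at its core is easy to illustrate. Below is a minimal sketch for a toy single-file version of the fetch-cache problem; the popularity p, fetching cost c_fetch, caching cost c_cache, and discount gamma are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

# Toy single-file fetch-cache MDP (illustrative parameters, not the paper's model).
# State: 0 = file not cached, 1 = file cached.
# Action: 0 = do not cache for the next slot, 1 = cache for the next slot.
p = 0.6        # probability the file is requested in a slot (assumed)
c_fetch = 1.0  # cost of fetching the file from the cloud (assumed)
c_cache = 0.3  # per-slot cost of keeping the file cached (assumed)
gamma = 0.9    # discount factor

def expected_cost(state, action):
    # A request incurs c_fetch only if the file is not cached locally;
    # caching it for the next slot incurs c_cache.
    return p * (0.0 if state == 1 else c_fetch) + action * c_cache

V = np.zeros(2)
for _ in range(200):                       # value iteration to a fixed point
    V_new = np.array([
        min(expected_cost(s, a) + gamma * V[a] for a in (0, 1))
        for s in (0, 1)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = [min((0, 1), key=lambda a: expected_cost(s, a) + gamma * V[a])
          for s in (0, 1)]
print("V* =", V, "policy =", policy)       # here: cache, since c_cache < p * c_fetch
```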
In this paper, we investigate the problem of power control for streaming variable-bit-rate (VBR) videos over wireless links. A system model involving a transmitter (e.g., a base station) that sends VBR video data to a receiver (e.g., a mobile user) equipped with a playout buffer is adopted, as used in dynamic adaptive streaming video applications. In this setting, we analyze power control policies considering the following two objectives: 1) the minimization of the transmit power consumption and 2) the minimization of the transmission completion time of the communication session. In order to play the video without interruptions, the power control policy should also satisfy the requirement that the VBR video data be delivered to the mobile user without causing playout buffer underflows or overflows. A directional water-filling algorithm, which provides a simple and concise interpretation of the necessary optimality conditions, is identified as the optimal offline policy. Following this, two online policies are proposed for power control based on channel side information (CSI) prediction within a short time window. Dynamic programming is employed to implement the optimal offline and the initial online power control policies that minimize the transmit power consumption in the communication session. Subsequently, a reinforcement learning (RL)-based approach is employed for the second online power control policy. Through simulation results, we show that the optimal offline power control policy that minimizes the overall power consumption leads to substantial energy savings compared with the strategy of minimizing the time duration of video streaming. We also demonstrate that the RL algorithm performs better than the dynamic-programming-based online grouped water-filling (GWF) strategy unless the channel is highly correlated.
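As a point of reference for the water-filling structure mentioned above, the following is a minimal sketch of classic water-filling power allocation over known channel gains. It is a simplified, non-directional stand-in for the paper's directional water-filling policy; the gains and power budget are illustrative.

```python
import numpy as np

# Classic water-filling: maximize sum log(1 + g_i * p_i) subject to
# sum p_i = total_power, p_i >= 0, via bisection on the water level.
def water_filling(gains, total_power, tol=1e-9):
    lo, hi = 0.0, total_power + 1.0 / min(gains)   # bracket the water level
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        power = np.maximum(level - 1.0 / gains, 0.0)
        if power.sum() > total_power:
            hi = level                              # too much water poured
        else:
            lo = level
    return np.maximum(lo - 1.0 / gains, 0.0)

gains = np.array([0.5, 1.0, 2.0, 4.0])   # per-slot channel gains (assumed)
p = water_filling(gains, total_power=4.0)
print(np.round(p, 3), "sum =", round(p.sum(), 3))  # better slots get more power
```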
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable, as they relax this requirement. In this paper, we present a new output-feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring knowledge of the system dynamics. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and value iteration methods. The scheme has the advantage that it does not incur any excitation noise bias, and therefore the need for discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out to illustrate the proposed scheme.
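The fixed point that the proposed algorithms converge to is the LQR Riccati solution. The sketch below runs model-based value iteration on the discrete-time Riccati equation to show that fixed point; note the paper's own scheme is data-driven and uses only input-output measurements, whereas A and B here are assumed known purely for illustration.

```python
import numpy as np

# Model-based value iteration on the discrete-time algebraic Riccati equation:
#   P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
# The matrices below are an illustrative double-integrator-like example.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = np.zeros((2, 2))
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain at this iterate
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-10:
        break
    P = P_next

print("P =\n", np.round(P, 4), "\nK =", np.round(K, 4))
```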
This research deals with the general issue of quality of service (QoS) provisioning and resource utilization in telecommunication networks. The issue requires that mobile network income be optimized while simultaneous...
ISBN (print): 9781728124858
We combine adaptive dynamic programming (ADP), a reinforcement learning method, and the UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on the Progressive Bias method and two pruning strategies for the traditional board game Gomoku. For the adaptive dynamic programming part, we train a shallow feedforward neural network to give a quick evaluation of Gomoku board situations. UCT is a general approach in MCTS used as a tree policy. Our framework uses UCT to balance the exploration and exploitation of Gomoku game trees, while we also apply powerful pruning strategies and the heuristic function to re-select the available 2-adjacent grids of the state, and use ADP instead of simulation to give estimated values of expanded nodes. Experimental results show that this method can eliminate the search-depth defect of the simulation process and converge to the correct value faster than UCT alone. This approach can be applied to design new Gomoku AIs and to solve other Gomoku-like board games.
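The Progressive Bias selection rule referenced above blends a heuristic evaluation, here playing the role of the ADP network's quick board score, into UCB1 with a weight that decays as a child accumulates visits. A minimal sketch with hypothetical Node fields:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0
    heuristic: float = 0.0        # e.g. the ADP network's quick evaluation
    children: list = field(default_factory=list)

def select_child(node, c=1.4):
    """UCB1 + Progressive Bias: heuristic influence decays with visits."""
    def score(child):
        if child.visits == 0:
            return float("inf")   # expand unvisited children first
        exploit = child.total_value / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        bias = child.heuristic / (child.visits + 1)   # progressive bias term
        return exploit + explore + bias
    return max(node.children, key=score)

root = Node(visits=10, children=[Node(visits=3, total_value=2.0, heuristic=0.5),
                                 Node(visits=0)])
best = select_child(root)         # the unvisited child is selected first
```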
In this paper, a data-driven optimal control method based on adaptive dynamic programming and game theory is presented for solving the output feedback solutions of the H-infinity control problem for linear discrete-time systems with multiple players subject to multi-source disturbances. We first transform the H-infinity control problem into a multi-player game problem, following the theoretical solutions given by game theory. Since the system state may not be measurable, we derive the output-feedback-based control policies and disturbances through mathematical operations. Considering the advantages of off-policy reinforcement learning (RL) over on-policy RL, a novel off-policy game Q-learning algorithm dealing with mixed competition and cooperation among players is developed, such that the H-infinity control problem can finally be solved for linear multi-player systems without knowledge of the system dynamics. Moreover, rigorous proofs of algorithm convergence and unbiasedness of solutions are presented. Finally, simulation results demonstrate the effectiveness of the proposed method.
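The off-policy property exploited above, that learning data need not be generated by the policy being evaluated, is easiest to see in the tabular case. Below is a generic sketch on a toy chain MDP; all parameters are illustrative and this is not the paper's multi-player game Q-learning.

```python
import numpy as np

# Tabular off-policy Q-learning on a 5-state chain: an epsilon-greedy
# behavior policy explores, while the learning target is always greedy.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the right end
    return s_next, r

s = 0
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Off-policy target: max over actions, regardless of the action taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next  # restart episode at the goal

print(np.round(Q, 2))   # action 1 (move right) dominates in every state
```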
Inspired by Nash game theory, a multiplayer mixed-zero-sum (MZS) nonlinear game encompassing both situations [zero-sum and nonzero-sum (NZS) Nash games] is proposed in this paper. A synchronous reinforcement learning (RL) scheme based on the identifier-critic structure is developed to learn the Nash equilibrium solution of the proposed MZS game. First, the MZS game formulation is presented: performance indexes are defined for the NZS Nash game among players 1 to N, and another performance index is defined for the zero-sum game between players N and N + 1, such that player N cooperates with players 1 to N - 1 while competing with player N + 1, which leads to a Nash equilibrium of all players. A single-layer neural network (NN) is then used to approximate the unknown dynamics of the nonlinear game system. Finally, an RL scheme based on NNs is developed to learn the optimal performance indexes, which can be used to produce the optimal control policy of every player such that the Nash equilibrium can be obtained. Thus, the actor NN widely used in the RL literature is not needed. To this end, a recently proposed adaptive law is used to estimate the unknown identifier coefficient vectors, and an improved adaptive law with an error performance index is further developed to update the critic coefficient vectors. Both linear and nonlinear simulations are presented to demonstrate the existence of the Nash equilibrium for the MZS game and the performance of the proposed algorithm.
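A minimal sketch of the kind of critic update used in identifier-critic schemes follows: a linear-in-features value approximator whose weights are driven by the temporal-difference (Bellman) residual. The features, dynamics, policy, and cost below are illustrative assumptions, not the paper's game formulation or its adaptive laws.

```python
import numpy as np

# Semi-gradient TD(0) for a critic V(x) = w' phi(x) under a fixed policy,
# on an illustrative stable 2-D linear system with quadratic stage cost.
rng = np.random.default_rng(1)

def phi(x):                                   # quadratic critic features
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

w = np.zeros(3)
lr, gamma = 0.05, 0.98
x = rng.standard_normal(2)
for _ in range(5000):
    u = -0.5 * x[1]                           # some fixed control policy
    cost = x @ x + u * u                      # quadratic stage cost
    x_next = np.array([0.9*x[0] + 0.1*x[1], 0.8*x[1] + 0.1*u])
    delta = cost + gamma * w @ phi(x_next) - w @ phi(x)   # Bellman residual
    w += lr * delta * phi(x)                  # critic weight update
    x = x_next if np.linalg.norm(x_next) > 1e-3 else rng.standard_normal(2)

print("critic weights:", np.round(w, 3))
```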
ISBN (print): 9781728176840
This paper focuses on fault-tolerant tracking control (FTTC) problems for nonlinear systems with actuator failures. For the fault-free system, the tracking control input is derived by policy iteration. To deal with the difficulty of choosing the weights of the critic neural network (CNN), the CNN is trained by particle swarm optimization instead of the traditional gradient descent method. To handle actuator failures, a fault observer is constructed to compensate the tracking control input, from which the fault-tolerant tracking controller is derived. The developed FTTC scheme guarantees that the tracking errors are uniformly ultimately bounded even if the system suffers from actuator faults. A simulation study is provided to illustrate the effectiveness of the designed FTTC scheme.
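A bare-bones particle swarm optimizer of the kind used above to tune critic network weights in place of gradient descent is sketched below; the objective is a simple test function standing in for the CNN's training loss, and all hyperparameters are illustrative.

```python
import numpy as np

# Minimal PSO: each particle tracks its personal best; the swarm tracks a
# global best; velocities blend inertia with pulls toward both bests.
rng = np.random.default_rng(2)

def loss(w):                         # stand-in for the critic training loss
    return np.sum((w - 1.5) ** 2)

n_particles, dim = 20, 4
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(200):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best weights:", np.round(gbest, 3), "loss:", round(loss(gbest), 6))
```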
ISBN (print): 9781728119649
Integral reinforcement learning control approaches with derivative weighting performance indices require full knowledge of the dynamic models of the considered systems. These approaches do not provide straightforward solutions for the underlying integral Bellman optimality equations. This has motivated innovative online model-free processes with simple adaptation mechanisms. An online integral reinforcement learning control approach is developed herein for systems operating in uncertain dynamical environments. It employs a value iteration adaptation process to solve the underlying integral temporal difference equation, accompanied by model-free optimal control strategies. The proposed approach is tested on the control of a flexible wing aircraft, where the system dynamics are not required by the online learning process. The stability and convergence properties of the adaptive learning mechanism are formally proven before being validated through numerical simulations.
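A minimal sketch of value iteration on an integral Bellman equation follows: the stage cost is the running cost integrated over one sampling interval (Euler approximation), and a discount factor e^{-rho*dt} keeps the iteration contractive. The scalar dynamics, cost, and grids are illustrative, not the flexible-wing model.

```python
import numpy as np

# Grid-based value iteration for a scalar continuous-time problem, with the
# stage cost integral_t^{t+dt} (x^2 + u^2) ds approximated by (x^2 + u^2)*dt.
dt, rho = 0.05, 1.0
gamma = np.exp(-rho * dt)            # discount over one sampling interval
xs = np.linspace(-1, 1, 81)          # state grid
us = np.linspace(-1, 1, 21)          # control grid
V = np.zeros_like(xs)

for _ in range(1000):
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        x_next = x + dt * (0.5 * x + us)                 # x_dot = 0.5 x + u
        total = (x**2 + us**2) * dt + gamma * np.interp(x_next, xs, V)
        V_new[i] = total.min()                           # minimize over u
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V(0) =", round(V[40], 5), "V(1) =", round(V[-1], 5))
```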
ISBN (print): 9781728119649
Flexible wing aircraft are gaining increasing interest due to their salient features, such as inexpensive market price, low-cost operation, in-flight robustness, multi-purpose use, and their ability to operate with very little infrastructure. The continuous variations in the aerodynamics of the wing, along with the kinematic and dynamic constraints that evolve due to the wing-fuselage interactions, make the modeling task for such systems extremely challenging. An online model-free adaptive control mechanism based on two linear actuation systems is proposed in this manuscript to fulfill different pitch-roll maneuvers. The mechanism employs model-free tracking control strategies and utilizes a real-time value-iteration-based reinforcement learning process. The adaptation of the control gains is accomplished online by means of adaptive critics.