Small base stations (SBs) of fifth-generation (5G) cellular networks are envisioned to have storage devices that locally serve requests for reusable and popular contents by caching them at the edge of the network, close to the end users. The ultimate goal is to smartly utilize a limited storage capacity to serve frequently requested contents locally instead of fetching them from the cloud, contributing to better overall network performance and service experience. To equip the SBs with efficient fetch-cache decision-making schemes operating in dynamic settings, this paper introduces simple but flexible generic time-varying fetching and caching costs, which are then used to formulate a constrained minimization of the aggregate cost across files and time. Since caching decisions per time slot influence the content availability in future slots, the novel formulation for optimal fetch-cache decisions falls into the class of dynamic programming. Under this generic formulation, first by considering stationary distributions for the costs as well as file popularities, an efficient reinforcement-learning-based solver known as the value iteration algorithm can be used to solve the emerging optimization problem. It is then shown that practical limitations on cache capacity can be handled using a particular instance of this generic dynamic pricing formulation. In this setting, to provide a lightweight online solver for the corresponding optimization, the well-known reinforcement learning algorithm Q-learning is employed to find optimal fetch-cache decisions. Numerical tests corroborating the merits of the proposed approach wrap up the paper.
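Although the abstract describes a full multi-file constrained formulation, the value iteration step at its core is easy to illustrate. Below is a minimal sketch for a toy single-file version of the fetch-cache problem; the popularity p, fetching cost c_fetch, caching cost c_cache, and discount gamma are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

# Toy single-file fetch-cache MDP (illustrative parameters, not the paper's model).
# State: 0 = file not cached, 1 = file cached.
# Action: 0 = do not cache for the next slot, 1 = cache for the next slot.
p = 0.6        # probability the file is requested in a slot (assumed)
c_fetch = 1.0  # cost of fetching the file from the cloud (assumed)
c_cache = 0.3  # per-slot cost of keeping the file cached (assumed)
gamma = 0.9    # discount factor

def expected_cost(state, action):
    # A request incurs c_fetch only if the file is not cached locally;
    # caching it for the next slot incurs c_cache.
    return p * (0.0 if state == 1 else c_fetch) + action * c_cache

V = np.zeros(2)
for _ in range(200):                       # value iteration to a fixed point
    V_new = np.array([
        min(expected_cost(s, a) + gamma * V[a] for a in (0, 1))
        for s in (0, 1)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = [min((0, 1), key=lambda a: expected_cost(s, a) + gamma * V[a])
          for s in (0, 1)]
print("V* =", V, "policy =", policy)       # here: cache, since c_cache < p * c_fetch
```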
In this paper, we investigate the problem of power control for streaming variable-bit-rate (VBR) videos over wireless links. A system model involving a transmitter (e.g., a base station) that sends VBR video data to a receiver (e.g., a mobile user) equipped with a playout buffer is adopted, as used in dynamic adaptive streaming video applications. In this setting, we analyze power control policies considering the following two objectives: 1) the minimization of the transmit power consumption and 2) the minimization of the transmission completion time of the communication session. In order to play the video without interruptions, the power control policy should also satisfy the requirement that the VBR video data be delivered to the mobile user without causing playout buffer underflows or overflows. A directional water-filling algorithm, which provides a simple and concise interpretation of the necessary optimality conditions, is identified as the optimal offline policy. Following this, two online policies are proposed for power control based on channel side information (CSI) prediction within a short time window. Dynamic programming is employed to implement the optimal offline and the initial online power control policies that minimize the transmit power consumption in the communication session. Subsequently, a reinforcement learning (RL)-based approach is employed for the second online power control policy. Through simulation results, we show that the optimal offline power control policy that minimizes the overall power consumption leads to substantial energy savings compared with the strategy of minimizing the time duration of video streaming. We also demonstrate that the RL algorithm performs better than the dynamic-programming-based online grouped water-filling (GWF) strategy unless the channel is highly correlated.
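As a point of reference for the water-filling structure mentioned above, the following is a minimal sketch of classic water-filling power allocation over known channel gains. It is a simplified, non-directional stand-in for the paper's directional water-filling policy; the gains and power budget are illustrative.

```python
import numpy as np

# Classic water-filling: maximize sum log(1 + g_i * p_i) subject to
# sum p_i = total_power, p_i >= 0, via bisection on the water level.
def water_filling(gains, total_power, tol=1e-9):
    lo, hi = 0.0, total_power + 1.0 / min(gains)   # bracket the water level
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        power = np.maximum(level - 1.0 / gains, 0.0)
        if power.sum() > total_power:
            hi = level                              # too much water poured
        else:
            lo = level
    return np.maximum(lo - 1.0 / gains, 0.0)

gains = np.array([0.5, 1.0, 2.0, 4.0])   # per-slot channel gains (assumed)
p = water_filling(gains, total_power=4.0)
print(np.round(p, 3), "sum =", round(p.sum(), 3))  # better slots get more power
```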
Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable, as they relax this requirement. In this paper, we present a new output-feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring knowledge of the system dynamics. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and value iteration methods. The scheme has the advantage that it does not incur any excitation noise bias, and therefore the need for discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out to illustrate the proposed scheme.
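The fixed point that the proposed algorithms converge to is the LQR Riccati solution. The sketch below runs model-based value iteration on the discrete-time Riccati equation to show that fixed point; note the paper's own scheme is data-driven and uses only input-output measurements, whereas A and B here are assumed known purely for illustration.

```python
import numpy as np

# Model-based value iteration on the discrete-time algebraic Riccati equation:
#   P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
# The matrices below are an illustrative double-integrator-like example.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = np.zeros((2, 2))
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain at this iterate
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-10:
        break
    P = P_next

print("P =\n", np.round(P, 4), "\nK =", np.round(K, 4))
```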
This research deals with the general issue of quality of service (QoS) provisioning and resource utilization in telecommunication networks. The issue requires that mobile network income be optimized while simultaneous...
ISBN (print): 9781728124858
We combine adaptive dynamic programming (ADP), a reinforcement learning method, and the UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on the Progressive Bias method and two pruning strategies for the traditional board game Gomoku. For the adaptive dynamic programming part, we train a shallow feedforward neural network to give a quick evaluation of Gomoku board situations. UCT is a general approach in MCTS used as a tree policy. Our framework uses UCT to balance the exploration and exploitation of Gomoku game trees, while we also apply powerful pruning strategies and the heuristic function to re-select the available 2-adjacent grids of the state, and use ADP instead of simulation to give estimated values of expanded nodes. Experimental results show that this method can eliminate the search-depth defect of the simulation process and converge to the correct value faster than UCT alone. This approach can be applied to design new Gomoku AIs and to solve other Gomoku-like board games.
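The Progressive Bias selection rule referenced above blends a heuristic evaluation, here playing the role of the ADP network's quick board score, into UCB1 with a weight that decays as a child accumulates visits. A minimal sketch with hypothetical Node fields:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0
    heuristic: float = 0.0        # e.g. the ADP network's quick evaluation
    children: list = field(default_factory=list)

def select_child(node, c=1.4):
    """UCB1 + Progressive Bias: heuristic influence decays with visits."""
    def score(child):
        if child.visits == 0:
            return float("inf")   # expand unvisited children first
        exploit = child.total_value / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        bias = child.heuristic / (child.visits + 1)   # progressive bias term
        return exploit + explore + bias
    return max(node.children, key=score)

root = Node(visits=10, children=[Node(visits=3, total_value=2.0, heuristic=0.5),
                                 Node(visits=0)])
best = select_child(root)         # the unvisited child is selected first
```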
In this paper, a data-driven optimal control method based on adaptive dynamic programming and game theory is presented for solving the output feedback solutions of the H-infinity control problem for linear discrete-time systems with multiple players subject to multi-source disturbances. We first transform the H-infinity control problem into a multi-player game problem, following the theoretical solutions given by game theory. Since the system state may not be measurable, we derive the output-feedback-based control policies and disturbances through mathematical operations. Considering the advantages of off-policy reinforcement learning (RL) over on-policy RL, a novel off-policy game Q-learning algorithm dealing with mixed competition and cooperation among players is developed, such that the H-infinity control problem can finally be solved for linear multi-player systems without knowledge of the system dynamics. Moreover, rigorous proofs of algorithm convergence and unbiasedness of solutions are presented. Finally, simulation results demonstrate the effectiveness of the proposed method.
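The off-policy property exploited above, that learning data need not be generated by the policy being evaluated, is easiest to see in the tabular case. Below is a generic sketch on a toy chain MDP; all parameters are illustrative and this is not the paper's multi-player game Q-learning.

```python
import numpy as np

# Tabular off-policy Q-learning on a 5-state chain: an epsilon-greedy
# behavior policy explores, while the learning target is always greedy.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the right end
    return s_next, r

s = 0
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Off-policy target: max over actions, regardless of the action taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next  # restart episode at the goal

print(np.round(Q, 2))   # action 1 (move right) dominates in every state
```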
Inspired by Nash game theory, a multiplayer mixed-zero-sum (MZS) nonlinear game encompassing both situations [zero-sum and nonzero-sum (NZS) Nash games] is proposed in this paper. A synchronous reinforcement learning (RL) scheme based on the identifier-critic structure is developed to learn the Nash equilibrium solution of the proposed MZS game. First, the MZS game formulation is presented: performance indexes are defined for the NZS Nash game among players 1 to N, and another performance index is defined for the zero-sum game between players N and N + 1, such that player N cooperates with players 1 to N - 1 while competing with player N + 1, which leads to a Nash equilibrium of all players. A single-layer neural network (NN) is then used to approximate the unknown dynamics of the nonlinear game system. Finally, an RL scheme based on NNs is developed to learn the optimal performance indexes, which can be used to produce the optimal control policy of every player such that the Nash equilibrium can be obtained. Thus, the actor NN widely used in the RL literature is not needed. To this end, a recently proposed adaptive law is used to estimate the unknown identifier coefficient vectors, and an improved adaptive law with an error performance index is further developed to update the critic coefficient vectors. Both linear and nonlinear simulations are presented to demonstrate the existence of the Nash equilibrium for the MZS game and the performance of the proposed algorithm.
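A minimal sketch of the kind of critic update used in identifier-critic schemes follows: a linear-in-features value approximator whose weights are driven by the temporal-difference (Bellman) residual. The features, dynamics, policy, and cost below are illustrative assumptions, not the paper's game formulation or its adaptive laws.

```python
import numpy as np

# Semi-gradient TD(0) for a critic V(x) = w' phi(x) under a fixed policy,
# on an illustrative stable 2-D linear system with quadratic stage cost.
rng = np.random.default_rng(1)

def phi(x):                                   # quadratic critic features
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

w = np.zeros(3)
lr, gamma = 0.05, 0.98
x = rng.standard_normal(2)
for _ in range(5000):
    u = -0.5 * x[1]                           # some fixed control policy
    cost = x @ x + u * u                      # quadratic stage cost
    x_next = np.array([0.9*x[0] + 0.1*x[1], 0.8*x[1] + 0.1*u])
    delta = cost + gamma * w @ phi(x_next) - w @ phi(x)   # Bellman residual
    w += lr * delta * phi(x)                  # critic weight update
    x = x_next if np.linalg.norm(x_next) > 1e-3 else rng.standard_normal(2)

print("critic weights:", np.round(w, 3))
```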
ISBN (print): 9781728176840
This paper focuses on fault-tolerant tracking control (FTTC) problems for nonlinear systems with actuator failures. For the fault-free system, the tracking control input is derived by policy iteration. To deal with the difficulty of choosing the weights of the critic neural network (CNN), the CNN is trained by particle swarm optimization instead of the traditional gradient descent method. To handle actuator failures, a fault observer is constructed to compensate the tracking control input, from which the fault-tolerant tracking controller is derived. The developed FTTC scheme guarantees that the tracking errors are uniformly ultimately bounded even if the system suffers from actuator faults. A simulation study is provided to illustrate the effectiveness of the designed FTTC scheme.
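A bare-bones particle swarm optimizer of the kind used above to tune critic network weights in place of gradient descent is sketched below; the objective is a simple test function standing in for the CNN's training loss, and all hyperparameters are illustrative.

```python
import numpy as np

# Minimal PSO: each particle tracks its personal best; the swarm tracks a
# global best; velocities blend inertia with pulls toward both bests.
rng = np.random.default_rng(2)

def loss(w):                         # stand-in for the critic training loss
    return np.sum((w - 1.5) ** 2)

n_particles, dim = 20, 4
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(200):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best weights:", np.round(gbest, 3), "loss:", round(loss(gbest), 6))
```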
ISBN (print): 9781728119649
Integral reinforcement learning control approaches with derivative weighting performance indices require full knowledge of the dynamic models of the considered systems. These approaches do not provide straightforward solutions for the underlying integral Bellman optimality equations. This has motivated innovative online model-free processes with simple adaptation mechanisms. An online integral reinforcement learning control approach is developed herein for systems operating in uncertain dynamical environments. It employs a value iteration adaptation process to solve the underlying integral temporal difference equation, accompanied by model-free optimal control strategies. The proposed approach is tested on the control of a flexible wing aircraft, where the system dynamics are not required by the online learning process. The stability and convergence properties of the adaptive learning mechanism are formally proven before being validated through numerical simulations.
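A minimal sketch of value iteration on an integral Bellman equation follows: the stage cost is the running cost integrated over one sampling interval (Euler approximation), and a discount factor e^{-rho*dt} keeps the iteration contractive. The scalar dynamics, cost, and grids are illustrative, not the flexible-wing model.

```python
import numpy as np

# Grid-based value iteration for a scalar continuous-time problem, with the
# stage cost integral_t^{t+dt} (x^2 + u^2) ds approximated by (x^2 + u^2)*dt.
dt, rho = 0.05, 1.0
gamma = np.exp(-rho * dt)            # discount over one sampling interval
xs = np.linspace(-1, 1, 81)          # state grid
us = np.linspace(-1, 1, 21)          # control grid
V = np.zeros_like(xs)

for _ in range(1000):
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        x_next = x + dt * (0.5 * x + us)                 # x_dot = 0.5 x + u
        total = (x**2 + us**2) * dt + gamma * np.interp(x_next, xs, V)
        V_new[i] = total.min()                           # minimize over u
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V(0) =", round(V[40], 5), "V(1) =", round(V[-1], 5))
```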
ISBN (print): 9781728119649
Flexible wing aircraft are gaining increasing interest due to their salient features, such as inexpensive market price, low-cost operation, in-flight robustness, multi-purpose use, and their ability to operate with very little infrastructure. The continuous variations in the aerodynamics of the wing, along with the kinematic and dynamic constraints that evolve due to the wing-fuselage interactions, make the modeling task for such systems extremely challenging. An online model-free adaptive control mechanism based on two linear actuation systems is proposed in this manuscript to fulfill different pitch-roll maneuvers. The mechanism employs model-free tracking control strategies and utilizes a real-time value-iteration-based reinforcement learning process. The adaptation of the control gains is accomplished online by means of adaptive critics.