ISBN:
(Print) 9798350329216; 9798350329209
We apply an improved variant of Monte Carlo Tree Search (MCTS), MCTS with Voronoi Progressive Widening (VPW), to cognitive radar tracking. Because cognitive radar systems have unparalleled waveform agility across an immense parameter space, reinforcement learning techniques must deal with large, multi-dimensional action spaces. Prior applications of MCTS are inefficient because they explore new actions uniformly, without regard to available information. We demonstrate how a Voronoi-partitioning-based scheme improves the exploration of new waveforms, leading to better combined tracking performance and radar resource usage in a standard benchmark tracking scenario.
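The core VPW idea can be illustrated with a small sketch: when a search node is widened, a uniformly sampled candidate waveform is kept only if it lands in the Voronoi cell of the best action found so far. All names, the candidate budget, and the `explore_prob` mixing parameter below are illustrative assumptions, not the paper's API.

```python
import math
import random

def vpw_propose(actions, values, bounds, n_candidates=32, explore_prob=0.5):
    """Propose a new continuous action for progressive widening.

    Hedged sketch of the Voronoi idea: with probability explore_prob sample
    uniformly over the waveform-parameter box; otherwise accept a uniform
    candidate only if it falls in the Voronoi cell of the best action so far
    (i.e., it is closer to that action than to any other tried action).
    """
    sample = lambda: tuple(random.uniform(lo, hi) for lo, hi in bounds)
    if not actions or random.random() < explore_prob:
        return sample()
    best_i = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(n_candidates):
        c = sample()
        nearest = min(range(len(actions)), key=lambda i: math.dist(c, actions[i]))
        if nearest == best_i:
            return c
    return sample()  # fall back to a uniform draw if the cell was never hit
```

This biases widening toward the neighborhood of promising waveforms instead of spreading new actions uniformly over the parameter space.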
ISBN:
(Print) 9798331540845; 9789887581598
This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. Model-free on-policy and off-policy learning algorithms are developed, in which knowledge of the system dynamics is not required and the Riccati equation need not be solved. The convergence of the algorithms is shown, and the relationships between them are illustrated. The effectiveness of the model-free algorithms is demonstrated by numerical experiments.
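For a scalar example, the model-based value iteration that the model-free algorithms approximate from data can be written in closed form. This is a hedged sketch of the underlying recursion only; the paper's contribution is precisely that it avoids evaluating this map, which requires the dynamics (a, b, d).

```python
def game_value_iteration(a, b, d, q, r, gamma, iters=200):
    """Value iteration for a scalar zero-sum LQ game
    x_{k+1} = a*x_k + b*u_k + d*w_k, stage cost q*x^2 + r*u^2 - gamma^2*w^2.

    Sketch under the assumption s = b^2/r - d^2/gamma^2 > 0 (minimizer
    dominates), for which V(x) = P*x^2 and the recursion contracts to the
    game Riccati solution.
    """
    s = b * b / r - d * d / (gamma * gamma)
    P = 0.0
    for _ in range(iters):
        P = q + a * a * P / (1.0 + s * P)
    return P
```

The returned P is a fixed point of the map P ↦ q + a²P/(1 + sP), the scalar form of the game algebraic Riccati equation.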
Approximate dynamic programming (ADP) has emerged as a leading method for solving optimal control problems using reinforcement learning (RL), with many benefits and also many open research problems. Model-based methods allow for off-trajectory learning, but they require exact model knowledge. When exact model knowledge is not readily available a priori, approximate models can be used to obtain approximations of the optimal value function and the optimal control policy. This dissertation focuses on the intersection of optimality and uncertainty by filling gaps in the literature and advancing real-time learning in ADP. Specifically, the methods developed in this dissertation advance approximate optimal control in the presence of unknown model dynamics with stability guarantees. Chapter 1 provides a literature overview containing background on RL, RL in control, actor-critic methods, and ADP. An outline of the dissertation is also provided in this chapter. The subsequent chapters elaborate on the evolution of system identification techniques for ADP in the presence of unknown systems in different settings. Chapter 2 introduces a hierarchical agent to facilitate switched ADP. The standard switched ADP result lacks guidance on how or when to switch. This chapter introduces a framework that uses hierarchical reinforcement learning (HRL) to create a switching pattern. Previous results contained unsupervised switching; this chapter provides a method for supervised switching to achieve optimality by using a hierarchy to optimize a selected performance metric. The hierarchical agent selects which subsystem to switch to based on which subsystem yields the lowest value function approximation at that time. The control objectives are to minimize the infinite-horizon cost function of each subsystem and to design a switching rule that yields a lower cost for switching between subsystems. Uniformly ultimately bounded (UUB)
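The supervised switching rule described above is simple to state in code. A minimal sketch, assuming the critics are scalar cost-to-go approximators (the dissertation's critics are learned value-function approximations; the quadratic examples below are purely illustrative):

```python
def select_subsystem(x, critics):
    """Supervised switching rule (hedged sketch): switch to the subsystem
    whose critic reports the lowest approximate cost-to-go at state x."""
    return min(range(len(critics)), key=lambda i: critics[i](x))

# Illustrative quadratic critics for two scalar subsystems (assumed forms):
critics = [lambda x: 2.0 * x * x,       # subsystem 0: cheapest near x = 0
           lambda x: (x - 1.0) ** 2]    # subsystem 1: cheapest near x = 1
```

The hierarchy thus reduces the how/when-to-switch question to a pointwise comparison of value-function approximations.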
In this article, a new time-varying adaptive dynamic programming (ADP) algorithm is developed to solve finite-horizon optimal control problems for a class of discrete-time affine nonlinear systems. Inspired by the pseudolinear method, the nonlinear system is approximated by a series of time-varying linear systems. In each iteration of the time-varying ADP algorithm, the optimal control law for the time-varying linear system is obtained. For an arbitrary initial state, it is proven that the states of the time-varying linear systems converge to those of the discrete-time affine nonlinear system. It is also shown that the iterative value functions and the iterative control laws converge to the optimal value function and the optimal control law, respectively. Finally, numerical results are presented to verify the effectiveness of the presented method.
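The inner solve in each iteration of such a scheme is a finite-horizon LQR problem for a time-varying linear system. A hedged scalar sketch of that backward Riccati recursion (the coefficients a_t, b_t would come from pseudolinearizing the nonlinear dynamics along a trajectory; this is not the paper's full algorithm):

```python
def tv_lqr(a_seq, b_seq, q, r, qf):
    """Backward Riccati recursion for a scalar time-varying linear system
    x_{t+1} = a_t x_t + b_t u_t with stage cost q*x^2 + r*u^2 and terminal
    cost qf*x^2. Returns value coefficients P and feedback gains K, with
    the optimal control u_t = -K[t] * x_t."""
    N = len(a_seq)
    P = [0.0] * (N + 1)
    K = [0.0] * N
    P[N] = qf
    for t in reversed(range(N)):
        a, b, Pn = a_seq[t], b_seq[t], P[t + 1]
        K[t] = a * b * Pn / (r + b * b * Pn)       # feedback gain
        P[t] = q + a * a * Pn - a * b * Pn * K[t]  # cost-to-go coefficient
    return P, K
```

With constant coefficients, P[0] approaches the infinite-horizon Riccati solution as the horizon grows, which is a convenient sanity check.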
ISBN:
(Print) 9798350304060; 9798350304053
Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-generation cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch the cells of base stations on and off to reduce network power consumption while maintaining adequate Quality of Service (QoS). We use a multilayer perceptron (MLP) that, given each state-action pair, predicts power consumption, approximating the value function in ADP so as to select the action with the greatest expected power saving. To save as much power as possible without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) network to predict handovers; both are incorporated into an online optimization algorithm that produces an adaptive QoS threshold for filtering cell-switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
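The threshold-filtered action selection can be sketched in a few lines. Here `predict_power` and `predict_qos` stand in for the paper's MLP predictors, and the fallback rule when no action clears the threshold is an assumption for illustration:

```python
def select_switch_action(actions, predict_power, predict_qos, qos_threshold):
    """Hedged sketch of the ADP action filter: among candidate cell on/off
    configurations, discard those whose predicted QoS falls below the
    adaptive threshold, then pick the survivor with the lowest predicted
    power consumption."""
    feasible = [a for a in actions if predict_qos(a) >= qos_threshold]
    if not feasible:                          # nothing clears the threshold:
        return max(actions, key=predict_qos)  # keep QoS as high as possible
    return min(feasible, key=predict_power)
```

Raising or lowering `qos_threshold` online is what trades power saving against QoS in this scheme.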
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function for the special case of linear systems is also investigated. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of the traditional approaches.
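For a linear example, value iteration on the tracking-error dynamics shows the desired behavior directly. This is a hedged, undiscounted sketch: the paper's contribution is the new cost function; here a plain quadratic error penalty plays that role for scalar error dynamics e_{k+1} = a·e_k + b·u_k.

```python
def tracking_gain(a, b, q, r, iters=200):
    """Value iteration for scalar error dynamics e_{k+1} = a*e_k + b*u_k
    with cost q*e^2 + r*u^2 (illustrative sketch). Returns the feedback
    gain K for the control law u_k = -K * e_k."""
    P = 0.0
    for _ in range(iters):
        P = q + a * a * P / (1.0 + (b * b / r) * P)
    return a * b * P / (r + b * b * P)

# Closed-loop check: under u_k = -K*e_k the tracking error decays to zero.
K = tracking_gain(a=0.95, b=1.0, q=1.0, r=1.0)
e = 1.0
for _ in range(100):
    e = (0.95 - K) * e
```

The geometric decay of e is the scalar analogue of the tracking-error convergence the paper establishes.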
Cooperative driving by a human driver and an automated system can effectively reduce the need for extremely accurate environment perception in highly automated vehicles and enhance the robustness of decision-making and motion control. However, because the two players have different intentions, severe conflicts may arise during cooperation, often with negative consequences for driving safety and maneuverability. This paper presents an indirect shared control method to model this situation and improve driving performance, focusing on an affine-input nonlinear vehicle dynamic system for shared controller design under the framework of a non-zero-sum differential game. The Nash equilibrium strategy indicates the best response for the automated system, which can guide the automated controller to act more safely and comfortably. To obtain fast solutions for practical application, approximate dynamic programming is utilized to find the Nash equilibria, which are represented by deep neural networks and solved iteratively. Driver-in-the-loop tests on a driving simulator were conducted to verify the performance of the proposed method under highway driving scenarios. The results show that the designed controller is able to reduce the driving workload and ensure driving safety.
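The equilibrium concept can be illustrated on a toy static game. This is only a sketch of iterated best response over a discretized action grid, with assumed quadratic costs coupling a "driver" and "automation" input; the paper instead solves the continuous non-zero-sum differential game with ADP and deep networks.

```python
def iterated_best_response(cost1, cost2, grid, iters=20):
    """Approximate a Nash equilibrium of a two-player static game by
    alternating best responses over a discretized action grid
    (illustrative sketch only)."""
    u1 = u2 = 0.0
    for _ in range(iters):
        u1 = min(grid, key=lambda u: cost1(u, u2))
        u2 = min(grid, key=lambda u: cost2(u1, u))
    return u1, u2
```

At the fixed point, neither player can reduce its own cost unilaterally, which is the defining property of the Nash equilibrium the shared controller targets.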
The majority of research efforts that aim to solve UAV path optimization problems in a Reinforcement Learning (RL) setting focus on closed spaces or urban areas as the operating environment. The problem of Tactical UAV (TUAV) path planning under hostile radar tracking threat has some peculiarities that distinguish it from other typical UAV path optimization problems. In particular, (1) spatial regions delineated by threat probabilities may be legitimately penetrable under certain conditions that do not impair the survivability of the UAV, and (2) a TUAV is detectable by a radar via its Radar Cross Section (RCS), which is a function of multiple parameters such as the radar operating frequency, the shape of the UAV and, more importantly, the engagement geometry between the radar and the UAV. The latter suggests that any maneuver performed by the UAV may change multiple angles that specify the engagement geometry. The work presented in this paper proposes an RL-based solution to this complex problem in a novel way by (1) implementing a Markov Decision Process (MDP)-compliant RL environment with comprehensive probabilistic radar behavior models incorporated into it, and (2) integrating a core RL algorithm (namely, DQN with Prioritized Experience Replay (DQN-PER)) with a specific variant of transfer learning (namely, learning from demonstrations (LfD)) in a single framework, demonstrating the utility of combining a core RL algorithm and a machine learning scheme to boost the performance of a learning agent and, more importantly, to alleviate the sparse reward problem.
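The prioritized-replay component of DQN-PER can be sketched minimally. A full implementation would use a sum-tree for efficient sampling and importance-sampling weights to correct the induced bias; the `alpha=0.6` exponent below is an assumed value, not taken from this paper.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay (hedged sketch): stored
    transitions are drawn with probability proportional to |TD error|^alpha,
    so surprising transitions are replayed more often."""
    def __init__(self, alpha=0.6, eps=1e-6):
        self.data, self.prio = [], []
        self.alpha, self.eps = alpha, eps

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prio.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k):
        return random.choices(self.data, weights=self.prio, k=k)
```

In a sparse-reward setting like radar-threat path planning, this concentrates learning on the rare informative transitions, which is part of why the paper pairs it with learning from demonstrations.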
ISBN:
(Print) 9798350325744
This paper investigates scheduling in the space and time domains for multi-user underwater acoustic networks under fairness considerations. The problem is formulated as a sequential decision-making problem under the Markov Decision Process (MDP) framework. Considering the difficulty of collecting data samples in an underwater acoustic channel for exploration, a planning approach is taken instead of online learning. To guarantee fairness among users, the proportional fair measure is employed, which breaks the additive structure between current and future rewards. To this end, a new, fairly weighted, decomposable reward function is proposed, enabling dynamic programming as the solution strategy. Furthermore, a sampling-based approximate planning scheme is developed to resolve the high computational complexity induced by the exponentially large state space. The characteristics of error accumulation in successive approximations are analyzed, and an upper bound on the approximation error is derived. It is shown that the instantaneous error decays with time. Numerical results show that the proposed scheme significantly improves network capacity while maintaining a high level of fairness relative to other schemes.
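The classic proportional-fair rule gives intuition for the fairness measure involved. A hedged sketch of one scheduling step (the paper's decomposable reward plays the analogous per-step role inside its dynamic-programming planner; the EWMA smoothing factor is an assumed value):

```python
def pf_schedule_step(rates, avg_thr, ewma=0.1):
    """One proportional-fair scheduling decision (illustrative sketch):
    serve the user with the largest instantaneous-rate over smoothed
    average-throughput ratio, then update the exponentially weighted
    moving averages."""
    i = max(range(len(rates)),
            key=lambda j: rates[j] / max(avg_thr[j], 1e-9))
    new_avg = [(1.0 - ewma) * t + ewma * (rates[j] if j == i else 0.0)
               for j, t in enumerate(avg_thr)]
    return i, new_avg
```

Because the ratio favors users with low accumulated throughput, starved users are eventually served even when their instantaneous rates are modest, which is the capacity/fairness balance the proposed scheme preserves.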