ISBN:
(Print) 9798350329216; 9798350329209
We apply an improved variant of Monte Carlo Tree Search (MCTS), MCTS with Voronoi Progressive Widening (VPW), to cognitive radar tracking. Because cognitive radar systems have unparalleled waveform agility across an immense parameter space, reinforcement learning techniques must deal with large, multi-dimensional action spaces. Prior applications of MCTS are inefficient because they explore new actions uniformly, without regard to available information. We demonstrate how a Voronoi-partitioning-based scheme improves the exploration of new waveforms, leading to better combined tracking performance and radar resource usage in a standard benchmark tracking scenario.
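The core VPW idea can be illustrated with a small sketch: when a search node is widened, a uniformly sampled candidate waveform is kept only if it lands in the Voronoi cell of the best action found so far. All names, the candidate budget, and the `explore_prob` mixing parameter below are illustrative assumptions, not the paper's API.

```python
import math
import random

def vpw_propose(actions, values, bounds, n_candidates=32, explore_prob=0.5):
    """Propose a new continuous action for progressive widening.

    Hedged sketch of the Voronoi idea: with probability explore_prob sample
    uniformly over the waveform-parameter box; otherwise accept a uniform
    candidate only if it falls in the Voronoi cell of the best action so far
    (i.e., it is closer to that action than to any other tried action).
    """
    sample = lambda: tuple(random.uniform(lo, hi) for lo, hi in bounds)
    if not actions or random.random() < explore_prob:
        return sample()
    best_i = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(n_candidates):
        c = sample()
        nearest = min(range(len(actions)), key=lambda i: math.dist(c, actions[i]))
        if nearest == best_i:
            return c
    return sample()  # fall back to a uniform draw if the cell was never hit
```

This biases widening toward the neighborhood of promising waveforms instead of spreading new actions uniformly over the parameter space.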
ISBN:
(Print) 9798331540845; 9789887581598
This paper considers value iteration algorithms for stochastic zero-sum linear quadratic games with unknown dynamics. Model-free on-policy and off-policy learning algorithms are developed, in which knowledge of the system dynamics is not required and the Riccati equation need not be solved. The convergence of the algorithms is shown, and the relationships between them are illustrated. The effectiveness of the model-free algorithms is demonstrated by numerical experiments.
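For a scalar example, the model-based value iteration that the model-free algorithms approximate from data can be written in closed form. This is a hedged sketch of the underlying recursion only; the paper's contribution is precisely that it avoids evaluating this map, which requires the dynamics (a, b, d).

```python
def game_value_iteration(a, b, d, q, r, gamma, iters=200):
    """Value iteration for a scalar zero-sum LQ game
    x_{k+1} = a*x_k + b*u_k + d*w_k, stage cost q*x^2 + r*u^2 - gamma^2*w^2.

    Sketch under the assumption s = b^2/r - d^2/gamma^2 > 0 (minimizer
    dominates), for which V(x) = P*x^2 and the recursion contracts to the
    game Riccati solution.
    """
    s = b * b / r - d * d / (gamma * gamma)
    P = 0.0
    for _ in range(iters):
        P = q + a * a * P / (1.0 + s * P)
    return P
```

The returned P is a fixed point of the map P ↦ q + a²P/(1 + sP), the scalar form of the game algebraic Riccati equation.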
Approximate dynamic programming (ADP) has emerged as a leading method for solving optimal control problems using reinforcement learning (RL), with many benefits and also many open research problems. Model-based methods allow for off-trajectory learning, but they require exact model knowledge. When exact model knowledge is not readily available a priori, approximate models can be used to obtain approximations of the optimal value function and the optimal control policy. This dissertation focuses on the intersection of optimality and uncertainty by filling gaps in the literature and advancing real-time learning in ADP. Specifically, the methods developed in this dissertation advance approximate optimal control in the presence of unknown model dynamics with stability guarantees. Chapter 1 provides a literature overview containing background on RL, RL in control, actor-critic methods, and ADP. An outline of the dissertation is also provided in this chapter. The subsequent chapters elaborate on the evolution of system identification techniques for ADP in the presence of unknown systems in different settings. Chapter 2 introduces a hierarchical agent to facilitate switched ADP. The standard switched ADP result lacks guidance on how or when to switch. This chapter introduces a framework that uses hierarchical reinforcement learning (HRL) to create a switching pattern. Previous results contained unsupervised switching; this chapter provides a method for supervised switching to achieve optimality by using a hierarchy to optimize a selected performance metric. The hierarchical agent selects which subsystem to switch to based on which subsystem yields the lowest value function approximation at that time. The control objectives are to minimize the infinite-horizon cost function of each subsystem and to design a switching rule that yields a lower cost for switching between subsystems. Uniformly ultimately bounded (UUB)
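The supervised switching rule described above is simple to state in code. A minimal sketch, assuming the critics are scalar cost-to-go approximators (the dissertation's critics are learned value-function approximations; the quadratic examples below are purely illustrative):

```python
def select_subsystem(x, critics):
    """Supervised switching rule (hedged sketch): switch to the subsystem
    whose critic reports the lowest approximate cost-to-go at state x."""
    return min(range(len(critics)), key=lambda i: critics[i](x))

# Illustrative quadratic critics for two scalar subsystems (assumed forms):
critics = [lambda x: 2.0 * x * x,       # subsystem 0: cheapest near x = 0
           lambda x: (x - 1.0) ** 2]    # subsystem 1: cheapest near x = 1
```

The hierarchy thus reduces the how/when-to-switch question to a pointwise comparison of value-function approximations.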
In this article, a new time-varying adaptive dynamic programming (ADP) algorithm is developed to solve finite-horizon optimal control problems for a class of discrete-time affine nonlinear systems. Inspired by the pseudolinear method, the nonlinear system is approximated by a series of time-varying linear systems. In each iteration of the time-varying ADP algorithm, the optimal control law for the time-varying linear system is obtained. For an arbitrary initial state, it is proven that the states of the time-varying linear systems converge to those of the discrete-time affine nonlinear system. It is also shown that the iterative value functions and the iterative control laws converge to the optimal value function and the optimal control law, respectively. Finally, numerical results are presented to verify the effectiveness of the presented method.
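The inner solve in each iteration of such a scheme is a finite-horizon LQR problem for a time-varying linear system. A hedged scalar sketch of that backward Riccati recursion (the coefficients a_t, b_t would come from pseudolinearizing the nonlinear dynamics along a trajectory; this is not the paper's full algorithm):

```python
def tv_lqr(a_seq, b_seq, q, r, qf):
    """Backward Riccati recursion for a scalar time-varying linear system
    x_{t+1} = a_t x_t + b_t u_t with stage cost q*x^2 + r*u^2 and terminal
    cost qf*x^2. Returns value coefficients P and feedback gains K, with
    the optimal control u_t = -K[t] * x_t."""
    N = len(a_seq)
    P = [0.0] * (N + 1)
    K = [0.0] * N
    P[N] = qf
    for t in reversed(range(N)):
        a, b, Pn = a_seq[t], b_seq[t], P[t + 1]
        K[t] = a * b * Pn / (r + b * b * Pn)       # feedback gain
        P[t] = q + a * a * Pn - a * b * Pn * K[t]  # cost-to-go coefficient
    return P, K
```

With constant coefficients, P[0] approaches the infinite-horizon Riccati solution as the horizon grows, which is a convenient sanity check.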
ISBN:
(Print) 9798350304060; 9798350304053
Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-generation cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch the cells of base stations on and off to reduce network power consumption while maintaining adequate Quality of Service (QoS). We use a multilayer perceptron (MLP) that, given each state-action pair, predicts power consumption, approximating the value function in ADP so as to select the action with the greatest expected power saving. To save as much power as possible without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) network to predict handovers; both are incorporated into an online optimization algorithm that produces an adaptive QoS threshold for filtering cell-switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
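The threshold-filtered action selection can be sketched in a few lines. Here `predict_power` and `predict_qos` stand in for the paper's MLP predictors, and the fallback rule when no action clears the threshold is an assumption for illustration:

```python
def select_switch_action(actions, predict_power, predict_qos, qos_threshold):
    """Hedged sketch of the ADP action filter: among candidate cell on/off
    configurations, discard those whose predicted QoS falls below the
    adaptive threshold, then pick the survivor with the lowest predicted
    power consumption."""
    feasible = [a for a in actions if predict_qos(a) >= qos_threshold]
    if not feasible:                          # nothing clears the threshold:
        return max(actions, key=predict_qos)  # keep QoS as high as possible
    return min(feasible, key=predict_power)
```

Raising or lowering `qos_threshold` online is what trades power saving against QoS in this scheme.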
The core task of tracking control is to make the controlled plant track a desired trajectory. The traditional performance index used in previous studies cannot completely eliminate the tracking error as the number of time steps increases. In this paper, a new cost function is introduced to develop the value-iteration-based adaptive critic framework for solving the tracking control problem. Unlike the regulator problem, the iterative value function of the tracking control problem cannot be regarded as a Lyapunov function. A novel stability analysis method is developed to guarantee that the tracking error converges to zero. The discounted iterative scheme under the new cost function for the special case of linear systems is also investigated. Finally, the tracking performance of the present scheme is demonstrated by numerical results and compared with those of the traditional approaches.
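For a linear example, value iteration on the tracking-error dynamics shows the desired behavior directly. This is a hedged, undiscounted sketch: the paper's contribution is the new cost function; here a plain quadratic error penalty plays that role for scalar error dynamics e_{k+1} = a·e_k + b·u_k.

```python
def tracking_gain(a, b, q, r, iters=200):
    """Value iteration for scalar error dynamics e_{k+1} = a*e_k + b*u_k
    with cost q*e^2 + r*u^2 (illustrative sketch). Returns the feedback
    gain K for the control law u_k = -K * e_k."""
    P = 0.0
    for _ in range(iters):
        P = q + a * a * P / (1.0 + (b * b / r) * P)
    return a * b * P / (r + b * b * P)

# Closed-loop check: under u_k = -K*e_k the tracking error decays to zero.
K = tracking_gain(a=0.95, b=1.0, q=1.0, r=1.0)
e = 1.0
for _ in range(100):
    e = (0.95 - K) * e
```

The geometric decay of e is the scalar analogue of the tracking-error convergence the paper establishes.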
Cooperative driving by a human driver and an automated system can effectively reduce the need for extremely accurate environment perception in highly automated vehicles and enhance the robustness of decision-making and motion control. However, because the two players have different intentions, severe conflicts may arise during cooperation, often with negative consequences for driving safety and maneuverability. This paper presents an indirect shared control method to model this situation and improve driving performance, focusing on an affine-input nonlinear vehicle dynamic system for shared controller design under the framework of a non-zero-sum differential game. The Nash equilibrium strategy indicates the best response for the automated system, which can guide the automated controller to act more safely and comfortably. To obtain fast solutions for practical application, approximate dynamic programming is utilized to find the Nash equilibria, which are represented by deep neural networks and solved iteratively. Driver-in-the-loop tests on a driving simulator were conducted to verify the performance of the proposed method under highway driving scenarios. The results show that the designed controller is able to reduce the driving workload and ensure driving safety.
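The equilibrium concept can be illustrated on a toy static game. This is only a sketch of iterated best response over a discretized action grid, with assumed quadratic costs coupling a "driver" and "automation" input; the paper instead solves the continuous non-zero-sum differential game with ADP and deep networks.

```python
def iterated_best_response(cost1, cost2, grid, iters=20):
    """Approximate a Nash equilibrium of a two-player static game by
    alternating best responses over a discretized action grid
    (illustrative sketch only)."""
    u1 = u2 = 0.0
    for _ in range(iters):
        u1 = min(grid, key=lambda u: cost1(u, u2))
        u2 = min(grid, key=lambda u: cost2(u1, u))
    return u1, u2
```

At the fixed point, neither player can reduce its own cost unilaterally, which is the defining property of the Nash equilibrium the shared controller targets.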
The majority of research efforts that aim to solve UAV path optimization problems in a Reinforcement Learning (RL) setting focus on closed spaces or urban areas as the operating environment. The problem of Tactical UAV (TUAV) path planning under hostile radar tracking threat has some peculiarities that distinguish it from other typical UAV path optimization problems. In particular, (1) spatial regions delineated by threat probabilities may be legitimately penetrable under certain conditions that do not impair the survivability of the UAV, and (2) a TUAV is detectable by a radar via its Radar Cross Section (RCS), which is a function of multiple parameters such as the radar operating frequency, the shape of the UAV and, more importantly, the engagement geometry between the radar and the UAV. The latter suggests that any maneuver performed by the UAV may change multiple angles that specify the engagement geometry. The work presented in this paper proposes an RL-based solution to this complex problem in a novel way by (1) implementing a Markov Decision Process (MDP)-compliant RL environment with comprehensive probabilistic radar behavior models incorporated into it, and (2) integrating a core RL algorithm (namely, DQN with Prioritized Experience Replay (DQN-PER)) with a specific variant of transfer learning (namely, learning from demonstrations (LfD)) in a single framework, demonstrating the utility of combining a core RL algorithm and a machine learning scheme to boost the performance of a learning agent and, more importantly, to alleviate the sparse reward problem.
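The prioritized-replay component of DQN-PER can be sketched minimally. A full implementation would use a sum-tree for efficient sampling and importance-sampling weights to correct the induced bias; the `alpha=0.6` exponent below is an assumed value, not taken from this paper.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay (hedged sketch): stored
    transitions are drawn with probability proportional to |TD error|^alpha,
    so surprising transitions are replayed more often."""
    def __init__(self, alpha=0.6, eps=1e-6):
        self.data, self.prio = [], []
        self.alpha, self.eps = alpha, eps

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prio.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k):
        return random.choices(self.data, weights=self.prio, k=k)
```

In a sparse-reward setting like radar-threat path planning, this concentrates learning on the rare informative transitions, which is part of why the paper pairs it with learning from demonstrations.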
ISBN:
(Print) 9798350325744
This paper investigates scheduling in the space and time domains for multi-user underwater acoustic networks under fairness considerations. The problem is formulated as a sequential decision-making problem under the Markov Decision Process (MDP) framework. Considering the difficulty of collecting data samples in an underwater acoustic channel for exploration, a planning approach is taken instead of online learning. To guarantee fairness among users, the proportional fair measure is employed, which breaks the additive structure between current and future rewards. To this end, a new, fairly weighted, decomposable reward function is proposed, enabling dynamic programming as the solution strategy. Furthermore, a sampling-based approximate planning scheme is developed to resolve the high computational complexity induced by the exponentially large state space. The characteristics of error accumulation in successive approximations are analyzed, and an upper bound on the approximation error is derived. It is shown that the instantaneous error decays with time. Numerical results show that the proposed scheme significantly improves network capacity while maintaining a high level of fairness relative to other schemes.
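The classic proportional-fair rule gives intuition for the fairness measure involved. A hedged sketch of one scheduling step (the paper's decomposable reward plays the analogous per-step role inside its dynamic-programming planner; the EWMA smoothing factor is an assumed value):

```python
def pf_schedule_step(rates, avg_thr, ewma=0.1):
    """One proportional-fair scheduling decision (illustrative sketch):
    serve the user with the largest instantaneous-rate over smoothed
    average-throughput ratio, then update the exponentially weighted
    moving averages."""
    i = max(range(len(rates)),
            key=lambda j: rates[j] / max(avg_thr[j], 1e-9))
    new_avg = [(1.0 - ewma) * t + ewma * (rates[j] if j == i else 0.0)
               for j, t in enumerate(avg_thr)]
    return i, new_avg
```

Because the ratio favors users with low accumulated throughput, starved users are eventually served even when their instantaneous rates are modest, which is the capacity/fairness balance the proposed scheme preserves.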