Search Results - Inner Mongolia University Library

Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances

IEEE Transactions on Cybernetics, 2016, vol. 46, no. 5, pp. 1041-1050

Authors: Song, Ruizhuo; Lewis, Frank L.; Wei, Qinglai; Zhang, Huaguang. Affiliations: Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118, USA; Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China

An optimal control method is developed for unknown continuous-time systems with unknown disturbances in this paper. The integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control. Off-policy learning is used to allow the dynamics to be completely unknown. Neural networks are used to construct critic and action networks. It is shown that if there are unknown disturbances, off-policy IRL may not converge or may be biased. For reducing the influence of unknown disturbances, a disturbances compensation controller is added. It is proven that the weight errors are uniformly ultimately bounded based on Lyapunov techniques. Convergence of the Hamiltonian function is also proven. The simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.

Keywords: adaptive critic designs; adaptive/approximate dynamic programming (ADP); dynamic programming; off-policy; optimal control; unknown system


Adaptive Learning Solution of the Nonzero-Sum Differential Game with Unknown Dynamics Using Adaptive Dynamic Programming


28th Chinese Control and Decision Conference

Authors: Qin, Chunbin; Sun, Hongfei; Liu, Xianxing; Chen, Jiaqi. Affiliations: Henan Univ, Sch Comp & Informat Engn, Kaifeng 475004, Peoples R China; Henan Univ, Coll Environm & Planning, Kaifeng 475004, Peoples R China; Henan Univ, Sch Software, Kaifeng 475004, Peoples R China

ISBN: (print) 9781467397148

In this paper, a novel partially model-free adaptive dynamic programming (ADP) algorithm is presented to solve online the nonzero-sum differential games of continuous-time linear systems with unknown drift dynamics. First, by using the integral reinforcement learning technique, the partially model-free ADP algorithm is developed to solve online the set of coupled algebraic Riccati equations (AREs) underlying the game problem without requiring complete knowledge of the system dynamics. Then, the convergence of the partially model-free ADP algorithm is proved by demonstrating that it is mathematically equivalent to the extended Kleinman's algorithm, previously proposed in the literature, which solves the set of coupled algebraic Riccati equations offline using complete knowledge of the system dynamics. Finally, one example is given to demonstrate the efficiency of the proposed algorithm.

Keywords: Nonzero-sum differential game; adaptive dynamic programming; Unknown drift dynamics


Discrete-Time Generalized Policy Iteration ADP Algorithm With Approximation Errors


IEEE Symposium Series on Computational Intelligence

Authors: Qinglai Wei; Benkai Li; Ruizhuo Song. Affiliations: The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences, Beijing, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China

This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established by showing that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.

Keywords: adaptive critic designs; adaptive dynamic programming; Approximate dynamic programming; Neuro-dynamic programming; Generalized policy iteration; Nonlinear systems; Optimal control; Neural networks; reinforcement learning


Towards An Integrated Learning Framework for Behavior Modeling of Adaptive CGFs


9th International Symposium on Computational Intelligence and Design (ISCID)

Authors: Zhang, Qi; Yin, Quanjun; Xu, Kai. Affiliation: Natl Univ Def Technol, Coll Informat Syst & Management, Changsha, Hunan, Peoples R China

ISBN: (print) 9781509035588

Computer generated forces (CGFs) are autonomous or semi-autonomous actors within military, simulation-based training and analysis applications. Rapid, realistic and adaptive behavior modeling for CGFs is imperative and challenging. Traditional modeling approaches such as rule-based scripts usually need time-consuming, repetitive effort and result in rigid, predictable behavior. Recent developments introducing machine learning (ML) techniques, such as dynamic scripting or neural network models, always present as black-box systems, which are difficult for subject matter experts (SMEs) to understand and revise. To overcome these limitations, we propose an integrated learning framework to facilitate adaptive CGF behavior modeling. The framework represents domain knowledge explicitly as Behavior Trees (BTs), and integrates learning BTs automatically from demonstration and reinforcement learning (RL) node into BTs. In addition, a CBR-style planner is adopted to retrieve executable behavior for the diverse situations encountered at runtime. Through these components, the framework can make full use of the advantages of various learning approaches and knowledge sources to generate realistic and adaptive behaviors for CGFs easily.

Keywords: Computer Generated Forces (CGFs); Adaptive Behavior Modeling; Behavior Trees (BTs); Learning from Demonstration (LFD); Learning from Experience


Countering Improvised Explosive Devices With Adaptive Sensor Networks


IEEE Symposium on Technologies for Homeland Security (HST)

Authors: Buenfil, Jorge R.; Ramirez-Marquez, Jose. Affiliations: US Army ARDEC, Picatinny Arsenal, NJ, USA; Stevens Inst Technol, Sch Syst & Enterprises, Hoboken, NJ, USA

ISBN: (print) 9781509007707

The design and architecture of a system for automatic Improvised Explosive Devices detection to protect sensitive areas with minimal human interaction is presented. The system, called ACE for "Army Counter IED Enhanced", employs a variety of statistical analysis, pattern recognition and human machine interface in conjunction with adaptive mechanisms. ACE combines four different kinds of inputs: image processing, nonvisual inputs, pattern recognition, and Kalman filters. ACE produces three kinds of outputs: a visualization of the area under surveillance, a highlight of potential threats, and alarms that trigger traffic control devices to contain the threat while security forces proceed to confirm and neutralize the threat. Data fusion of the inputs is conducted with a dynamic system assigning weights to the values provided by each input, adding results into a threat assessment value (TAV), in order to compare it to thresholds for alerts and alarms.
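As a rough illustration of the fusion step this abstract describes, the sketch below combines per-input scores into a single weighted threat assessment value (TAV) and compares it to alert and alarm thresholds. The score range, the weights, the threshold values, the normalization by total weight, and the function names are all hypothetical assumptions; the abstract does not state how ACE actually assigns or updates its weights.

```python
from typing import Dict

# Hypothetical thresholds on a 0..1 scale; the abstract gives no actual values.
ALERT_THRESHOLD = 0.5
ALARM_THRESHOLD = 0.8


def threat_assessment_value(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of the per-input scores.

    `scores` maps an input name to a score in [0, 1]; `weights` maps the
    same names to non-negative weights. Both are illustrative assumptions.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


def classify(tav: float) -> str:
    """Map a TAV to 'alarm', 'alert', or 'normal' using the thresholds above."""
    if tav >= ALARM_THRESHOLD:
        return "alarm"
    if tav >= ALERT_THRESHOLD:
        return "alert"
    return "normal"
```

For example, with scores of 1.0 for image processing and 0.0 for nonvisual inputs, and weights of 3 and 1, the weighted average is 0.75, which this sketch would classify as an alert rather than an alarm.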

Keywords: adaptive systems; Bayesian rule; data fusion; dynamic system; EOD; HMI; IED; Kalman filter; OODA loop; orthogonal sensors; pattern recognition; Raspberry Pi; reinforcement learning; threat assessment


ATM: Approximate Task Memoization in the Runtime System


International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Iulian Brumar; Marc Casas; Miquel Moreto; Mateo Valero; Gurindar S. Sohi. Affiliations: Barcelona Supercomputing Center (BSC), Barcelona, Spain; University of Wisconsin-Madison, USA

Redundant computations appear during the execution of real programs. Multiple factors contribute to these unnecessary computations, such as repetitive inputs and patterns, calling functions with the same parameters or bad programming habits. Compilers minimize non useful code with static analysis. However, redundant execution might be dynamic and there are no current approaches to reduce these inefficiencies. Additionally, many algorithms can be computed with different levels of accuracy. Approximate computing exploits this fact to reduce execution time at the cost of slightly less accurate results. In this case, expert developers determine the desired tradeoff between performance and accuracy for each application. In this paper, we present Approximate Task Memoization (ATM), a novel approach in the runtime system that transparently exploits both dynamic redundancy and approximation at the task granularity of a parallel application. Memoization of previous task executions allows predicting the results of future tasks without having to execute them and without losing accuracy. To further increase performance improvements, the runtime system can memoize similar tasks, which leads to task approximate computing. By defining how to measure task similarity and correctness, we present an adaptive algorithm in the runtime system that automatically decides if task approximation is beneficial or not. When evaluated on a real 8-core processor with applications from different domains (financial analysis, stencil-computation, machine-learning and linear-algebra), ATM achieves a 1.4x average speedup when only applying memoization techniques. When adding task approximation, ATM achieves a 2.5x average speedup with an average 0.7% accuracy loss (maximum of 3.2%).
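The distinction the abstract draws between exact memoization and approximate memoization of similar tasks can be sketched in a few lines. This is only an illustration of the general idea, not the ATM runtime: the class name `ApproxMemo`, the `tolerance` parameter, and the choice to define "similar" as "inputs fall in the same grid cell after rounding" are assumptions made here, since the abstract does not give the paper's similarity or correctness measures.

```python
from typing import Callable, Dict, Sequence, Tuple


class ApproxMemo:
    """Memoize a task, optionally treating nearby inputs as the same task.

    With tolerance == 0 only identical inputs hit the cache (exact
    memoization). With tolerance > 0 each input is rounded to a grid of
    that size, so similar inputs share one cached result (a hypothetical
    stand-in for a task-similarity measure).
    """

    def __init__(self, task: Callable[[Sequence[float]], float], tolerance: float = 0.0):
        self.task = task
        self.tolerance = tolerance
        self.cache: Dict[Tuple, float] = {}
        self.hits = 0

    def _key(self, inputs: Sequence[float]) -> Tuple:
        if self.tolerance <= 0:
            return tuple(inputs)
        # Quantize each input to the index of its grid cell.
        return tuple(round(x / self.tolerance) for x in inputs)

    def __call__(self, inputs: Sequence[float]) -> float:
        key = self._key(inputs)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]  # reuse a previous result, skip execution
        result = self.task(inputs)
        self.cache[key] = result
        return result
```

With a tolerance of 0.1, calling the wrapper on `[1.0, 2.0]` and then on `[1.01, 2.01]` runs the task once and returns the first result for both calls, trading a small error for the skipped execution; with the default tolerance of 0, both calls run the task.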

Keywords: Runtime programming; Approximate computing; History; Redundancy; Data structures; Parallel processing


A Reinforcement Learning Approach for Cost- and Energy-Aware Mobile Data Offloading


18th Asia-Pacific Network Operations and Management Symposium (APNOMS)

Authors: Zhang, Cheng; Gu, Bo; Liu, Zhi; Yamori, Kyoko; Tanaka, Yoshiaki. Affiliations: Waseda Univ, Dept Comp Sci & Commun Engn, Tokyo 1690072, Japan; Kogakuin Univ, Dept Informat & Commun Engn, Tokyo 1920015, Japan; Waseda Univ, Global Informat & Telecommun Inst, Tokyo 1698555, Japan; Asahi Univ, Dept Management Informat, Mizuho 5010296, Japan; Waseda Univ, Dept Commun & Comp Engn, Tokyo 1698555, Japan

ISBN: (print) 9784885523045

With rapid increases in demand for mobile data, mobile network operators are trying to expand wireless network capacity by deploying WiFi hotspots to offload their mobile traffic. However, these network-centric methods usually do not serve the interests of mobile users (MUs). MUs consider many factors when deciding whether to offload their traffic to a complementary WiFi network. In this paper, we study the WiFi offloading problem from the MU's perspective by considering the delay tolerance of traffic, monetary cost, energy consumption, and the availability of the MU's mobility pattern. We first formulate the WiFi offloading problem as a finite-horizon discrete-time Markov decision process (FDTMDP) with a known MU mobility pattern and propose a dynamic-programming-based offloading algorithm. Since the MU's mobility pattern may not be known in advance, we then propose a reinforcement-learning-based offloading algorithm, which can work well with an unknown MU mobility pattern. Extensive simulations are conducted to validate our proposed offloading algorithms.

Keywords: WiFi; mobile data offloading; reinforcement learning; energy-aware


Proceedings - 11th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2016


11th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2016

ISBN: (print) 9781450341875

The proceedings contain 19 papers. The topics discussed include: reusable self-adaptation through bidirectional programming; automatically hardening a self-adaptive system against uncertainty; data-driven continuous evolution of smart systems; privacy dynamics: learning privacy norms for social software; TARL: modeling topology adaptations for networking applications; adapting heterogeneous devices into an IoT context-aware infrastructure; self-adaptation for cyber-physical systems: a systematic literature review; model problem and testbed for experiments with adaptation in smart cyber-physical systems; assured and correct dynamic update of controllers; towards adaptive compliance; using dynamic adaptive systems in safety-critical domains; and clonal plasticity: a method for decentralized adaptation in multi-agent systems.


Analytical Greedy Control and Q-learning for Optimal Power Management of Plug-in Hybrid Electric Vehicles


IEEE Symposium Series on Computational Intelligence

Authors: Chang Liu; Yi Lu Murphey. Affiliations: Department of Electrical and Computer Engineering, University of Michigan - Dearborn, Dearborn, MI, USA; University of Michigan Dearborn, Dearborn, MI, US

In this paper, we present two solutions for achieving the optimal control of PHEVs on short trips. We prove mathematically that a greedy control policy is optimal for those short trips where the battery State-of-Charge (SoC) will not drop below its minimum threshold level. A closed-form greedy control solution is derived from the PHEV powertrain model. Furthermore, we provide a Q-learning-based approach that is model-free and capable of in-vehicle learning. Our algorithm, which combines neuro-dynamic programming (NDP) with estimated future trip information, can robustly converge to the optimal policy on both fixed and randomly selected drive cycles.

Keywords: Plug-in Hybrid Electric Vehicles (PHEVs); Power Management; Energy Optimization; reinforcement learning; Q-learning


IEEE Transactions on Neural Networks and Learning Systems Special Section on Deep Reinforcement Learning and Adaptive Dynamic Programming


IEEE Transactions on Neural Networks and Learning Systems, 2016, vol. 27, no. 12, p. 2776

Prospective authors are requested to submit new, unpublished manuscripts for inclusion in the upcoming event described in this call for papers.
