Search results - Inner Mongolia University Library

49th IEEE Conference on Decision and Control (CDC)

Authors: Vrabie, Draguna; Lewis, Frank. Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118, USA

ISBN: (print) 9781424477463

This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that finds online the Nash equilibrium for two-player nonzero-sum differential games with linear dynamics and infinite-horizon quadratic cost. Each player uses integral reinforcement learning (IRL) to calculate online the infinite-horizon value function that it associates with every given set of feedback control policies. It will be shown that the online algorithm is mathematically equivalent to an offline iterative method, previously introduced in the literature, that solves the set of coupled algebraic Riccati equations (ARE) underlying the game problem using complete knowledge of the system dynamics. Here we show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without the requirement of complete knowledge of the system dynamics. The two participants in the continuous-time differential game compete in real time, and the feedback Nash control strategies are determined from data measured online from the system. The algorithm is built on the interplay between a learning phase, in which each player learns online the value that it associates with a given set of play policies, and a policy update step, performed by each of the players to decrease the value of its cost. The players learn concurrently. The feasibility of the ADP scheme is demonstrated in simulation.

Keywords: reinforcement learning
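
The abstract above refers to an offline iteration over the coupled algebraic Riccati equations that alternates value evaluation and policy update. The following is a minimal sketch of such an iteration for a scalar system, assuming it takes the form of a Lyapunov-type policy iteration with full model knowledge. It is not the authors' IRL algorithm, which per the abstract avoids needing the full dynamics. The names `a`, `b1`, `b2`, `q1`, `q2`, `r11`, `r22`, `r12`, `r21` and both functions are hypothetical, introduced only for illustration, and convergence is not guaranteed in general.

```python
def solve_scalar_lyapunov(a_cl, w):
    """Solve 2*a_cl*p + w = 0 for p; requires a stable closed loop (a_cl < 0)."""
    if a_cl >= 0:
        raise ValueError("closed-loop dynamics must be stable")
    return -w / (2.0 * a_cl)


def nash_policy_iteration(a, b1, b2, q1, q2, r11, r22,
                          r12=0.0, r21=0.0, k1=0.0, k2=0.0, iters=50):
    """Assumed model: dx/dt = a*x + b1*u1 + b2*u2 with u_i = -k_i*x.

    Player i's cost is the integral of q_i*x^2 + r_ii*u_i^2 + r_ij*u_j^2.
    The initial gains (k1, k2) must stabilize the system.
    """
    p1 = p2 = 0.0
    for _ in range(iters):
        a_cl = a - b1 * k1 - b2 * k2
        # Learning phase: each player evaluates its value for the current policies.
        p1 = solve_scalar_lyapunov(a_cl, q1 + r11 * k1 ** 2 + r12 * k2 ** 2)
        p2 = solve_scalar_lyapunov(a_cl, q2 + r22 * k2 ** 2 + r21 * k1 ** 2)
        # Policy update: both players update their gains concurrently.
        k1, k2 = b1 * p1 / r11, b2 * p2 / r22
    return p1, p2, k1, k2


def are_residuals(a, b1, b2, q1, q2, r11, r22, p1, p2, r12=0.0, r21=0.0):
    """Residuals of the two coupled scalar AREs at (p1, p2); zero at a solution."""
    k1, k2 = b1 * p1 / r11, b2 * p2 / r22
    a_cl = a - b1 * k1 - b2 * k2
    return (2 * a_cl * p1 + q1 + r11 * k1 ** 2 + r12 * k2 ** 2,
            2 * a_cl * p2 + q2 + r22 * k2 ** 2 + r21 * k1 ** 2)


if __name__ == "__main__":
    print(nash_policy_iteration(-1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

For the symmetric example in the `__main__` block, the coupled equations reduce to 3p^2 + 2p - 1 = 0, so the iteration should approach p1 = p2 = 1/3.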

Supervised reinforcement learning for human-like adaptive cruise control

4th International Symposium on Computational Intelligence and Industrial Applications, ISCIIA 2010

Authors: Hu, Zhaohui; Zhao, Dongbin

ISBN: (print) 9787121113154

This paper proposes a supervised reinforcement learning (SRL) algorithm for an adaptive cruise control (ACC) system that must reflect human-like driving habits, which can be thought of as a dynamic programming problem with stochastic demands. In short, in the human-like ACC problem the host vehicle adopts different control parameters (accelerations in the upper controller, brakes and throttles in the bottom controller) while following or in other driving situations, according to the driver's behavior. We discretize the relative velocity and the relative distance to construct a two-dimensional state and map it to a one-dimensional state space; discretize the acceleration to generate the action set; and design an additional velocity-improvement shaping reward and a distance-improvement shaping reward to construct the supervisor. We apply the SRL algorithm to the human-like ACC problem in different scenarios. The results show that the SRL control policy is more robust in the human-like driving mode than other traditional control methods, and that the control system achieves sufficient control accuracy in both velocity and distance.

Keywords: Controllers
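
The state construction described in the abstract above (discretize relative velocity and relative distance, then map the two-dimensional state to a one-dimensional index) can be sketched as follows. This is only an illustration under stated assumptions: the bin edges, the action values, and the names `VELOCITY_EDGES`, `DISTANCE_EDGES`, `ACTIONS`, `bin_index` and `state_index` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

# Hypothetical bin edges; the abstract does not give the actual discretization.
VELOCITY_EDGES = [-2.0, -0.5, 0.5, 2.0]    # relative velocity in m/s, 5 bins
DISTANCE_EDGES = [10.0, 20.0, 40.0, 80.0]  # relative distance in m, 5 bins

# Hypothetical discretized accelerations (m/s^2) forming the action set.
ACTIONS = [-2.0, -1.0, 0.0, 1.0, 2.0]


def bin_index(value, edges):
    """Return the index of the bin containing value (0 .. len(edges))."""
    return bisect_right(edges, value)


def state_index(rel_velocity, rel_distance):
    """Map the 2-D discretized state to a single index in row-major order."""
    v = bin_index(rel_velocity, VELOCITY_EDGES)
    d = bin_index(rel_distance, DISTANCE_EDGES)
    return v * (len(DISTANCE_EDGES) + 1) + d


if __name__ == "__main__":
    print(state_index(0.0, 30.0))
```

With these assumed edges there are 25 states, so a tabular value function would have 25 rows and one column per entry in `ACTIONS`.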

Evaluating supervised machine learning for adapting enterprise DRE systems

International Symposium on Intelligence Information Processing and Trusted Computing

Authors: Hoffert, Joe; Schmidt, Douglas. Vanderbilt University, EECS Department, Nashville, TN, United States

ISBN: (print) 9780769541969

Several adaptation approaches, such as policy-based and reinforcement learning, have been devised to ensure end-to-end quality-of-service (QoS) for enterprise distributed systems in dynamic operating environments. Not all approaches are applicable to distributed real-time and embedded (DRE) systems, however, which have stringent accuracy, timeliness, and development complexity requirements. Supervised machine learning techniques, such as artificial neural networks (ANNs), are a promising approach to address time complexity concerns of adaptive enterprise DRE systems. Likewise, ANNs address the development complexity of adaptive DRE systems by ensuring that adaptations are appropriate for the operating environment. This paper empirically evaluates the accuracy and timeliness of the ANN machine learning technique for environments on which it has been trained. Our results show ANNs are highly accurate in determining correct adaptations and provide predictable time complexity, e.g., with response times less than 6 μs. © 2010 IEEE.

Keywords: Neural networks

Tutor learning using linear constraints in approximate dynamic programming

48th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2010

Authors: Di Castro, Dotan; Mannor, Shie. Faculty of Electrical Engineering, Technion - Israel Institute of Technology, 32000 Haifa, Israel

ISBN: (print) 9781424482146

In adaptive control, agents interacting with Markov Decision Processes typically face two types of setups. In the first setup, the environment's model is known, and dynamic programming and related methods are used to obtain the optimal control. In the second setup, the environment's model is unknown and reinforcement learning methods are used. In this work we investigate a new setup that mixes the two: only part of the environment's model is known, and additional information regarding the environment is provided by a tutor. We formalize this problem using linear function approximation in order to overcome the "curse of dimensionality". In addition, using the Envelope Theorem, we show how one can tune the approximation basis in order to get locally optimal results. Finally, the suggested methods are demonstrated in simulations. © 2010 IEEE.

Keywords: dynamic programming

Dynamic routing optimization based on real time adaptive delay estimation for wireless networks

15th IEEE Symposium on Computers and Communications, ISCC 2010

Authors: Ziane, Saïda; Mellouk, Abdelhamid. IUT Creteil/Vitry, 120-122 Rue Paul Armangot, 94400 Vitry sur Seine, France

ISBN: (print) 9781424477555

With the wide emergence of real-time applications in mobile ad hoc networks, delay guarantees are increasingly required. Many routing protocols have been proposed in the last few years to improve the overall delay in mobile ad hoc networks. In this paper, we propose an extension of our earlier QoS routing protocol called "AMDR" (Adaptive Mean Delay Routing), which is based on an adaptive approach that uses the mean delay estimated proactively by each node. AMDR is built around two modules: a Delay Estimation Module and an Adaptive Routing Module. The first proactively calculates the mean delay at each node without any packet exchange. The second then exploits the mean delay value, using two exploration agents to discover the best available routes between a given pair of nodes. Numerical results obtained with the NS simulator for different traffic loads show that AMDR clearly improves performance compared with other approaches. © 2010 IEEE.

Keywords: reinforcement learning

A comparative study of urban traffic signal control with reinforcement learning and adaptive dynamic programming

2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010

Authors: Dai, Yujie; Zhao, Dongbin; Yi, Jianqiang. Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, No.95 Zhongguancun East Road, Haidian District, Beijing 100190, China

ISBN: (print) 9781424469178

This paper proposes a new algorithm that employs adaptive dynamic programming (ADP) to solve the distributed control problem of urban traffic with an infinite horizon. Urban traffic congestion costs considerable time and increases exhaust emissions, so alleviating it benefits both the economy and the environment. Signal control at urban intersections is an effective and important way to reduce traffic jams and collisions. Many control theories, including traditional mathematical methods and modern artificial intelligence methods, have been applied. ADP is an effective and amiable intelligent control method. We propose an algorithm, based on ADP theory, that adjusts the signal timing plan at urban traffic intersections. Simulations are run in a microscopic traffic simulation package, TSIS (Traffic Software Integrated System). Several criteria, called MOEs (Measures of Effectiveness), are collected to compare the method with the widely used pre-timed control and actuated control, and with a machine learning method, Q-learning control. Results show that the ADP control method adapts better to the varied traffic that simulates real traffic flows. © 2010 IEEE.

Keywords: Computer software

A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming

IEEE International Conference on Networking, Sensing and Control

Authors: Haibo He; Bo Liu. Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA; Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA

ISBN: (print) 9781424464500; 9781424464531

In this paper we propose a hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming (ADP). The key idea of this architecture is to integrate a reference network that provides the internal reinforcement representation (secondary reinforcement signal) to interact with the operation of the learning system. Such a reference network plays an important role in building the internal goal representations. Furthermore, motivated by recent research in neurobiology and psychology, the proposed ADP architecture can be designed in a hierarchical way, in which different levels of internal reinforcement signals can be developed to represent multi-level goals for the intelligent system. The detailed system-level architecture, the learning and adaptation principle, and simulation results are presented to demonstrate the effectiveness of the approach.

Keywords: dynamic programming; learning systems; Intelligent robots; Recurrent neural networks; Signal design; Intelligent systems; Cost function; Control systems; Biological neural networks; Backpropagation

Integral Reinforcement Learning for Online Computation of Feedback Nash Strategies of Nonzero-Sum Differential Games

2010 49th IEEE Conference on Decision and Control

Authors: Draguna Vrabie; Frank Lewis. Automation and Robotics Research Institute, University of Texas at Arlington, 7300 Jack Newell Blvd. S., Fort Worth, TX 76118, USA

ISBN: (print) 9781424477456

Keywords: differential games; Players; System dynamics; Nash equilibrium; game player; Policies; Riccati equations; learning; infinite horizon

Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control

IEEE CIRCUITS AND SYSTEMS MAGAZINE, 2009, vol. 9, no. 3, pp. 32-50

Authors: Lewis, Frank L.; Vrabie, Draguna. Univ Texas Arlington, Automat & Robot Res Inst, Arlington, TX, USA; S China Univ Technol, Guangzhou, Guangdong, Peoples R China; Shanghai Jiao Tong Univ, Shanghai, Peoples R China

Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.

Keywords: learning; Programmable control; adaptive control; dynamic programming; Feedback control; Organisms; Optimal control; Control systems; Design engineering; Systems engineering and theory

Feature Discovery in Approximate Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Preux, Philippe; Girgin, Sertan; Loth, Manuel. Univ Lille, Lab Informat Fondamentale Lille, Comp Sci Lab, CNRS, Lille, France; INRIA, Paris, France

ISBN: (print) 9781424427611

Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Building on our recent work on feature discovery in reinforcement learning, which seeks a good, if not the best, representation of states, we report here on the use of the same kind of approach in approximate dynamic programming. The striking difference from the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that discovering the best state representation and approximating the value function are two sides of the same coin, and that a nonparametric approach provides an elegant solution to both problems at once.

Keywords: reinforcement learning
