This paper proposes a novel approach for coupling perception and action through minimax dynamic programming. We tackle domains where the agent has some control over the observation process (e.g. via the manipulation of some sensors), and show how to transform the system so that an optimal control solution can be sought with standard algorithms. We demonstrate our method in a toy domain, where an agent guides two point masses ("hands") to a target in a 2D scene with obstacles. The agent can direct the gaze of a virtual "eye" to different parts of the scene, thereby reducing the observation noise for elements of the scene in that vicinity and improving the quality of feedback control. In this manner, motor control of the eye allots attentional resources. We propose a unified framework that treats both perception and action as interdependent components of the same optimal control task. The implications of uncertainty on task performance are uncovered by deploying an adversary whose strength to do harm is proportional to the instantaneous level of state uncertainty. We transform the partially observable system into a fully observable one by coupling the state dynamics with a state-estimation filter, thereby augmenting the state space to include an explicit representation of the instantaneous state uncertainty. The augmented system is high-dimensional, but through minimax differential dynamic programming, a local method that is less susceptible to the curse of dimensionality, we are able to solve for the optimal control of the hands and the eye at the same time, allowing for the emergence of interesting phenomena such as hand-eye coordination, saccades and smooth pursuit.
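The core transformation the abstract describes — folding the estimator's uncertainty into a fully observable augmented state whose dynamics depend on where the eye looks — can be sketched with a scalar Kalman-filter variance update. Everything below (the scalar dynamics, the `obs_noise` model, all constants) is an illustrative assumption, not the paper's actual system:

```python
import math

def kalman_variance_update(sigma2, q, r):
    """One scalar Kalman-filter step for the estimation variance:
    predict (add process noise q), then correct with a measurement
    of noise variance r. This variance becomes an explicit extra
    coordinate of the augmented, fully observable state."""
    pred = sigma2 + q            # prediction step inflates uncertainty
    gain = pred / (pred + r)     # Kalman gain
    return (1.0 - gain) * pred   # posterior variance after the measurement

def obs_noise(dist_from_gaze, r_min=0.01, slope=1.0):
    # Assumed gaze model: fixating a scene element (distance 0 from the
    # gaze point) gives low observation noise; looking away gives high noise.
    return r_min + slope * dist_from_gaze ** 2

# Fixating the target shrinks its uncertainty far faster than looking away.
s_fix, s_away = 1.0, 1.0
for _ in range(10):
    s_fix = kalman_variance_update(s_fix, 0.05, obs_noise(0.0))
    s_away = kalman_variance_update(s_away, 0.05, obs_noise(2.0))
assert s_fix < s_away
```

Because the variance update is itself a deterministic function of the gaze action, a standard (minimax) dynamic-programming solver can trade off motor progress against uncertainty reduction in one optimization.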
In cooperative retransmissions, nodes with better channel qualities help other nodes by retransmitting a failed packet to its intended destination. In this paper, we propose a cooperative retransmission scheme in which each node makes a local decision on whether to cooperate, and at what transmission power, using a Markov decision process with reinforcement learning. With reinforcement learning, the proposed scheme avoids solving a Markov decision process with a large number of states. Through simulations, we show that the proposed scheme is robust to collisions, is scalable with regard to the network size, and can provide significant cooperative diversity.
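The local cooperate/power decision can be illustrated with a bandit-style Q-update per discretized channel state. This is a minimal sketch, not the authors' scheme: the state discretization, the reward model (success probability minus a power cost) and all constants are assumptions.

```python
import random

random.seed(0)

# Hypothetical state: discretized channel quality to the destination (0..2).
# Actions: 0 = stay silent, 1..3 = retransmit at increasing power levels.
ACTIONS = [0, 1, 2, 3]
Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}

def reward(state, action):
    # Toy model: success probability grows with channel quality and power,
    # while power itself has a cost. Numbers are illustrative only.
    if action == 0:
        return 0.0
    p_success = min(1.0, 0.2 * state + 0.2 * action)
    return (1.0 if random.random() < p_success else 0.0) - 0.1 * action

alpha, eps = 0.1, 0.2
for _ in range(5000):
    s = random.randrange(3)
    a = random.choice(ACTIONS) if random.random() < eps else \
        max(ACTIONS, key=lambda x: Q[(s, x)])
    # Single-shot decision: no successor state, so the target is the reward.
    Q[(s, a)] += alpha * (reward(s, a) - Q[(s, a)])

# With a good channel, some transmit power should beat staying silent.
best = max(ACTIONS, key=lambda a: Q[(2, a)])
assert best != 0
```

Learning the per-state action values directly is what lets each node avoid enumerating the full network-wide MDP state space.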
ISBN:
(print) 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
In this paper, we present a novel approach to controlling a robotic system online, from scratch, based on the reinforcement learning principle. In contrast to other approaches, our method learns the system dynamics and the value function separately, which makes it possible to identify their individual characteristics and is therefore easily adaptable to changing conditions. The major problem in learning control policies lies in the high-dimensional state and action spaces that need to be explored in order to identify the optimal policy. In this paper, we propose an approach that learns the system dynamics and the value function in an alternating fashion based on Gaussian process models. Additionally, to reduce computation time and to make the system applicable to online learning, we present an efficient sparsification method. In experiments carried out with a real miniature blimp, we demonstrate that our approach can learn height control online. Further results obtained with an inverted pendulum show that our method requires less data to achieve the same performance as an offline learning approach.
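The dynamics-model half of such an approach can be sketched with a bare-bones Gaussian process regressor (RBF kernel, exact solve on a handful of transitions). The sparsification and the alternating value-function update are beyond this fragment, and the 1D toy target below is purely illustrative:

```python
import math

def rbf(x1, x2, ell=1.0):
    # Squared-exponential kernel with lengthscale ell.
    return math.exp(-0.5 * ((x1 - x2) / ell) ** 2)

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting
    # (fine for the tiny systems used here).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(X, y, x_star, noise=1e-6):
    # Zero-mean GP posterior mean: k_*^T (K + noise*I)^{-1} y.
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, y)
    return sum(a * rbf(x, x_star) for a, x in zip(alpha, X))

# Learn a toy 1D dynamics map x' = f(x) from three observed transitions,
# then query the model.
X = [0.0, 1.0, 2.0]
y = [math.sin(x) for x in X]     # stand-in "next state" targets
pred = gp_predict(X, y, 1.0)
```

Keeping the dynamics model separate from the value function, as the abstract argues, means only this regressor must be refitted when the plant changes.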
ISBN:
(print) 9781424425686
Dynamic collaborative driving involves the motion coordination of multiple vehicles, using information shared by vehicles instrumented to perceive their surroundings, in order to improve road usage and safety. A basic requirement of any vehicle participating in dynamic collaborative driving is longitudinal control; without this capability, higher-level coordination is not possible. This paper focuses on the problem of longitudinal motion control. A detailed nonlinear longitudinal vehicle model, which serves as the control system design platform, is used to develop a longitudinal adaptive control system based on Monte Carlo reinforcement learning. The results of the reinforcement learning phase and the performance of the adaptive control system for a single automobile, as well as in a multi-vehicle platoon, are presented.
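Monte Carlo reinforcement learning for longitudinal control can be sketched on a toy speed-tracking task: every-visit Monte Carlo control with an epsilon-greedy policy. The discrete speed-error states, the three actions and the cost model below are assumptions for illustration, not the paper's vehicle model:

```python
import random

random.seed(1)

# Toy longitudinal control: discrete speed-error states -2..2,
# actions brake (-1) / coast (0) / accelerate (+1).
STATES = range(-2, 3)
ACTIONS = (-1, 0, 1)
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
N = {(s, a): 0 for s in STATES for a in ACTIONS}

def step(s, a):
    s2 = max(-2, min(2, s + a))
    return s2, -abs(s2)                      # penalize remaining speed error

def run_episode(eps=0.1, horizon=5):
    s, traj = random.choice(list(STATES)), []
    for _ in range(horizon):
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
    return traj

for _ in range(3000):
    G = 0.0
    for s, a, r in reversed(run_episode()):
        G += r                               # undiscounted return-to-go
        N[(s, a)] += 1
        Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]   # sample-average update

# Greedy policy after learning: brake when too fast, accelerate when slow.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

Unlike temporal-difference methods, the Monte Carlo update above uses only complete episode returns, which is what makes it model-free with respect to the (here unknown) vehicle dynamics.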
In this paper, an adaptive critic-based neurofuzzy controller is presented for water level regulation of nuclear steam generators. The problem has been of great concern for many years, as the steam generator is a highly nonlinear system showing inverse-response dynamics, especially at low operating power levels. Fuzzy critic-based learning is a reinforcement learning method based on dynamic programming. The only information available to the critic agent is the system feedback, which it interprets as an evaluation of the last action the controller performed in the previous state. The signal produced by the critic agent is used alongside the backpropagation-of-error algorithm to tune online the conclusion parts of the fuzzy inference rules. The critic agent here has a proportional-derivative structure, and the fuzzy rule base has nine rules. The proposed controller shows satisfactory transient responses, disturbance rejection and robustness to model uncertainty. Its simple design procedure and structure nominate it as a suitable controller design for steam generator water level control in the nuclear power plant industry.
Two distinguishing features of humanlike control vis-a-vis current technological control are the ability to make use of experience while selecting a control policy for distinct situations and the ability to do so faster and faster as more experience is gained (in contrast to current technological implementations, which slow down as more knowledge is stored). The notions of context and context discernment are important to understanding this human ability. Whereas methods known as adaptive control and learning control focus on modifying the design of a controller as changes in context occur, experience-based (EB) control entails selecting a previously designed controller that is appropriate to the current situation. Developing the EB approach entails a shift of the technologist's focus "up a level": away from designing individual (optimal) controllers and toward developing online algorithms that efficiently and effectively select designs from a repository of existing controller solutions. A key component of the notions presented here is that of a higher-level learning algorithm. This is a new application of reinforcement learning and, in particular, approximate dynamic programming, with its focus shifted to the posited higher level, and it is employed with very promising results. The author's hope is that this paper will inspire and guide future work in this promising area.
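The higher-level learning idea — reinforcement learning over *which controller to deploy* rather than over low-level actions — can be sketched as a contextual selection problem. The repository, the context discretization and the performance model below are hypothetical stand-ins, not the paper's formulation:

```python
import random

random.seed(2)

# Hypothetical repository of pre-designed controllers, each best suited
# to one plant "context" (e.g. a payload, wear, or operating condition).
CONTEXTS = range(3)
CONTROLLERS = range(3)

def performance(context, controller):
    # Illustrative score: high when the selected design matches the
    # current context, degraded otherwise, plus measurement noise.
    return (1.0 if controller == context else 0.2) + random.gauss(0, 0.05)

# Higher-level learner: Q-values over (context, controller) pairs.
Q = {(c, k): 0.0 for c in CONTEXTS for k in CONTROLLERS}
for _ in range(2000):
    c = random.choice(list(CONTEXTS))          # context presented by the plant
    k = random.choice(list(CONTROLLERS)) if random.random() < 0.2 else \
        max(CONTROLLERS, key=lambda x: Q[(c, x)])
    Q[(c, k)] += 0.1 * (performance(c, k) - Q[(c, k)])

# After learning, the selector discerns context: it picks the matched design.
assert all(max(CONTROLLERS, key=lambda k: Q[(c, k)]) == c for c in CONTEXTS)
```

Note that selection gets *cheaper* as the table fills in, consistent with the abstract's observation that experience-based control should speed up, not slow down, with accumulated knowledge.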
A nonaffine discrete-time system represented by the nonlinear autoregressive moving average with eXogenous input (NARMAX) representation with unknown nonlinear system dynamics is considered. An equivalent affinelike representation in terms of the tracking error dynamics is first obtained from the original nonaffine nonlinear discrete-time system so that a reinforcement-learning-based near-optimal neural network (NN) controller can be developed. The control scheme consists of two linearly parameterized NNs. One NN is designated as the critic NN, which approximates a predefined long-term cost function, and an action NN is employed to derive a near-optimal control signal for the system to track a desired trajectory while simultaneously minimizing the cost function. The NN weights are tuned online. By using the standard Lyapunov approach, the stability of the closed-loop system is shown. The net result is a supervised actor-critic NN controller scheme which can be applied to a general nonaffine nonlinear discrete-time system without needing the affinelike representation. Simulation results demonstrate satisfactory performance of the controller.
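The two-network structure — a critic estimating long-term cost and an actor producing the control, both linearly parameterized — can be caricatured on a scalar tracking task. This is a rough sketch under assumed dynamics x' = x + u and quadratic stage cost; the weights, learning rates and update rules below are illustrative, not the paper's tuning laws or its Lyapunov-based design:

```python
import random

random.seed(3)

# Critic: V(x) = wc * x^2 approximates the cost-to-go.
# Actor:  u    = wa * x is the linear feedback control.
wc, wa = 0.0, 0.0
gamma, ac, aa = 0.9, 0.05, 0.01

for _ in range(4000):
    x = random.uniform(-1, 1)
    u = wa * x + random.gauss(0, 0.1)      # exploration noise on the action
    x2 = x + u                             # assumed plant: x' = x + u
    cost = x * x + 0.1 * u * u             # quadratic stage cost
    # Temporal-difference error on the cost-to-go estimate.
    delta = cost + gamma * wc * x2 * x2 - wc * x * x
    wc += ac * delta * x * x               # critic: reduce the TD error
    wa -= aa * delta * (u - wa * x) * x    # actor: policy-gradient-style step
```

After training, the learned gain `wa` should be negative, i.e. the actor pushes the state toward the origin; the critic's `wc` grows positive as it accounts for accumulated future cost.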