Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sachiko Soga; Ichiro Kobayashi (Advanced Sciences, Ochanomizu University, Tokyo)

When a robot controller is trained by evolutionary computation, the robot behaves adequately in the environment where it was trained. However, if the robot is placed in an environment more complex than the training environment, it cannot behave adequately and must be retrained to fit the complex environment. Based on this observation, we build a training environment for a robot controller from partial components of the more complex environment, aiming to obtain a controller that lets the robot act in the complex environment after training only in the simpler one. We clarify how to build a training environment that works effectively for training a robot controller, and discuss how much training in that environment is needed for the robot to behave adequately in the more complex environment.

Keywords: Robot sensing systems; Clocks; Biological cells; Silicon

Reinforcement learning in the game of Othello: learning against a fixed opponent and learning from self-play

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Michiel van der Ree; Marco Wiering (Faculty of Mathematics and Natural Sciences, University of Groningen; Institute of Artificial Intelligence and Cognitive Engineering, The Netherlands)

This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while learning from the opponent's moves as well. These issues are considered for the algorithms Q-learning, Sarsa and TD-learning. These three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best strategy of learning differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents, when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results compared to learning only from the learning agent's own moves.
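
As an illustration of how the three algorithms named in the abstract differ, the following minimal Python sketch contrasts their standard textbook bootstrap targets. It is not the authors' implementation: the paper combines the algorithms with multi-layer perceptrons, whereas here the value estimates are plain numbers, and the function names, `gamma`, and `alpha` are assumptions made for this example.

```python
from typing import Sequence


def q_learning_target(reward: float, next_q_values: Sequence[float],
                      gamma: float = 0.99, terminal: bool = False) -> float:
    """Off-policy target: bootstrap from the best available next action."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def sarsa_target(reward: float, next_q_values: Sequence[float], next_action: int,
                 gamma: float = 0.99, terminal: bool = False) -> float:
    """On-policy target: bootstrap from the action actually chosen next."""
    if terminal:
        return reward
    return reward + gamma * next_q_values[next_action]


def td_target(reward: float, next_state_value: float,
              gamma: float = 0.99, terminal: bool = False) -> float:
    """State-value (TD(0)) target: bootstrap from the value of the next state."""
    if terminal:
        return reward
    return reward + gamma * next_state_value


def update(estimate: float, target: float, alpha: float = 0.1) -> float:
    """Move the current estimate a fraction alpha of the way toward the target."""
    return estimate + alpha * (target - estimate)
```

Q-learning and Sarsa estimate action values and differ only in which next action they bootstrap from, while TD-learning estimates state values, so move selection with it requires looking ahead to the states that each move leads to.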

Keywords: Games; Training; Learning (artificial intelligence); Artificial neural networks; Heuristic algorithms; Testing

Optimistic planning for continuous-action deterministic systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lucian Buşoniu; Alexander Daniels; Rémi Munos; Robert Babuška (Department of Automation, Technical University of Cluj-Napoca, Romania; France; DCSC, Delft University of Technology, the Netherlands; Team SequeL, INRIA Lille-Nord Europe, France)

We consider the class of online planning algorithms for optimal control, which compared to dynamic programming are relatively unaffected by large state dimensionality. We introduce a novel planning algorithm called SOOP that works for deterministic systems with continuous states and actions. SOOP is the first method to explore the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. SOOP can be used parameter-free at the cost of more model calls, but we also propose a more practical variant tuned by a parameter α, which balances finer discretization with longer planning horizons. Experiments on three problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.

Keywords: Planning; Upper bound; Optimization; Dynamic programming; Measurement; Heuristic algorithms; Aerospace electronics

Delayed insertion and rule effect moderation of domain knowledge for reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Teck-Hou Teng; Ah-Hwee Tan (School of Computer Engineering; Center for Computational Intelligence, School of Computer Engineering, Nanyang Technological University)

Though not a fundamental prerequisite to efficient machine learning, insertion of domain knowledge into an adaptive virtual agent is nonetheless known to improve learning efficiency and reduce model complexity. Conventionally, domain knowledge is inserted prior to learning. Despite being effective, such an approach may not always be feasible. Firstly, the effect of domain knowledge is assumed and can be inaccurate. Also, domain knowledge may not be available prior to learning. In addition, the insertion of domain knowledge can frame learning and hamper the discovery of more effective knowledge. Therefore, this work advances the use of domain knowledge by proposing to delay the insertion and moderate the effect of domain knowledge to reduce the framing effect while still benefiting from the use of domain knowledge. Using a non-trivial pursuit-evasion problem domain, experiments are first conducted to illustrate the impact of domain knowledge with different degrees of truth. The next set of experiments illustrates how delayed insertion of such domain knowledge can impact learning. The final set of experiments is conducted to illustrate how delaying the insertion and moderating the assumed effect of domain knowledge can ensure the robustness and versatility of reinforcement learning.

Keywords: Vectors; Knowledge engineering; Learning (artificial intelligence); Adaptation models; Educational institutions; Computational modeling; Neural networks

Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Luuk Bom; Ruud Henken; Marco Wiering (Faculty of Mathematics and Natural Sciences, University of Groningen, The Netherlands)

Reinforcement learning algorithms enable an agent to optimize its behavior from interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, it is still an open research question how to scale up to large dynamic environments. In this paper we will study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. In order to let Ms. Pac-Man quickly learn, we designed particular smart feature extraction algorithms that produce higher-order inputs from the game-state. These inputs are then given to a neural network that is trained using Q-learning. We constructed higher-order features which are relative to the action of Ms. Pac-Man. These relative inputs are then given to a single neural network which sequentially propagates the action-relative inputs to obtain the different Q-values of different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network, while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained before.
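
The idea of action-relative inputs described in the abstract can be sketched as follows: one shared value function is evaluated once per action, each time on features computed relative to that action. This is only a toy illustration under stated assumptions. The state layout, the two features, and the fixed linear scorer `linear_q` (standing in for the trained neural network) are all hypothetical and are not the paper's 7 inputs.

```python
from typing import Callable, Dict, List, Tuple

ACTIONS = ("up", "down", "left", "right")

# Hypothetical state: for each direction, (steps to nearest pill, steps to nearest ghost).
State = Dict[str, Tuple[int, int]]


def action_relative_features(state: State, action: str) -> List[float]:
    """Build the input vector as seen when looking in the direction of `action`."""
    pill_dist, ghost_dist = state[action]
    return [1.0 / (1.0 + pill_dist), 1.0 / (1.0 + ghost_dist)]


def linear_q(features: List[float]) -> float:
    """Stand-in for a trained network: rewards nearby pills, penalizes nearby ghosts."""
    weights = [1.0, -2.0]  # assumed values, chosen only for the example
    return sum(w * x for w, x in zip(weights, features))


def q_values(state: State,
             q_function: Callable[[List[float]], float] = linear_q) -> Dict[str, float]:
    """Evaluate the same function once per action on that action's relative inputs."""
    return {a: q_function(action_relative_features(state, a)) for a in ACTIONS}


def greedy_action(state: State,
                  q_function: Callable[[List[float]], float] = linear_q) -> str:
    """Pick the action with the highest Q-value."""
    qs = q_values(state, q_function)
    return max(qs, key=qs.get)
```

Because the inputs describe the situation relative to an action rather than to absolute maze coordinates, the same function can be reused for every direction, which is consistent with the small input layer and the maze transfer reported in the abstract.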

Keywords: Games; Learning (artificial intelligence); Biological neural networks; Neurons; Heuristic algorithms; Training

Analyzing collective behavior in evolutionary swarm robotic systems based on an ethological approach

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Toshiyuki Yasuda; Nanami Wada; Kazuhiro Ohkura; Yoshiyuki Matsumura (Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima, Japan; Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano, Japan)

Swarm robotic systems are multi-robot systems that generally consist of many homogeneous autonomous robots without any global controller. Swarm robotics aims to design desired collective behaviors through many interactions among robots and with their environment. Because a robotic swarm is controlled in an emergent way, for example as a result of self-organization through robot learning or artificial evolution, to the best of our knowledge no practical method is known for grasping its macroscopic collective behavior. In this paper, we propose a novel method for analyzing collective behavior by introducing the concept of a behavioral sequence, which stems from ethology. Analyzing behavioral sequences reveals transitions in the robots' actions from the viewpoint of specialization and helps us understand the role of subgroups in a robotic swarm. Applying this method, we observe collective behavior in a foraging task of autonomous mobile robots.

Keywords: Robot kinematics; Robot sensing systems; Mobile robots; Vectors; Resource management; Dynamic programming

Robust Adaptive Dynamic Programming With an Application to Power Systems

IEEE Transactions on Neural Networks and Learning Systems, 2013, vol. 24, no. 7, pp. 1150-1156

Authors: Jiang, Yu; Jiang, Zhong-Ping (NYU Polytech Inst, Dept Elect & Comp Engn, Brooklyn, NY 11201, USA)

This brief presents a novel framework of robust adaptive dynamic programming (robust-ADP) aimed at computing globally stabilizing and suboptimal control policies in the presence of dynamic uncertainties. A key strategy is to integrate ADP theory with techniques in modern nonlinear control with a unique objective of filling up a gap in the past literature of ADP without taking into account dynamic uncertainties. Neither the system dynamics nor the system order are required to be precisely known. As an illustrative example, the computational algorithm is applied to the controller design of a two-machine power system.

Keywords: Nonlinear uncertain systems; Optimal control; Reinforcement learning

Cooperative off-policy prediction of Markov decision processes in adaptive networks

2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013

Authors: Macua, Sergio Valcarcel; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H. (Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Madrid 28040, Spain; Department of Electrical Engineering, University of California, Los Angeles, CA 90095, United States)

ISBN: (print) 9781479903566

We apply diffusion strategies to propose a cooperative reinforcement learning algorithm, in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable to learn off-policy even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain of cooperation, in the form of more stability and less bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents is different from the target policy under evaluation. © 2013 IEEE.
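
The cooperation pattern mentioned in the abstract can be illustrated with a generic adapt-then-combine diffusion step: every agent first updates its own parameters from local data, then averages the intermediate results of its neighbors. The sketch below shows only that structure and is not the paper's algorithm. The local step is a placeholder that stands in for the temporal-difference update an agent would run on its own data, and all names (`adapt_then_combine`, `make_local_step`, `combination`) are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def adapt_then_combine(params: Sequence[Vector],
                       local_step: Callable[[int, Vector], Vector],
                       combination: Sequence[Sequence[float]]) -> List[Vector]:
    """One diffusion iteration over all agents.

    combination[l][k] is the weight agent k gives to neighbor l's intermediate
    estimate; it is zero for non-neighbors and each column sums to one.
    """
    # Adapt: each agent updates independently from its own data.
    intermediate = [local_step(k, list(w)) for k, w in enumerate(params)]
    n_agents = len(params)
    dim = len(params[0])
    # Combine: each agent takes a weighted average over its neighborhood.
    return [
        [sum(combination[l][k] * intermediate[l][i] for l in range(n_agents))
         for i in range(dim)]
        for k in range(n_agents)
    ]


def make_local_step(targets: Sequence[Vector],
                    step_size: float = 0.5) -> Callable[[int, Vector], Vector]:
    """Placeholder local update: move agent k's parameters toward its own target."""
    def local_step(k: int, w: Vector) -> Vector:
        return [wi + step_size * (ti - wi) for wi, ti in zip(w, targets[k])]
    return local_step
```

With three agents on a line (agent 1 connected to agents 0 and 2) and uniform weights over each neighborhood, repeated calls pull the agents' estimates toward each other while each one still tracks its own data, which is the intuition behind the reduced variance the abstract attributes to cooperation.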

Keywords: Dynamic programming

Optimal tracking control scheme for discrete-time nonlinear systems with approximation errors

10th International Symposium on Neural Networks, ISNN 2013

Authors: Wei, Qinglai; Liu, Derong (State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

ISBN: (print) 9783642390678

In this paper, we aim to solve an infinite-time optimal tracking control problem for a class of discrete-time nonlinear systems using an iterative adaptive dynamic programming (ADP) algorithm. When the iterative tracking control law and the iterative performance index function in each iteration cannot be accurately obtained, a new convergence analysis method is developed to obtain the convergence conditions of the iterative ADP algorithm according to the properties of the finite approximation errors. If the convergence conditions are satisfied, it is shown that the iterative performance index functions converge to a finite neighborhood of the greatest lower bound of all performance index functions under some mild assumptions. Neural networks are used to approximate the performance index function and compute the optimal tracking control policy, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, a simulation example is given to illustrate the performance of the present method. © 2013 Springer-Verlag Berlin Heidelberg.

Keywords: Reinforcement learning

Cooperative off-policy prediction of Markov decision processes in adaptive networks

IEEE International Conference on Acoustics, Speech, and Signal Processing

Authors: Sergio Valcarcel Macua; Jianshu Chen; Santiago Zazo; Ali H. Sayed (Escuela Tecnica Superior de Ingenieros de Telecomunicacion, Universidad Politecnica de Madrid, Madrid 28040, Spain; Department of Electrical Engineering, University of California, Los Angeles, CA 90095, USA)

ISBN: (print) 9781479903573

Keywords: Adaptive networks; Dynamic programming; Diffusion strategies; Gradient temporal difference; Mean-square-error; Reinforcement learning; Mean square error; Learning (artificial intelligence); Dynamic programming; Adaptive networking; Network; Error of prediction; Markov chain; Learning; Agents
