ISBN (print): 9781467359252
Effective cooperation of multi-robots in unknown environments is essential in many robotic applications, such as environment exploration and target searching. In this paper, a combined hierarchical reinforcement learning approach, together with a designed cooperation strategy, is proposed for the real-time cooperation of multi-robots in completely unknown environments. Unlike other algorithms that need an explicit environment model or select parameters by trial and error, the proposed cooperation method obtains all the required parameters automatically through learning. By integrating segmental options with the traditional MAXQ algorithm, the cooperation hierarchy is built. In new tasks, the designed cooperation method can control the multi-robot system to complete the task effectively. The simulation results demonstrate that the proposed scheme is able to effectively and efficiently lead a team of robots to cooperatively accomplish target-searching tasks in completely unknown environments.
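The MAXQ decomposition referenced above splits the value of a parent subtask into the value of the selected child plus a learned completion term. The following is a minimal sketch of that decomposition only; the task hierarchy and table layout are illustrative assumptions, and the paper's combination with segmental options is not detailed in the abstract.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Task:
    name: str
    primitive: bool = False
    children: tuple = ()

# Learned tables: V for primitive-action values, C for completion values.
V = defaultdict(float)   # V[(task, state)]
C = defaultdict(float)   # C[(parent, state, child)]

def value(task, state):
    """MAXQ decomposition: V(i, s) = max_a [ V(a, s) + C(i, s, a) ]."""
    if task.primitive:
        return V[(task, state)]
    return max(value(a, state) + C[(task, state, a)] for a in task.children)

# Illustrative hierarchy (subtask names are not from the paper):
go_to_target   = Task("go_to_target", primitive=True)
avoid_obstacle = Task("avoid_obstacle", primitive=True)
search         = Task("search", children=(go_to_target, avoid_obstacle))
print(value(search, state=(2, 3)))   # 0.0 before any learning
```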
ISBN (print): 9781467359252
We consider the class of online planning algorithms for optimal control, which, compared to dynamic programming, are relatively unaffected by large state dimensionality. We introduce a novel planning algorithm called SOOP that works for deterministic systems with continuous states and actions. SOOP is the first method to explore the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. SOOP can be used parameter-free at the cost of more model calls, but we also propose a more practical variant tuned by a single parameter, which balances finer discretization against longer planning horizons. Experiments on three problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
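For orientation, the sketch below is the plain receding-horizon planner that such methods improve upon: it fixes a uniform action discretization and a fixed horizon, enumerates all sequences, and returns the first action of the best one. It is a uniform baseline over assumed toy dynamics, not SOOP itself, which adaptively refines both the discretization and the explored sequence lengths.

```python
import itertools

def plan_first_action(model, reward, x0, actions, horizon, gamma=0.95):
    """Enumerate all action sequences of a fixed length over a fixed
    discretization and return the first action of the best sequence."""
    best_return, best_first = float("-inf"), None
    for seq in itertools.product(actions, repeat=horizon):
        x, ret = x0, 0.0
        for k, u in enumerate(seq):
            ret += (gamma ** k) * reward(x, u)
            x = model(x, u)               # deterministic step x_{k+1} = f(x_k, u_k)
        if ret > best_return:
            best_return, best_first = ret, seq[0]
    return best_first

# Toy 1-D system (assumed for illustration): push the state toward zero.
model  = lambda x, u: x + 0.1 * u
reward = lambda x, u: -abs(x)
print(plan_first_action(model, reward, x0=1.0, actions=(-1.0, 0.0, 1.0), horizon=4))
```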
ISBN (print): 9781467359252
Though not a fundamental prerequisite to efficient machine learning, insertion of domain knowledge into an adaptive virtual agent is nonetheless known to improve learning efficiency and reduce model complexity. Conventionally, domain knowledge is inserted prior to learning. Despite being effective, such an approach may not always be feasible. Firstly, the effect of domain knowledge is assumed and can be inaccurate. Also, domain knowledge may not be available prior to learning. In addition, the insertion of domain knowledge can frame learning and hamper the discovery of more effective knowledge. Therefore, this work advances the use of domain knowledge by proposing to delay its insertion and moderate its effect, reducing the framing effect while still benefiting from the domain knowledge. Using a non-trivial pursuit-evasion problem domain, experiments are first conducted to illustrate the impact of domain knowledge with different degrees of truth. The next set of experiments illustrates how delayed insertion of such domain knowledge can impact learning. The final set of experiments illustrates how delaying the insertion and moderating the assumed effect of domain knowledge can ensure the robustness and versatility of reinforcement learning.
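One concrete way to realize "delayed and moderated" domain knowledge is potential-based reward shaping that is switched on only after a number of episodes and scaled by a moderation weight. The mechanism, the episode threshold, and the weight below are assumptions for illustration, not the paper's actual design.

```python
def shaped_reward(r_env, phi, s, s_next, episode,
                  start_episode=200, weight=0.5, gamma=0.99):
    """Environment reward plus a delayed, moderated shaping term built from
    domain knowledge.  The delay (start_episode) and the moderation weight
    mirror the two ideas in the abstract; potential-based shaping itself is
    an assumed mechanism, not necessarily the one used in the paper."""
    if episode < start_episode:
        return r_env                      # learn without domain knowledge first
    return r_env + weight * (gamma * phi(s_next) - phi(s))

# Illustrative potential for a pursuer: prefer being close to the evader.
phi = lambda s: -abs(s["pursuer"] - s["evader"])
print(shaped_reward(0.0, phi, {"pursuer": 5, "evader": 1},
                    {"pursuer": 4, "evader": 1}, episode=300))
```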
ISBN (print): 9781467359252
Traditional reinforcement learning algorithms, such as Q-learning, Q(lambda), Sarsa, and Sarsa(lambda), update the action value function using a temporal difference (TD) error, which is computed from the last action value function. From the perspective of the TD error, and to address the low efficiency and slow convergence of the traditional Sarsa(lambda) algorithm, this paper defines the nth-order TD error, applies it to the traditional Sarsa(lambda) algorithm, and develops a fast Sarsa(lambda) algorithm based on the 2nd-order TD error. The algorithm adjusts the Q value with the second-order TD error and broadcasts the TD error into the whole state-action space, which speeds up convergence. This paper also analyzes the convergence rate; under the condition of one-step updates, the results show that the number of iterations depends primarily on gamma and epsilon. Finally, applying the proposed algorithm to traditional reinforcement learning problems, the results show that the algorithm has both a faster convergence rate and better convergence performance.
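The sketch below performs one Sarsa(lambda) backup with eligibility traces, combining the ordinary first-order TD error with the previous step's error as a stand-in for the second-order TD error. The paper's exact definition of the nth-order error is not given in the abstract, so that combination (and the weight beta) is an assumption.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s2, a2, prev_delta,
                      alpha=0.1, gamma=0.95, lam=0.9, beta=0.5):
    """One Sarsa(lambda) backup.  `delta` is the usual first-order TD error;
    adding beta * prev_delta is only a placeholder for the paper's
    second-order TD error."""
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # first-order TD error
    delta2 = delta + beta * prev_delta        # assumed second-order form
    E[s, a] += 1.0                            # accumulating eligibility trace
    Q += alpha * delta2 * E                   # broadcast over all state-actions
    E *= gamma * lam
    return delta

Q, E = np.zeros((5, 2)), np.zeros((5, 2))
d = sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s2=1, a2=0, prev_delta=0.0)
```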
ISBN (print): 9781467359252
Reinforcement learning algorithms enable an agent to optimize its behavior by interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, it is still an open research question how to scale up to large dynamic environments. In this paper we study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. In order to let Ms. Pac-Man learn quickly, we designed smart feature extraction algorithms that produce higher-order inputs from the game state. These inputs are then given to a neural network that is trained using Q-learning. We constructed higher-order features that are relative to the action of Ms. Pac-Man. These relative inputs are then given to a single neural network, which propagates the action-relative inputs for each action in turn to obtain the Q-values of the different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained before.
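Action-relative features mean the same small network is evaluated once per candidate move, each time with inputs recomputed relative to that move, and the move with the highest output is chosen. The sketch below illustrates only that idea; the feature names and the linear "network" are placeholders, not the seven inputs actually used in the paper.

```python
import numpy as np

ACTIONS = ("up", "down", "left", "right")

def action_relative_features(measurements, action):
    """Build a small input vector relative to a candidate action.  These
    three keys are illustrative stand-ins for the paper's seven inputs."""
    m = measurements[action]
    return np.array([m["nearest_ghost"], m["nearest_pill"], m["is_blocked"]])

def greedy_action(measurements, q_net):
    """Evaluate one shared network once per action and pick the argmax."""
    q = [q_net(action_relative_features(measurements, a)) for a in ACTIONS]
    return ACTIONS[int(np.argmax(q))]

# Toy usage with a linear stand-in for the trained network:
w = np.array([-1.0, -0.2, -5.0])
q_net = lambda x: float(w @ x)
measurements = {a: {"nearest_ghost": 5.0, "nearest_pill": 2.0, "is_blocked": 0.0}
                for a in ACTIONS}
measurements["left"]["nearest_pill"] = 1.0
print(greedy_action(measurements, q_net))   # -> "left"
```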
This paper presents a novel approach for constructing basis functions in approximate dynamic programming (ADP) through the locally linear embedding (LLE) process. It considers the experience (sample) data as a high-di...
In this paper, we present a new adaptive dynamic programming approach that integrates a reference network providing an internal goal representation to help the system's learning and optimization. Specifically, we build the reference network on top of the critic network to form a dual critic network design that contains the detailed internal goal representation to help approximate the value function. This internal goal signal, working as the reinforcement signal for the critic network in our design, is adaptively generated by the reference network and can also be adjusted automatically. In this way, we provide an alternative to crafting the reinforcement signal manually from prior knowledge. In this paper, we adopt the online action-dependent heuristic dynamic programming (ADHDP) design and provide the detailed design of the dual critic network structure. A detailed Lyapunov stability analysis of our proposed approach is presented to support the proposed structure from a theoretical point of view. Furthermore, we also develop a virtual reality platform to demonstrate the real-time simulation of our approach under different disturbance situations. The overall adaptive learning performance has been tested on two tracking control benchmarks with a tracking filter. For comparative studies, we also present the tracking performance with the typical ADHDP, and the simulation results justify the improved performance of our approach.
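Structurally, the design involves three function approximators: an action (actor) network, a reference network that produces the internal goal signal, and a critic that consumes state, action, and that goal signal. The forward pass below is a minimal sketch with arbitrary network sizes; the input/output wiring is a reading of the abstract, not the paper's exact architecture, and training updates are omitted.

```python
import numpy as np

def mlp(sizes, rng):
    """Tiny random tanh MLP used as a stand-in for each network."""
    Ws = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[1:], sizes[:-1])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.tanh(W @ x)
        return Ws[-1] @ x
    return forward

rng = np.random.default_rng(0)
n_state, n_action = 4, 1
actor     = mlp((n_state, 8, n_action), rng)
reference = mlp((n_state + n_action, 8, 1), rng)        # internal goal signal s
critic    = mlp((n_state + n_action + 1, 8, 1), rng)    # value estimate J(x, u, s)

x = rng.normal(size=n_state)
u = actor(x)
s_internal = reference(np.concatenate([x, u]))   # adaptively generated reinforcement
J = critic(np.concatenate([x, u, s_internal]))   # critic consumes the goal signal
```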
Goal representation heuristic dynamic programming (GrHDP) is proposed in this paper to demonstrate online learning in the Markov decision process. In addition to the (external) reinforcement signal in the literature, we develop an adaptive internal goal/reward representation for the agent with the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and include a goal network to represent the internal goal signal, to further help the value function approximation. We evaluate our proposed GrHDP algorithm on two 2-D maze navigation problems, and later on one 3-D maze navigation problem. Compared to the traditional HDP approach, the learning performance of the agent is improved with our proposed GrHDP approach. In addition, we also include the learning performance of two other reinforcement learning algorithms, namely Sarsa(lambda) and Q-learning, on the same benchmarks for comparison. Furthermore, in order to demonstrate the theoretical guarantee of our proposed method, we provide an analysis of the convergence of the neural network weights in our GrHDP approach.
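To make the role of the goal network concrete, the sketch below constructs the per-step training targets: the goal network is pushed toward a discounted sum of the external reward, and the critic is pushed toward a discounted sum of the internal goal signal. The target forms follow the usual HDP pattern and should be read as an interpretation of the abstract rather than the paper's exact equations; network parameter updates are omitted.

```python
def grhdp_step_targets(goal_net, critic, x, u, x_next, u_next, r_ext, gamma=0.95):
    """Training targets for one GrHDP time step (gradient updates omitted)."""
    s = goal_net(x, u)                       # internal goal/reward signal
    s_next = goal_net(x_next, u_next)
    J_next = critic(x_next, u_next, s_next)
    goal_target = r_ext + gamma * s_next     # goal network tracks external reward
    critic_target = s + gamma * J_next       # critic tracks the internal signal
    return goal_target, critic_target

# Toy usage with scalar states/actions and linear stand-ins for the networks:
goal_net = lambda x, u: -abs(x) - 0.1 * abs(u)
critic = lambda x, u, s: s - 0.5 * abs(x)
print(grhdp_step_targets(goal_net, critic, 1.0, 0.2, 0.8, 0.1, r_ext=-1.0))
```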
In this paper a new method for designing and implementing a coordinated wide-area controller architecture is presented and tested using real-time digital simulation on a benchmark two-area power system model for improved power system dynamic stability. The algorithm is an optimal Wide Area System-Centric Controller and Observer (WASCCO) based on reinforcement and temporal difference learning, which allows the system to learn from interaction and predict future states. The controller design uses a powerful technique of the adaptive critic design (ACD) family called dual heuristic programming (DHP). The DHP controller's training and testing are implemented on an Innovative Integration Picolo card featuring the TMS320C28335 processor. The main advantage of this design is its ability to learn from the past using eligibility traces and to predict the optimal trajectory through temporal difference learning in a receding horizon control (RHC) format. Results on the two-area system show a better response compared to conventional schemes.
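In DHP the critic does not estimate the cost-to-go itself but its gradient with respect to the state. The sketch below writes the standard DHP critic target under assumed simplifications (a linear model, linear state feedback, quadratic utility); it illustrates the general ACD/DHP technique, not the WASCCO controller's specific equations.

```python
import numpy as np

def dhp_critic_target(x, lam_next, A, B, K, Q, R, gamma=0.95):
    """Standard DHP critic target for x' = Ax + Bu, policy u = -Kx, and
    utility U = x'Qx + u'Ru.  The critic approximates lambda(x) = dJ/dx."""
    u = -K @ x
    dU_dx = 2.0 * Q @ x
    dU_du = 2.0 * R @ u
    A_cl = A - B @ K                       # closed-loop Jacobian dx'/dx
    return dU_dx + (-K).T @ dU_du + gamma * A_cl.T @ lam_next

# Toy two-state example (values are illustrative only):
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 0.5]])
Qc, Rc = np.eye(2), np.eye(1)
print(dhp_critic_target(np.array([1.0, 0.0]), np.zeros(2), A, B, K, Qc, Rc))
```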