In traditional adaptive dynamic programming (ADP), only a one-step estimate is used in the training process, so learning efficiency is low. Including multi-step estimates speeds up the learning process. Eligibility traces record the past and current gradients of the estimate, and can be combined with ADP to accelerate learning. In this paper, heuristic dynamic programming (HDP), a typical ADP structure, is considered. An algorithm, HDP(lambda), integrating HDP with eligibility traces is presented. The algorithm is illustrated from both the forward view and the backward view for clear comprehension, and the equivalence of the two views is analyzed. Furthermore, the differences between HDP and HDP(lambda) are examined through both theoretical analysis and simulation. The problem of balancing a pendulum robot (pendubot) is adopted as a benchmark. The results indicate that, compared to HDP, HDP(lambda) achieves a higher convergence rate and greater training efficiency.
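The eligibility-trace mechanism the abstract refers to can be sketched in tabular TD(lambda) form; the state space, parameter values, and function name below are illustrative assumptions, not the paper's HDP(lambda) implementation:

```python
import numpy as np

def td_lambda_episode(transitions, n_states, alpha=0.1, gamma=0.95, lam=0.8):
    """Illustrative tabular TD(lambda) value update with accumulating traces.
    transitions: list of (state, reward, next_state) tuples from one episode."""
    V = np.zeros(n_states)
    e = np.zeros(n_states)                      # eligibility trace per state
    for s, r, s_next in transitions:
        delta = r + gamma * V[s_next] - V[s]    # one-step TD error
        e[s] += 1.0                             # accumulate trace for visited state
        V += alpha * delta * e                  # credit all recently visited states
        e *= gamma * lam                        # traces decay toward zero
    return V

# A tiny 3-state chain 0 -> 1 -> 2, with reward 1 on reaching state 2.
episode = [(0, 0.0, 1), (1, 1.0, 2)]
V = td_lambda_episode(episode, n_states=3)
```

Because the trace for state 0 is still nonzero when the reward arrives, the single TD error updates both visited states at once, which is the multi-step credit assignment the abstract contrasts with one-step ADP.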
Today the massive amount of information available on the WWW often makes searching for information of interest a long and tedious task. Chasing hyperlinks to find relevant information can be daunting. To overcome this problem, a learning system, cognizant of a user's interests, can be employed to automatically search for and retrieve relevant information by following appropriate hyperlinks. In this paper, we describe the design of such a learning system for automated Web navigation using adaptive dynamic programming methods. To improve the performance of the learning system, we introduce the notion of multiple model-based learning agents operating in parallel, and describe methods for combining their models. Experimental results on the WWW navigation problem indicate that combining multiple learning agents, relying on user feedback, is a promising direction for improving learning speed in automated WWW navigation.
In a companion paper (Godfrey and Powell 2002) we introduced an adaptive dynamic programming algorithm for stochastic dynamic resource allocation problems, which arise in the context of logistics and distribution, fleet management, and other allocation problems. The method depends on estimating separable nonlinear approximations of value functions within a dynamic programming framework. That paper considered only the case in which the time to complete an action was always a single time period. Experiments with this technique quickly showed that when the basic algorithm was applied to problems with multiperiod travel times, the results were very poor. In this paper, we explain why this behavior arose and propose a modified algorithm that addresses the issue. Experimental work demonstrates that the modified algorithm works on problems with multiperiod travel times, with results that are almost as good as the original algorithm applied to single-period travel times.
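The separable value-function approximation described above can be sketched as one piecewise-linear concave function per resource type, stored as a vector of nonincreasing marginal values; the update and concavity-restoration rules below are simplified assumptions for illustration, not the authors' exact method:

```python
import numpy as np

def value_of(slopes, r):
    """Approximate value of holding r units of one resource type:
    the sum of the first r (nonincreasing) marginal values."""
    return float(np.sum(slopes[:r]))

def update_slope(slopes, r, observed_marginal, step=0.5):
    """Smooth the r-th marginal value toward a newly observed one,
    then crudely restore concavity by clipping each slope to be
    no larger than its predecessor."""
    slopes = slopes.astype(float).copy()
    slopes[r] = (1 - step) * slopes[r] + step * observed_marginal
    for i in range(1, len(slopes)):
        slopes[i] = min(slopes[i], slopes[i - 1])  # keep slopes nonincreasing
    return slopes

# Marginal values of the 1st, 2nd, and 3rd unit of one resource type.
slopes = np.array([3.0, 2.0, 1.0])
slopes = update_slope(slopes, r=1, observed_marginal=0.0)
```

Keeping the approximation separable (one such vector per resource type) is what makes the value function cheap to update after every sample, at the cost of ignoring interactions between resource types.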
ISBN:
(Print) 3540673547
Dynamic programming offers an exact, general solution method for completely known sequential decision problems, formulated as Markov Decision Processes (MDPs), with a finite number of states. Recently, there has been a great amount of interest in the adaptive version of the problem, where the task to be solved is not completely known a priori. In such a case, an agent has to acquire the necessary knowledge through learning, while simultaneously solving the optimal control or decision problem. A large variety of algorithms, variously known as adaptive dynamic programming (ADP) or reinforcement learning (RL), has been proposed in the literature. However, almost invariably such algorithms suffer from slow convergence in terms of the number of experiments needed. In this paper we investigate how the learning speed can be considerably improved by exploiting and combining knowledge accumulated by multiple agents. These agents operate in the same task environment but follow possibly different trajectories. We discuss methods of combining the knowledge structures associated with the multiple agents and different strategies (with varying overheads) for knowledge communication between agents. Results of simulation experiments are also presented to indicate that combining multiple learning agents is a promising direction for improving learning speed. The method also performs significantly better than some of the fastest MDP learning algorithms, such as prioritized sweeping.
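One simple way to combine the knowledge structures of multiple model-based agents, as discussed above, is to pool their experience counts into a shared estimate of the MDP's transition and reward model; the pooling rule below (summing visit counts) is an assumed scheme for illustration, not necessarily the one used in the paper:

```python
import numpy as np

def merge_models(count_list, reward_sum_list):
    """Pool experience from several agents into one shared model.
    count_list[i][s, a, s'] : agent i's observed transition counts
    reward_sum_list[i][s, a]: agent i's summed rewards for (s, a)
    Returns maximum-likelihood transition probabilities P and mean rewards R."""
    counts = sum(count_list)
    reward_sums = sum(reward_sum_list)
    sa_visits = counts.sum(axis=2)            # total visits to each (s, a) pair
    safe = np.maximum(sa_visits, 1)           # avoid division by zero
    P = counts / safe[:, :, None]
    R = reward_sums / safe
    return P, R

# Two agents exploring a toy 2-state, 1-action MDP along different trajectories.
c1 = np.zeros((2, 1, 2)); c1[0, 0, 1] = 2.0   # agent 1: saw 0 -> 1 twice
c2 = np.zeros((2, 1, 2)); c2[0, 0, 1] = 2.0   # agent 2: saw 0 -> 1 twice
r1 = np.zeros((2, 1)); r1[0, 0] = 2.0         # agent 1: reward 1.0 per visit
r2 = np.zeros((2, 1)); r2[0, 0] = 4.0         # agent 2: reward 2.0 per visit
P, R = merge_models([c1, c2], [r1, r2])
```

Because counts are sufficient statistics for the maximum-likelihood model, summing them gives the same estimate a single agent would have obtained from all four transitions, which is the intuition behind pooling experience to cut the number of experiments needed.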