Refine Results

Document Type

  • 140 conference papers
  • 7 journal articles

Collection

  • 147 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 71 Engineering
    • 66 Computer Science and Technology...
    • 15 Software Engineering
    • 11 Electrical Engineering
    • 9 Control Science and Engineering
    • 2 Instrument Science and Technology
    • 2 Information and Communication Engineering
    • 1 Mechanics (degrees in engineering, sci...
    • 1 Mechanical Engineering
    • 1 Architecture
  • 11 Science
    • 10 Mathematics
    • 2 Systems Science
    • 2 Statistics (degrees in science,...
  • 5 Management
    • 4 Management Science and Engineering (de...
    • 3 Business Administration
    • 1 Library, Information and Archival Mana...
  • 3 Economics
    • 3 Applied Economics

Topics

  • 76 dynamic programm...
  • 39 learning
  • 26 optimal control
  • 25 reinforcement le...
  • 15 function approxi...
  • 15 control systems
  • 14 approximation al...
  • 14 equations
  • 13 neural networks
  • 13 stochastic proce...
  • 12 convergence
  • 10 state-space meth...
  • 10 cost function
  • 9 mathematical mod...
  • 8 trajectory
  • 8 approximation me...
  • 7 approximate dyna...
  • 7 algorithm design...
  • 7 adaptive control
  • 7 heuristic algori...

Institutions

  • 4 school of inform...
  • 4 department of in...
  • 3 department of el...
  • 3 northeastern uni...
  • 3 univ texas autom...
  • 3 arizona state un...
  • 3 robotics institu...
  • 3 univ illinois de...
  • 2 princeton univ d...
  • 2 national science...
  • 2 college of mecha...
  • 2 key laboratory o...
  • 2 univ utrecht dep...
  • 2 department of op...
  • 1 inria
  • 1 computational le...
  • 1 school of automa...
  • 1 univ cincinnati ...
  • 1 toyota technol c...
  • 1 neuroinformatics...

Authors

  • 5 liu derong
  • 4 xu xin
  • 4 martin riedmille...
  • 4 huaguang zhang
  • 4 marco a. wiering
  • 4 zhang huaguang
  • 4 si jennie
  • 4 derong liu
  • 3 hado van hasselt
  • 3 lewis frank l.
  • 3 dongbin zhao
  • 3 powell warren b.
  • 3 warren b. powell
  • 3 riedmiller marti...
  • 2 manuel loth
  • 2 van hasselt hado
  • 2 preux philippe
  • 2 hu dewen
  • 2 jennie si
  • 2 philippe preux

Language

  • 142 English
  • 5 Other

Search query: "Any field = 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007"
147 records; results 111-120 shown below

Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Author: Paul J. Werbos National Science Foundation Arlington VA USA
Since the 1960s, the author has proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (...

Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Marco A. Wiering Hado van Hasselt Department of Information and Computing Sciences University of Utrecht Utrecht Netherlands
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The ...

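The QV update rule is easy to state in its simplest form. Below is a minimal tabular sketch of the QV-learning idea the abstract refers to, using TD(0) rather than full TD(λ) traces for brevity; the table sizes, learning rates, and discount factor are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Minimal tabular QV-learning sketch (TD(0) for brevity; the paper uses
# TD(lambda) traces). The state value function V is learned by plain TD,
# and Q bootstraps on V(s') instead of max_a Q(s', a) as in Q-learning.

n_states, n_actions = 10, 2           # illustrative sizes
alpha, beta, gamma = 0.1, 0.2, 0.95   # assumed step sizes and discount

Q = np.zeros((n_states, n_actions))
V = np.zeros(n_states)

def qv_update(s, a, r, s_next, done):
    """One QV-learning step for the transition (s, a, r, s')."""
    target = r if done else r + gamma * V[s_next]
    Q[s, a] += alpha * (target - Q[s, a])  # Q bootstraps on V
    V[s] += beta * (target - V[s])         # ordinary TD(0) update for V
```

ACLA, the paper's second algorithm, instead drives an actor from the sign of the critic's TD error; only the shared value-learning core is sketched here.
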
Model-Based Reinforcement Learning in Factored-State MDPs
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Author: Alexander L. Strehl Department of Computer Science Rutgers University Piscataway NJ USA
We consider the problem of learning in a factored-state Markov decision process that is structured to allow a compact representation. We show that the well-known algorithm, factored Rmax, performs near-optimally on al...

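For context, here is a sketch of the flat Rmax principle that the factored variant builds on: state-action pairs visited fewer than m times are treated optimistically, as if they led to a fictitious maximally rewarding state, which is what drives systematic exploration. Everything below (sizes, threshold, reward bound) is illustrative, and the factored representation itself is omitted:

```python
import numpy as np

# Flat Rmax sketch: optimism in the face of uncertainty. Under-visited
# (s, a) pairs are valued as if they reached an absorbing max-reward state.

n_states, n_actions = 5, 2
m, gamma, r_max = 5, 0.95, 1.0                     # visit threshold, discount, reward bound
counts = np.zeros((n_states, n_actions))           # visit counts
r_sum = np.zeros((n_states, n_actions))            # summed rewards
trans = np.zeros((n_states, n_actions, n_states))  # transition counts

def record(s, a, r, s_next):
    counts[s, a] += 1
    r_sum[s, a] += r
    trans[s, a, s_next] += 1

def optimistic_q(n_sweeps=200):
    """Value iteration on the empirical model, optimistic where data is thin."""
    q = np.zeros((n_states, n_actions))
    for _ in range(n_sweeps):
        v = q.max(axis=1)
        for s in range(n_states):
            for a in range(n_actions):
                if counts[s, a] < m:                 # unknown: optimistic value
                    q[s, a] = r_max / (1.0 - gamma)
                else:                                # known: empirical estimates
                    p = trans[s, a] / counts[s, a]
                    q[s, a] = r_sum[s, a] / counts[s, a] + gamma * p @ v
    return q
```

Factored Rmax keeps the same optimism but learns the transition model factor by factor over a dynamic Bayesian network, which is what makes the representation compact.
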
Efficient Learning in Cellular Simultaneous Recurrent Neural Networks - The Case of Maze Navigation Problem
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Roman Ilin Robert Kozma Paul J. Werbos Department of Mathematical Sciences University of Memphis Memphis TN USA National Science Foundation Arlington VA USA
Cellular simultaneous recurrent neural networks (SRN) show great promise in solving complex function approximation problems. In particular, approximate dynamic programming is an important application area where SRNs h...

A Theoretical Analysis of Cooperative Behavior in Multi-agent Q-learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Ludo Waltman Uzay Kaymak Erasmus University Rotterdam Rotterdam Netherlands
A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This paper provides a theore...

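The setting can be made concrete with a toy experiment: two independent, stateless Q-learners repeatedly playing a prisoner's-dilemma-style matrix game. The payoff numbers, exploration rate, and step size below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prisoner's dilemma payoffs for the row player: rows = my action
# (0 = cooperate, 1 = defect), columns = opponent's action.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

alpha, eps = 0.1, 0.1
q = [np.zeros(2), np.zeros(2)]  # one stateless Q-vector per agent

def choose(qi):
    return int(rng.integers(2)) if rng.random() < eps else int(qi.argmax())

for t in range(50_000):
    a0, a1 = choose(q[0]), choose(q[1])
    r0, r1 = PAYOFF[a0, a1], PAYOFF[a1, a0]
    q[0][a0] += alpha * (r0 - q[0][a0])  # stateless Q-learning updates
    q[1][a1] += alpha * (r1 - q[1][a1])

print("agent 0:", q[0], "agent 1:", q[1])
```

With these payoffs defection dominates, so independent learners typically settle on mutual defection; whether and when cooperation can nonetheless emerge is exactly the kind of question such a theoretical analysis addresses.
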
Using Reward-weighted Regression for Reinforcement Learning of Task Space Control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Jan Peters Stefan Schaal University of Southern California Los Angeles CA USA
Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or rein...

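The computational core of reward-weighted regression is a weighted least-squares fit of the policy parameters, with each sampled action weighted by (a transformation of) the reward it earned. A minimal numpy sketch under simplifying assumptions (linear policy features, weights given directly; in the paper the weighting is derived from an EM-style bound on expected reward):

```python
import numpy as np

rng = np.random.default_rng(0)

# Reward-weighted regression sketch: fit linear policy parameters theta by
# weighted least squares, weighting each (features, executed action) pair
# by its nonnegative reward weight. All shapes and data here are synthetic.

n, d = 500, 4
Phi = rng.normal(size=(n, d))  # state features for n sampled steps
u = rng.normal(size=n)         # executed (exploratory) actions
w = rng.random(size=n)         # reward-derived weights, assumed given

# theta = argmin_theta  sum_i w_i * (u_i - phi_i @ theta)^2
W = np.diag(w)
theta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ u)
```

High-reward actions thus pull the fitted policy toward themselves, turning the control problem into a sequence of supervised regressions.
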
Dynamic optimization of the strength ratio during a terrestrial conflict
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Alexandre Sztykgold Gilles Coppin Olivier Hudry GET/ENST-Bretagne LUSSI Department France GET/ENST Computer Science Department France
The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. For that, we have decided to model the conflict as a game, on which we will see...

Fitted Q Iteration with CMACs
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Stephan Timmer Martin Riedmiller Department of Computer Science University of Osnabrück Osnabrück Germany
A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in the case of continuous, high dimensional state spaces, since ...

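Here is a compact sketch of the fitted Q iteration loop, with a tiny one-dimensional tile coding standing in for the CMAC (tile coding is the CMAC family of approximators); the batch format, sizes, and the least-squares refit are simplifying assumptions:

```python
import numpy as np

# Fitted Q iteration with a 1-D tile-coding (CMAC-style) linear
# approximator. batch = list of (s, a, r, s_next) with s in [0, 1).

n_tilings, n_tiles, n_actions = 4, 8, 2
gamma, n_iters = 0.95, 50
n_feat = n_tilings * n_tiles
w = np.zeros((n_actions, n_feat))  # one weight vector per action

def features(s):
    """Binary feature vector: one active tile per (offset) tiling."""
    phi = np.zeros(n_feat)
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)       # staggered tilings
        tile = int((s + offset) * n_tiles) % n_tiles
        phi[t * n_tiles + tile] = 1.0
    return phi

def q_value(s, a):
    return float(features(s) @ w[a])

def fitted_q_iteration(batch):
    """Repeated Bellman backup plus supervised refit over the fixed batch.
    Assumes every action occurs at least once in the batch."""
    for _ in range(n_iters):
        targets = [r + gamma * max(q_value(sn, b) for b in range(n_actions))
                   for (_, _, r, sn) in batch]
        w_new = np.zeros_like(w)
        for a in range(n_actions):
            idx = [i for i, (_, ai, _, _) in enumerate(batch) if ai == a]
            X = np.stack([features(batch[i][0]) for i in idx])
            y = np.array([targets[i] for i in idx])
            w_new[a] = np.linalg.lstsq(X, y, rcond=None)[0]
        w[:] = w_new
```

Because the batch is fixed, each sweep is an ordinary regression problem, which is what lets fitted methods reuse exploration data far more efficiently than incremental updates.
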
Opposition-Based Reinforcement Learning in the Management of Water Resources
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: M. Mahootchi H. R. Tizhoosh K. Ponnambalam Systems Design Engineering University of Waterloo Waterloo ON Canada
Opposition-based learning (OBL) is a new scheme in machine intelligence. In this paper, an OBL version of Q-learning, which exploits opposite quantities to accelerate learning, is used for management of single reservoi...

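The opposition-based idea can be sketched as an ordinary Q-learning update plus a second update for the opposite action, so each interaction informs two table entries. What counts as the "opposite" action and its reward is problem-specific; the definitions below (a discrete release-amount action set with opposite(a) = n_actions - 1 - a) are illustrative assumptions:

```python
import numpy as np

# Opposition-based Q-learning sketch: after each real transition, also
# update the Q-value of the "opposite" action using an estimate of its
# reward and next state supplied by domain knowledge or a simple model.

n_states, n_actions = 20, 5
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_actions))

def opposite(a):
    # Illustrative: for release amounts 0..4, the opposite of releasing
    # a units is releasing (n_actions - 1 - a) units.
    return n_actions - 1 - a

def step_update(s, a, r, s_next, r_opp, s_next_opp):
    """One real update plus one opposite-action update.

    r_opp and s_next_opp are the estimated outcome of the opposite
    action -- an extra input that plain Q-learning does not need."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    ao = opposite(a)
    Q[s, ao] += alpha * (r_opp + gamma * Q[s_next_opp].max() - Q[s, ao])
```

When the opposite outcome can be computed cheaply, as from a mass-balance model of a reservoir, the second update comes almost for free, which is where the claimed acceleration comes from.
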
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Andras Antos Csaba Szepesvari Remi Munos Computer and Automation Research Inst. Hungarian Academy of Sciences Budapest Hungary University of Alberta Edmonton Canada SequeL team INRIA Futurs University of Lille (USTL) Villeneuve d'Ascq France
We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian decision problems when the training data is composed of the trajectory of some fixed behaviour policy. ...

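To make the data regime concrete: the single behaviour-policy trajectory is cut into one-step transitions, which then form the regression sample for each fitted policy-evaluation step. The trajectory format and the generic-regressor interface below are simplifying assumptions, not the paper's construction:

```python
import numpy as np

# Slice one behaviour-policy trajectory s0, a0, r0, s1, a1, r1, ... into
# the (s, a, r, s') batch that fitted policy iteration trains on.

def trajectory_to_batch(states, actions, rewards):
    """states has length T + 1; actions and rewards have length T."""
    return [(states[t], actions[t], rewards[t], states[t + 1])
            for t in range(len(actions))]

def fitted_evaluation_step(batch, q_predict, pi, gamma, fit):
    """One fitted policy-evaluation backup for policy pi.

    q_predict(s, a) is the current Q-estimate; fit(X, y) is any
    regressor returning the next estimate (both assumed interfaces)."""
    X = np.array([[s, a] for (s, a, _, _) in batch])
    y = np.array([r + gamma * q_predict(sn, pi(sn)) for (_, _, r, sn) in batch])
    return fit(X, y)
```

Policy iteration then alternates this evaluation step with greedy improvement against the fitted Q; the paper's analysis concerns the finite-sample behaviour of such a scheme when all data comes from one fixed trajectory.
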