ISBN: (Print) 9781424407064
This paper presents a framework for obtaining an optimal policy in model-free Partially Observable Markov Decision Problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken, utilizing a novel RNN architecture with computation and storage requirements that are dramatically reduced when compared to existing schemes. A scalable online training algorithm, derived from the real-time recurrent learning (RTRL) algorithm, is employed. Moreover, stochastic meta-descent (SMD), an adaptive step size scheme for stochastic gradient-descent problems, is utilized as a means of incorporating curvature information to accelerate the learning process. We consider case studies of POMDPs where state information is not directly available to the agent. In particular, we investigate scenarios in which the agent receives identical observations for multiple states, thereby relying on temporal dependencies captured by the RNN to obtain the optimal policy. Simulation results illustrate the effectiveness of the approach along with substantial improvement in convergence rate when compared to existing schemes. Index Terms: Recurrent neural networks, real-time recurrent learning (RTRL), constraint optimization.
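As a rough illustration of the step-size adaptation the abstract refers to, the sketch below applies stochastic meta-descent to a toy quadratic loss; the loss, the exact Hessian-vector product, and all constants are assumptions made for illustration, not the paper's RTRL-trained RNN Q-function.

```python
import numpy as np

# Hedged sketch of stochastic meta-descent (SMD): per-parameter step sizes are
# adapted multiplicatively while an auxiliary vector v tracks the sensitivity of
# the parameters to those step sizes. Toy quadratic loss 0.5 * theta' A theta;
# all constants are illustrative assumptions.
rng = np.random.default_rng(0)
A = np.diag([1.0, 4.0])           # toy curvature matrix
theta = np.array([1.0, 1.0])      # parameters being learned
eta = np.full(2, 0.05)            # per-parameter step sizes
v = np.zeros(2)                   # sensitivity of theta to the step sizes
mu, lam = 0.01, 0.99              # meta-learning rate and decay factor

for t in range(500):
    g = A @ theta + rng.normal(scale=0.05, size=2)  # noisy gradient sample
    eta *= np.maximum(0.5, 1.0 + mu * v * g)        # multiplicative step-size adaptation
    Hv = A @ v                                      # Hessian-vector product (exact for the quadratic)
    v = lam * v + eta * (g - lam * Hv)
    theta -= eta * g

print(theta)  # close to the minimiser [0, 0], up to gradient noise
```

In the paper's setting the gradient and curvature information would instead come from RTRL sensitivities of the recurrent Q-network.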
ISBN: (Print) 9781424407064
We consider a multi-agent system where the overall performance is affected by the joint actions or policies of agents. However, each agent only observes a partial view of the global state condition. This model is known as a Decentralized Partially-Observable Markov Decision Process (DEC-POMDP), which is well suited to real-world applications such as communication networks. It is known that solving a DEC-POMDP exactly is NEXP-complete and that memory requirements grow exponentially even for finite-horizon problems. In this paper, we propose to address these issues by using an online model-free technique and by exploiting the locality of interaction among agents in order to approximate the joint optimal policy. Simulation results show the effectiveness and convergence of the proposed algorithm in the context of resource allocation for multi-agent wireless multihop networks.
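The abstract does not spell out the algorithm, so the following is only a generic sketch of how locally scoped, independent Q-learners can approximate a joint policy online; the toy channel-selection reward, the LocalAgent class, and all hyperparameters are assumptions for illustration.

```python
import random
from collections import defaultdict

# Hedged sketch: each agent learns from its own local observation only, a generic
# way of exploiting locality of interaction in a decentralized setting. The toy
# anti-coordination (channel selection) reward and hyperparameters are assumptions.
ALPHA, GAMMA, EPS, ACTIONS = 0.1, 0.9, 0.1, (0, 1)

class LocalAgent:
    def __init__(self):
        self.Q = defaultdict(float)              # keyed by (local observation, action)

    def act(self, obs):
        if random.random() < EPS:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(obs, a)])

    def update(self, obs, a, r, next_obs):
        best_next = max(self.Q[(next_obs, b)] for b in ACTIONS)
        self.Q[(obs, a)] += ALPHA * (r + GAMMA * best_next - self.Q[(obs, a)])

agents, obs, recent = [LocalAgent(), LocalAgent()], [0, 0], []
for step in range(5000):
    acts = [ag.act(o) for ag, o in zip(agents, obs)]
    r = 1.0 if acts[0] != acts[1] else 0.0       # neighbours rewarded for distinct channels
    next_obs = list(acts)                        # each agent sees only its own last action
    for ag, o, a, o2 in zip(agents, obs, acts, next_obs):
        ag.update(o, a, r, o2)
    obs, recent = next_obs, recent + [r]

print(sum(recent[-1000:]) / 1000)                # high value indicates learned anti-coordination
```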
ISBN: (Print) 9781424407064
Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task based on observing the behavior of a preexisting controller. Unlike previous work, these decompositions define reparameterizations of the state space that can permit simplified control of the system.
ISBN: (Print) 9781424407064
The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies that guarantee the simultaneous achievement of given goals defined in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The employed techniques draw on Temporal Difference methods but also use a heuristic qualification of system states to cope with algorithmic complexity issues.
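As a worked illustration of the stochastic-shortest-path formulation mentioned above, the sketch below runs a temporal-difference (Q-learning) update on a tiny chain graph with noisy moves; the graph, costs, and noise model are assumptions for illustration and are unrelated to the paper's attrition/tracking game.

```python
import random

# Hedged sketch of a temporal-difference (Q-learning) update for an undiscounted
# stochastic shortest path problem on a tiny chain graph with unit step costs.
# The graph, noise model, and hyperparameters are illustrative assumptions.
N, GOAL = 6, 5                      # states 0..5, state 5 is the absorbing goal
ACTIONS = (-1, +1)                  # move left / right
ALPHA, EPS = 0.2, 0.1
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    # the intended move succeeds with probability 0.8, otherwise the state is unchanged
    s2 = min(max(s + a, 0), N - 1) if random.random() < 0.8 else s
    return s2, 1.0                  # unit cost per transition

for episode in range(2000):
    s = 0
    while s != GOAL:
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = min(ACTIONS, key=lambda b: Q[(s, b)])
        s2, cost = step(s, a)
        target = cost + (0.0 if s2 == GOAL else min(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print([min(Q[(s, a)] for a in ACTIONS) for s in range(N)])  # estimated costs-to-go per state
```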
ISBN: (Print) 9781424407064
In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with dynamically consistent overshooting, which we extend by an additional control network. The latter has the particular task of learning the optimal policy. This approach has the advantage that by using a neural network we can easily deal with high dimensions and are consequently able to break Bellman's curse of dimensionality. Further, due to the high system-identification quality of RNNs, our method is highly data-efficient. Because of its properties we refer to our new model as a recurrent control neural network (RCNN). The network is tested on a standard reinforcement learning problem, namely cart-pole balancing, where it shows outstanding results, especially in terms of data efficiency.
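A minimal sketch of the kind of architecture described here, assuming PyTorch, a GRU cell standing in for the paper's RNN, and a quadratic cost standing in for the cart-pole objective; layer sizes and the training objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hedged sketch of a recurrent control neural network: a recurrent system model with
# dynamically consistent overshooting, extended by a small control head that proposes
# actions from the internal state. A GRU cell stands in for the paper's RNN, and a
# quadratic cost stands in for the cart-pole objective; all sizes are assumptions.
class RecurrentControlNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=32):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim + act_dim, hidden_dim)   # system model
        self.decoder = nn.Linear(hidden_dim, obs_dim)           # predicts the next observation
        self.controller = nn.Linear(hidden_dim, act_dim)        # control network (policy)

    def forward(self, obs_seq, horizon):
        # obs_seq: (T, 1, obs_dim) observed prefix, batch size 1
        h = torch.zeros(1, self.cell.hidden_size)
        obs, cost = obs_seq[0], 0.0
        for t in range(len(obs_seq) + horizon):
            act = torch.tanh(self.controller(h))                # act on the internal state
            h = self.cell(torch.cat([obs, act], dim=1), h)
            pred = self.decoder(h)
            # clamp to real observations while available (consistency), then feed
            # the model's own predictions back in (overshooting)
            obs = obs_seq[t + 1] if t + 1 < len(obs_seq) else pred
            cost = cost + (pred ** 2).sum()                     # e.g. keep the pole near upright
        return cost

# usage sketch: backpropagate the overshooting cost; in practice only the controller
# weights would be updated in the policy-learning phase
net = RecurrentControlNet(obs_dim=4, act_dim=1)
obs_seq = torch.randn(8, 1, 4)                                  # stand-in for recorded observations
net(obs_seq, horizon=10).backward()
```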
ISBN: (Print) 9781424407064
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are clearly classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions for all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.
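For the independent (uncorrelated) case mentioned above, robust policy iteration can be sketched as alternating worst-case policy evaluation and greedy improvement; the tiny random MDP and finite uncertainty sets below are assumptions for illustration, and the paper's quadratic criterion for correlated matrices is not reproduced.

```python
import numpy as np

# Hedged sketch of robust policy iteration for the independent-uncertainty case:
# every (state, action) pair has a finite set of candidate transition vectors, and
# policy evaluation takes the worst case over that set. The random MDP and the
# two-element uncertainty sets are illustrative assumptions.
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
cost = rng.uniform(0.0, 1.0, size=(nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA, 2))      # candidate transition vectors

def robust_q(V):
    # worst-case (maximal) expected cost-to-go over each uncertainty set
    return cost + gamma * (P @ V).max(axis=2)

policy = np.zeros(nS, dtype=int)
for _ in range(50):
    V = np.zeros(nS)
    for _ in range(200):                               # robust evaluation of the current policy
        V = robust_q(V)[np.arange(nS), policy]
    new_policy = robust_q(V).argmin(axis=1)            # greedy (cost-minimizing) improvement
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy, V)
```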
ISBN: (Print) 9781424410750
Recently, multitask learning, which can cope with several tasks, has attracted much attention. Multitask reinforcement learning, introduced by Tanaka et al., is a problem class in which a number of problem instances of Markov Decision Processes sampled from the same probability distribution are sequentially given to reinforcement learning agents. The purpose of solving this problem is to realize agents that adapt to newly given environments by using knowledge acquired from past experience. Evolutionary Algorithms are often used to solve reinforcement learning problems when the problem class differs substantially from Markov Decision Processes or the state-action space is very large. From the viewpoint of Evolutionary Algorithm studies, multitask reinforcement learning problems are regarded as dynamic problems whose fitness landscape changes over time. In this paper, a memory-based Evolutionary Programming method suited to multitask reinforcement learning problems is proposed.
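A minimal sketch of what a memory-based Evolutionary Programming loop can look like across a sequence of related task instances: the best solution of each instance is memorised and re-injected into the next initial population. The shifted-sphere fitness functions and all EP parameters are assumptions for illustration, not the paper's method.

```python
import random

# Hedged sketch of memory-based Evolutionary Programming across a sequence of
# related task instances: the best individual found on each instance is stored in
# a memory and re-injected into the initial population of the next instance.
# The shifted-sphere fitness functions and EP parameters are illustrative assumptions.
POP, GENS, DIM, SIGMA = 20, 50, 5, 0.3
memory = []                                            # best solutions from past instances

def evolve(fitness, seeds):
    pop = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(POP)]
    pop[:len(seeds)] = [list(s) for s in seeds]        # seed with memorised solutions
    for _ in range(GENS):
        children = [[x + random.gauss(0, SIGMA) for x in p] for p in pop]
        pop = sorted(pop + children, key=fitness)[:POP]  # (mu + lambda) survivor selection
    return pop[0]

for task in range(5):                                  # instances drawn from one distribution
    target = [random.uniform(-1, 1) for _ in range(DIM)]
    f = lambda x, t=target: sum((xi - ti) ** 2 for xi, ti in zip(x, t))
    best = evolve(f, memory)
    memory = (memory + [best])[-POP // 2:]             # bounded memory of past winners
    print(task, round(f(best), 4))
```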
ISBN: (Print) 9781424407064
We consider batch reinforcement learning problems in continuous space: expected total discounted-reward Markovian Decision Problems where the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration, where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, thereby significantly extending the scope of previous results.
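The sketch below illustrates the overall loop described in the abstract (policy iteration with each intermediate action-value function fitted from fixed behaviour data), assuming a 1-D toy system, polynomial features, and least-squares regression; none of the PAC analysis is reproduced.

```python
import numpy as np

# Hedged sketch of batch policy iteration in which every intermediate action-value
# function is fitted from one fixed batch of behaviour data: regress Q(s, a) onto
# r + gamma * Q(s', pi(s')) with polynomial features. The 1-D toy dynamics,
# features, and sample size are illustrative assumptions; no PAC analysis here.
rng = np.random.default_rng(0)
gamma, n = 0.95, 3000
ACTIONS = np.array([-1.0, 1.0])

def dynamics(s, a):                                    # drifting toward 0 is rewarded
    s2 = np.clip(s + 0.1 * a + rng.normal(0, 0.05, size=s.shape), -1.0, 1.0)
    return s2, -(s2 ** 2)

def phi(s, a):                                         # per-action polynomial features
    cols = [np.ones_like(s), s, s ** 2]
    return np.stack([c * (a == b) for b in ACTIONS for c in cols], axis=1)

# batch of transitions from a uniformly random behaviour policy
# (an i.i.d. stand-in for the paper's single trajectory)
s = rng.uniform(-1.0, 1.0, size=n)
a = rng.choice(ACTIONS, size=n)
s2, r = dynamics(s, a)

X = phi(s, a)
w = np.zeros(X.shape[1])
for _ in range(10):                                    # outer policy-iteration loop
    q_next = np.stack([phi(s2, np.full(n, b)) @ w for b in ACTIONS])
    pi_s2 = ACTIONS[np.argmax(q_next, axis=0)]         # greedy policy to be evaluated
    X2 = phi(s2, pi_s2)
    for _ in range(50):                                # approximate (fitted) evaluation
        y = r + gamma * (X2 @ w)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)                                               # fitted action-value weights
```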
ISBN: (Print) 9781424407064
A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This paper provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that under certain assumptions cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner's dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. It is found experimentally that the theoretical results presented in this paper are quite robust to violations of the underlying assumptions.
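A minimal sketch of the setting analysed here, assuming two independent Q-learners that each condition on the opponent's previous move; the payoff matrix, learning rates, and exploration rate are illustrative assumptions rather than the paper's exact configuration.

```python
import random

# Hedged sketch of two independent Q-learners playing an iterated prisoner's
# dilemma, each conditioning on the opponent's previous move. Payoffs, learning
# rates, and the exploration rate are illustrative assumptions.
C, D = 0, 1
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.05

Q = [{(s, a): 0.0 for s in (C, D) for a in (C, D)} for _ in range(2)]
state = [C, C]                                   # each agent's state: opponent's last move

def choose(i):
    if random.random() < EPS:
        return random.choice((C, D))
    return max((C, D), key=lambda a: Q[i][(state[i], a)])

for t in range(100000):
    a0, a1 = choose(0), choose(1)
    r0, r1 = PAYOFF[(a0, a1)]
    next_state = [a1, a0]                        # next, each observes the opponent's current move
    for i, (a, r) in enumerate([(a0, r0), (a1, r1)]):
        best = max(Q[i][(next_state[i], b)] for b in (C, D))
        Q[i][(state[i], a)] += ALPHA * (r + GAMMA * best - Q[i][(state[i], a)])
    state = next_state

print(Q[0])
print(Q[1])
```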
ISBN: (Print) 9781424407064
In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, 'vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused, and new algorithms can be inserted with ease.
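The paper itself ships portable C++ implementations; purely for illustration, the Python sketch below shows a finite-difference estimator (the first of the three method families) on a toy linear regulator standing in for the cart-pole; the dynamics, the linear policy parameterisation, and all constants are assumptions.

```python
import numpy as np

# Hedged sketch of the finite-difference policy-gradient estimator on a toy linear
# regulator standing in for the cart-pole benchmark: perturb the policy parameters,
# roll out, and regress the return differences onto the perturbations. Dynamics,
# the linear policy, and all constants are illustrative assumptions.
rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    # double-integrator-like system x' = A x + B u, quadratic cost, policy u = theta . x
    x, ret = np.array([1.0, 0.0]), 0.0
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    for _ in range(horizon):
        u = float(theta @ x)
        x = A @ x + B * u + rng.normal(0, 0.01, size=2)
        ret -= x @ x + 0.01 * u * u              # reward is the negative quadratic cost
    return ret

theta = np.zeros(2)
for it in range(300):
    deltas = rng.normal(0, 0.05, size=(20, 2))   # parameter perturbations
    base = rollout(theta)
    dJ = np.array([rollout(theta + d) - base for d in deltas])
    grad, *_ = np.linalg.lstsq(deltas, dJ, rcond=None)
    theta += 0.05 * grad / (np.linalg.norm(grad) + 1e-8)   # small normalised ascent step

print(theta, rollout(theta))                     # learned feedback gains and a return estimate
```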