Search Results - Inner Mongolia University Library

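Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016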
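
As a rough illustration of the rules in this help text, the Python sketch below assembles an expression in the same syntax. The helper names `term` and `combine` are hypothetical and are not part of the library's system; the sketch only builds strings and does not submit a search.

```python
# Hypothetical helpers for composing advanced-search expressions.
# The field codes are the ones listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, e.g. K=图书馆学."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join parts with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    return f"({joined})" if group else joined


# Rebuild the first sample query from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Validating the field code and operator up front reflects the help text's requirement that operators be uppercase and space-separated; how the real search form reacts to malformed input is not documented here.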



Search query: "Any field = 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014"

247 records in total; showing records 61-70.


Particle swarm optimized adaptive dynamic programming

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Dongbin Zhao; Jianqiang Yi; Liu, Derong

Affiliations: Chinese Acad Sci, Inst Automat, Key Lab Complex Syst & Intelligence Sci, Beijing 100080, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

ISBN: (print) 9781424407064

Particle swarm optimization is used to train the action network and critic network of the adaptive dynamic programming approach. The typical structures of adaptive dynamic programming and particle swarm optimization are adopted for comparison with other learning algorithms such as the gradient descent method. Besides simulation of the balancing of a cart-pole plant, a more complex plant, a pendulum robot (Pendubot), is tested for learning performance. Compared to traditional adaptive dynamic programming approaches, the proposed evolutionary learning strategy is verified to achieve faster convergence and higher efficiency. Furthermore, the structure becomes simple because the plant model does not need to be identified beforehand.

Keywords: adaptive dynamic programming; particle swarm optimization; Pendubot; pole balancing


Higher-level application of adaptive dynamic programming/reinforcement learning - A next phase for controls and system identification?

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Lendaris, George G.

Affiliations: Systems Science Graduate Program, Portland State University, Portland, OR, United States

ISBN: (print) 9781424498888

In previous work it was shown that Adaptive-Critic-type Approximate Dynamic Programming could be applied in a higher-level way to create autonomous agents capable of using experience to discern context and select optimal, context-dependent control policies. Early experiments with this approach were based on full a priori knowledge of the system being monitored. The experiments reported in this paper, using small neural networks representing families of mappings, were designed to explore what happens when knowledge of the system is less precise. Results of these experiments show that agents trained with this approach perform well when subject to even large amounts of noise or when employing (slightly) imperfect models. The results also suggest that aspects of this method of context discernment are consistent with our intuition about human learning. The insights gained from these explorations can be used to guide further efforts for developing this approach into a general methodology for solving arbitrary identification and control problems. © 2011 IEEE.

Keywords: reinforcement learning


Adaptive dynamic programming with balanced weights seeking strategy

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Fu, Jian; He, Haibo; Ni, Zhen

Affiliations: School of Automation, Wuhan University of Technology, Wuhan, Hubei 430070, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, United States

ISBN: (print) 9781424498888

In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to consider a balanced weight updating strategy with the consideration of both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach based on the triple link inverted pendulum, a popular benchmark in the community, to demonstrate online learning and control strategy. Experimental results and comparative study under different noise conditions demonstrate the effectiveness of this approach. © 2011 IEEE.

Keywords: inverted pendulum


Adaptive dynamic programming for optimal control of unknown nonlinear discrete-time systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Liu, Derong; Wang, Ding; Zhao, Dongbin

Affiliations: Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

ISBN: (print) 9781424498888

An intelligent optimal control scheme for unknown nonlinear discrete-time systems with discount factor in the cost function is proposed in this paper. An iterative adaptive dynamic programming (ADP) algorithm via globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller with convergence analysis. Three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm, which will approximate at each iteration the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach. © 2011 IEEE.

Keywords: cost functions


Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Witsch, Andreas; Reichle, Roland; Geihs, Kurt; Lange, Sascha; Riedmiller, Martin

Affiliations: Distributed Systems Group, Universität Kassel, Germany; Machine Learning Lab, Albert-Ludwigs-Universität Freiburg, Germany

ISBN: (print) 9781424498888

Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. These problems are addressed by reinforcement learning algorithms like policy gradient methods. We describe how to stabilise the policy gradient descent by introducing a regularisation term to enhance the episodic natural actor-critic approach. This allows a more policy independent usage. © 2011 IEEE.

Keywords: reinforcement learning


Adaptive sample collection using active learning for kernel-based approximate policy iteration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Liu, Chunming; Xu, Xin; Haiyun Hu; Dai, Bin

Affiliations: College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China

ISBN: (print) 9781424498888

Approximate policy iteration (API) has been shown to be a class of reinforcement learning methods with stability and sample efficiency. However, sample collection is still an open problem which is critical to the performance of API methods. In this paper, a novel adaptive sample collection strategy using active learning-based exploration is proposed to enhance the performance of kernel-based API. In this strategy, an online kernel-based least squares policy iteration (KLSPI) method is adopted to construct nonlinear features and approximate the Q-function simultaneously. Therefore, more representative samples can be obtained for value function approximation. Simulation results on typical learning control problems illustrate that by using the proposed strategy, the performance of KLSPI can be improved remarkably. © 2011 IEEE.

Keywords: reinforcement learning


A reinforcement learning approach for sequential mastery testing

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: El-Alfy, El-Sayed M.

Affiliations: College of Computer Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

ISBN: (print) 9781424498888

This paper explores a novel application for reinforcement learning (RL) techniques to sequential mastery testing. In such systems, the goal is to classify each examined person, using the minimal number of test items, as master or non-master. Using RL, an intelligent agent autonomously learns from interactions to administer more informative and effective variable-length tests. Empirical results are also provided to evaluate the performance of the proposed approach as compared to two common approaches for variable-length testing (Bayesian decision and sequential probability ratio test) as well as to the fixed-length testing. © 2011 IEEE.

Keywords: reinforcement learning


Feedback controller parameterizations for reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Roberts, John W.; Manchester, Ian R.; Tedrake, Russ

Affiliations: CSAIL, MIT, Cambridge, MA 02139, United States

ISBN: (print) 9781424498888

Reinforcement learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla Parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages. © 2011 IEEE.

Keywords: parameterization


An approximate dynamic programming based controller for an underactuated 6DoF quadrotor

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Stingu, Emanuel; Lewis, Frank L.

Affiliations: Automation and Robotics Research Institute, University of Texas at Arlington, Arlington, TX, United States

ISBN: (print) 9781424498888

This paper discusses how the principles of adaptive dynamic programming (ADP) can be applied to the control of a quadrotor helicopter platform flying in an uncontrolled environment and subjected to various disturbances and model uncertainties. ADP is based on reinforcement learning using an actor-critic structure. Due to the complexity of the quadrotor system, the learning process has to use as much information as possible about the system and the environment. Various methods to improve the learning speed and efficiency are presented. Neural networks with local activation functions are used as function approximators because the state space cannot be explored efficiently due to its size and the limited time available. The complex dynamics are controlled by a single critic and by multiple actors, thus avoiding the curse of dimensionality. After a number of iterations, the overall actor-critic structure stores information (knowledge) about the system dynamics and the optimal controller that can accomplish the explicit or implicit goal specified in the cost function. © 2011 IEEE.

Keywords: reinforcement learning


Policy Gradient Approaches for Multi-Objective Sequential Decision Making: A Comparison

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Parisi, Simone; Pirotta, Matteo; Smacchia, Nicola; Bascetta, Luca; Restelli, Marcello

Affiliations: Politecn Milan, Dept Elect Informat & Bioengn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy

ISBN: (print) 9781479945528

This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact that gradient-ascent algorithms have already been proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, so far little attention has been paid to the use of gradient information to face multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are presented here. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. In contrast, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of some performance criterion so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three different approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.

Keywords: Pareto optimisation; approximation theory; decision making; evolutionary computation; gradient methods; learning (artificial intelligence); MOMDPs; MORL approaches; Pareto following; Pareto frontier approximation; gradient-ascent algorithms; gradient-based policy-search procedures; multiobjective Markov decision processes; multiobjective evolutionary algorithms; multiobjective optimization problems; multiobjective reinforcement-learning approaches; multiobjective sequential decision making; nondominated policies; performance criterion; policy gradient approaches; policy-gradient algorithms; radial following; algorithm design and analysis; approximation algorithms; approximation methods; manifolds; measurement; optimization; water resources; evolutionary algorithm; performance metrics; policies



235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
