Search Results - Inner Mongolia University Library

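Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)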
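The field codes and operator rules above can be sketched as a small query builder. This is only an illustration of the syntax: the names `term`, `combine`, and `group` are hypothetical helpers, not part of the library's system, and the sketch assumes nothing about how the catalog parses a query beyond the rules stated in the help text.

```python
# Hypothetical helpers for composing a query string in the catalog's
# advanced-search syntax. Nothing here calls the library's system.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one field-coded term, e.g. 'K=图书馆学'."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator, one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the stated rule that operators must be uppercase; whether the catalog itself rejects or silently ignores lowercase operators is not stated in the help text.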

Search condition: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

232 records in total; showing records 111-120

Iterative local dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Emanuel Todorov, Yuval Tassa. Affiliations: Department of Cognitive Science, University of California San Diego, USA; Center of Neural Computation, Hebrew University of Jerusalem, Israel

We develop an iterative local dynamic programming method (iLDP) applicable to stochastic optimal control problems in continuous high-dimensional state and action spaces. Such problems are common in the control of biological movement, but cannot be handled by existing methods. iLDP can be considered a generalization of differential dynamic programming, inasmuch as: (a) we use general basis functions rather than quadratics to approximate the optimal value function; (b) we introduce a collocation method that dispenses with explicit differentiation of the cost and dynamics and ties iLDP to the unscented Kalman filter; (c) we adapt the local function approximator to the propagated state covariance, thus increasing accuracy at more likely states. Convergence is similar to quasi-Newton methods. We illustrate iLDP on several problems including the "swimmer" dynamical system, which has 14 state and 4 control variables.

Keywords: Dynamic programming; Function approximation; Optimal control; Open loop systems; Costs; Iterative methods; Stochastic processes; Control systems; Stochastic resonance; Learning

Neural-network-based reinforcement learning controller for nonlinear systems with non-symmetric dead-zone inputs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xin Zhang, Huaguang Zhang, Derong Liu, Yongsu Kim. Affiliations: School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL, USA

A novel adaptive-critic-based NN controller using reinforcement learning is developed for a class of nonlinear systems with non-symmetric dead-zone inputs. The adaptive critic NN controller uses two NNs: the critic NN is used to approximate the strategic utility function, and the output of the action NN is used to approximate the unknown nonlinear function and to minimize the strategic utility function. The tuning of the NNs is performed online without an explicit offline learning phase. The uniform ultimate boundedness of the closed-loop tracking error is derived using the Lyapunov method. Finally, a numerical example is included to show the effectiveness of the theoretical results.

Keywords: Learning; Nonlinear control systems; Control systems; Nonlinear systems; Neural networks; Adaptive control; Programmable control; Lyapunov method; Actuators; Servomechanisms

A unified framework for temporal difference methods

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Dimitri P. Bertsekas. Affiliation: Laboratory of Information and Decision Systems (LIDS), Massachusetts Institute of Technology, MA, USA

We propose a unified framework for a broad class of methods to solve projected equations that approximate the solution of a high-dimensional fixed point problem within a subspace S spanned by a small number of basis functions or features. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Our framework is based on a connection with projection methods for monotone variational inequalities, which involve alternative representations of the subspace S (feature scaling). Our methods admit simulation-based implementations, and even when specialized to DP problems, include extensions/new versions of the standard TD algorithms, which offer some special implementation advantages and reduced overhead.

Keywords: Least squares approximation; Difference equations; Jacobian matrices; Laboratories; Dynamic programming; Probability distribution; Costs; Books; Least squares methods; Linear matrix inequalities

Near-optimality bounds for greedy periodic policies with application to grid-level storage

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuhai Hu, Boris Defourny. Affiliation: Department of Industrial & Systems Engineering, Lehigh University, USA

This paper is concerned with periodic Markov Decision Processes, as a simplified but already rich model for nonstationary infinite-horizon problems involving seasonal effects. Considering the class of policies greedy for periodic approximate value functions, we establish improved near-optimality bounds for such policies, and derive a corresponding value-iteration algorithm suitable for periodic problems. The effectiveness of a parallel implementation of the algorithm is demonstrated on a grid-level storage control problem that involves stochastic electricity prices following a daily cycle.

Keywords: Silicon; Markov processes; Approximation algorithms; Approximation methods; Modeling; Electricity; Dynamic programming

A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jun Ma, Warren B. Powell. Affiliation: Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes in continuous state and action spaces. Under certain problem structure assumptions on value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean. That is to say, the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximation improves.

Keywords: Least squares approximation; Function approximation; Convergence; Approximation algorithms; Dynamic programming; Infinite horizon; Least squares methods; Acoustic noise; State-space methods

Integrating sporadic imitation in reinforcement learning robots

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Willi Richert, Ulrich Scheller, Markus Koch, Bernd Kleinjohann, Claudius Stern. Affiliation: Faculty of Computer Science, Electrical Engineering and Mathematics, University of Paderborn, Paderborn, Germany

Although the combination of reinforcement learning and imitation has already been considered in recent research, it has always revolved around fixed settings where demonstrator and imitator are fixed and the imitation process is a well-defined period of time. What is missing is the investigation of approaches that also work in scenarios where imitation is only sporadically possible. This means that in a multi-robot scenario a robot is not allowed to interrupt another robot by asking it to repeat certain actions, but can only observe and integrate information bits delivered occasionally. In this paper we present how that can be done in a continuous and noisy environment within an SMDP context.

Keywords: Learning; Orbital robotics; Space exploration; Humans; Neurons; Educational robots; Working environment noise; Animals; Mirrors; Fires

Finite-horizon optimal control design for uncertain linear discrete-time systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Qiming Zhao, Hao Xu, S. Jagannathan. Affiliation: Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA

In this paper, the finite-horizon optimal adaptive control design for linear discrete-time systems with unknown system dynamics by using adaptive dynamic programming (ADP) is presented. In the presence of full state feedback, the terminal state constraint is incorporated in solving the optimal feedback control via the Bellman equation. The optimal regulation of the uncertain linear system is solved in a forward-in-time and online manner without using value and/or policy iterations. Due to the nature of finite horizon, the stability of the closed-loop system is involved but verified by using Lyapunov theory. The effectiveness of the proposed method is verified by simulation results.

Keywords: Optimal control; Vectors; Equations; Dynamic programming; Mathematical model; Linear systems; Learning (artificial intelligence)

High-order local dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuval Tassa, Emanuel Todorov. Affiliations: Interdisciplinary Center of Neural Computation, Hebrew University, Jerusalem, Israel; Applied Mathematics and Computer Science & Engineering, University of Washington, Seattle, USA

We describe a new local dynamic programming algorithm for solving stochastic continuous optimal control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic policy controller on a 10-dimensional simulated swimming robot, and find that the higher-order approximation yields a more general policy with a larger basin of attraction.

Keywords: Approximation methods; Heuristic algorithms; Mathematical model; Equations; Trajectory; Dynamic programming; Noise

An adaptive dynamic programming algorithm to solve optimal control of uncertain nonlinear systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaohong Cui, Yanhong Luo, Huaguang Zhang. Affiliation: School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China

ISBN: (print) 9781479945511

In this paper, an approximate optimal control method based on adaptive dynamic programming (ADP) is discussed for a completely unknown nonlinear system. An online critic-action-identifier algorithm is developed using neural network systems, where the critic-action networks approximate the optimal value function and optimal control, and the other two neural networks approximate the unknown system. Furthermore, the adaptive tuning laws are given based on the Lyapunov approach, which ensures the uniformly ultimately bounded stability of the closed-loop system. Finally, the effectiveness is demonstrated by a simulation example.

Keywords: Optimal control; Artificial neural networks; Mathematical model; Equations; Heuristic algorithms; Function approximation

A data-based online reinforcement learning algorithm with high-efficient exploration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuanheng Zhu, Dongbin Zhao. Affiliation: The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9781479945511

An online reinforcement learning algorithm is proposed in this paper that directly and efficiently utilizes online data for continuous deterministic systems without system parameters. Dependence on specific approximation structures is a key factor limiting the wide application of online reinforcement learning algorithms. We utilize the online data directly with the kd-tree technique to remove this limitation. Moreover, we design the algorithm under the Probably Approximately Correct (PAC) principle. Two examples are simulated to verify its good performance.

Keywords: Approximation algorithms; Learning (artificial intelligence); Approximation methods; Optimal control; Upper bound; Partitioning algorithms; DC motors

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
