Search Results - Inner Mongolia University Library

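Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=institution (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, year the degree was awarded, year a standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term library science or information science, author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication Computer Applications and Software, C++ or Basic in any field, subject term Visual excluded, years 2011-2016)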

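The query syntax above can also be assembled programmatically. The following Python sketch is hypothetical: the helper names (`clause`, `any_of`, `all_of`, `excluding`) are not part of the library's system. They only format strings according to the stated rules (field codes from the list above, uppercase operators, one space on each side) and reproduce the two examples.

```python
# Hypothetical helpers for composing catalog search strings; not a library API.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def clause(field: str, term: str) -> str:
    """Format one field clause such as 'A=范并思'."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={term}"


def any_of(*parts: str) -> str:
    """Join parts with OR and group them in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts: str) -> str:
    """Join parts with AND; operators are uppercase with one space on each side."""
    return " AND ".join(parts)


def excluding(query: str, part: str) -> str:
    """Append a NOT clause ("does not contain")."""
    return f"{query} NOT {part}"


example_1 = all_of(
    any_of(clause("K", "图书馆学"), clause("K", "情报学")),
    clause("A", "范并思"),
    clause("Y", "1982-2016"),
)
example_2 = all_of(
    excluding(
        all_of(clause("P", "计算机应用与软件"), any_of(clause("U", "C++"), clause("U", "Basic"))),
        clause("K", "Visual"),
    ),
    clause("Y", "2011-2016"),
)
print(example_1)
print(example_2)
```

The helpers only build strings; operator precedence between AND, OR and NOT is not stated on this page, so explicit parentheses are the safer choice when mixing operators.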

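Refine search results

Document type

229 records: conference papers
18 records: journal articles

Holdings scope

247 records: electronic documents
0 titles: print holdings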

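Subject classification

113 records: Engineering
- 103 records: Computer Science and Technology...
- 42 records: Software Engineering
- 38 records: Electrical Engineering
- 23 records: Control Science and Engineering
- 5 records: Information and Communication Engineering
- 3 records: Mechanical Engineering
- 2 records: Mechanics (may confer Engineering or Sci...
- 1 record: Instrument Science and Technology
- 1 record: Architecture
- 1 record: Chemical Engineering and Technology
- 1 record: Transportation Engineering
27 records: Science
- 25 records: Mathematics
- 7 records: Systems Science
- 6 records: Statistics (may confer Science, ...
- 1 record: Physics
- 1 record: Chemistry
- 1 record: Atmospheric Sciences
10 records: Management
- 8 records: Management Science and Engineering (may...
- 3 records: Business Administration
- 2 records: Library, Information and Archives Manage...
2 records: Economics
- 2 records: Applied Economics
1 record: Law
- 1 record: Sociology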

Topics

95 records: dynamic programm...
54 records: optimal control
51 records: learning
44 records: reinforcement le...
35 records: learning (artifi...
27 records: equations
25 records: neural networks
22 records: heuristic algori...
20 records: convergence
20 records: control systems
18 records: function approxi...
18 records: mathematical mod...
16 records: approximation al...
15 records: vectors
15 records: cost function
14 records: markov processes
14 records: nonlinear system...
14 records: artificial neura...
13 records: stochastic proce...
12 records: adaptive dynamic...

Institutions

10 records: chinese acad sci...
5 records: school of inform...
4 records: northeastern uni...
4 records: department of el...
4 records: department of in...
3 records: department of el...
3 records: automation and r...
3 records: department of el...
3 records: robotics institu...
3 records: key laboratory o...
3 records: natl univ def te...
3 records: univ illinois de...
2 records: department of ar...
2 records: school of electr...
2 records: univ groningen i...
2 records: univ texas autom...
2 records: colorado state u...
2 records: guangxi univ sch...
2 records: national science...
2 records: informatics inst...

Authors

13 records: liu derong
7 records: hado van hasselt
7 records: marco a. wiering
7 records: dongbin zhao
6 records: zhao dongbin
5 records: xu xin
5 records: lewis frank l.
5 records: huaguang zhang
5 records: wei qinglai
5 records: derong liu
5 records: warren b. powell
4 records: haibo he
4 records: jagannathan s.
4 records: frank l. lewis
4 records: zhang huaguang
4 records: ni zhen
4 records: yanhong luo
4 records: wang ding
4 records: he haibo
4 records: damien ernst

Language

246 records: English
1 record: Other

Search criteria: "Any field=2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014"

247 records in total; showing records 51-60

The QV Family Compared to Other Reinforcement Learning Algorithms

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Wiering, Marco A.; van Hasselt, Hado. Affiliations: Univ Groningen, Dept Artificial Intelligence, NL-9700 AB Groningen, Netherlands; Univ Utrecht, Intelligent Syst Grp, NL-3508 TC Utrecht, Netherlands

ISBN (print): 9781424427611

This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new reinforcement learning algorithms, namely QV2, QVMAX, and QVMAX2, that are all based on the QV-learning algorithm; in contrast to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms, and QV2 is a new on-policy RL algorithm. We experimentally compare these algorithms to a large number of different RL algorithms, namely Q-learning, Sarsa, R-learning, Actor-Critic, QV-learning, and ACLA. We show experiments on five maze problems of varying complexity. Furthermore, we show experimental results on the cart pole balancing problem. The results show that for different problems there can be large performance differences between the algorithms, and that no single RL algorithm always performs best, although on average QV-learning scores highest.

Keywords: learning algorithms

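The abstract builds on QV-learning without restating it. As a hedged sketch of the QV-learning update as it is commonly stated (this covers only the original QV-learning, not QV2, QVMAX or QVMAX2), a state-value table V is trained with a TD(0) update, and Q(s, a) is pulled toward the same target r + γV(s'). The dictionary-based tables, the learning rates `alpha` and `beta`, and the discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def qv_update(Q, V, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.95, terminal=False):
    """One tabular QV-learning step (sketch).

    Both tables move toward the same target r + gamma * V(s'), so Q(s, a)
    bootstraps from the learned state-value function rather than from Q itself.
    """
    target = r if terminal else r + gamma * V[s_next]
    V[s] += beta * (target - V[s])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return target


Q = defaultdict(float)  # Q[(state, action)], initialized to 0
V = defaultdict(float)  # V[state], initialized to 0
qv_update(Q, V, s=0, a=1, r=1.0, s_next=1)
print(V[0], Q[(0, 1)])
```

Computing the target once before either update keeps the two updates consistent even when s and s' coincide.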

Reinforcement Learning-based Optimal Control Considering L Computation Time Delay of Linear Discrete-time Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Fujita, Taishi; Ushio, Toshimitsu

ISBN (print): 9781479945528

In embedded control systems, the control input is computed in a processor from sensing data of a plant, and there is a delay, called the computation time delay, due to the computation and the data transmission. When we design an optimal controller, we need to take this delay into account to achieve optimality. Moreover, when it is difficult to identify a mathematical model of the plant, a model-free approach is useful. In particular, the reinforcement learning-based approach has attracted much attention in the design of adaptive optimal controllers. In this paper, we assume that the plant is a linear system whose parameters are unknown. We then apply reinforcement learning to the design of an adaptive optimal digital controller that takes the computation time delay into consideration. First, we consider the case where all states of the plant are observed and it takes L time steps to update the control input. An optimal feedback gain is learned from sequences of state and control-input pairs. Next, we consider the case where the control input is determined from outputs of the plant. We cannot use an observer to estimate the state of the plant because its parameters are unknown, so we use a data-based control approach for the estimation. Finally, we apply the proposed adaptive optimal controller to attitude control of a quadrotor in the hovering state and show its efficiency by simulation.

Keywords: adaptive control, control engineering computing, control system synthesis, data communication, delays, discrete time systems, embedded systems, feedback, learning (artificial intelligence), linear systems, optimal control, parameter estimation, state estimation, L-computation time delay, adaptive optimal digital controller, attitude control, data transmission, data-based control approach, embedded control systems, linear discrete-time systems, linear system, mathematical model, model free approach, optimal feedback gain, reinforcement learning, adaptation models, delay effects, output feedback, propellers, state feedback, control input, plants


Supervised adaptive dynamic programming based adaptive cruise control

Symposium Series on Computational Intelligence, IEEE SSCI 2011 - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2011

Authors: Zhao, Dongbin; Hu, Zhaohui. Affiliation: Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

ISBN (print): 9781424498888

This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full-range adaptive cruise control (ACC) system. The full-range ACC system considers both the ACC situation on highways and the stop-and-go (SG) situation on urban streets. It can autonomously drive the host vehicle at the desired speed and distance from the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited to this problem, but it suffers from low learning efficiency. We propose the concept of an inducing range to construct the supervisor and formulate the SADP algorithm, which greatly improves learning efficiency. Several driving scenarios are designed, and the trained controller is tested and compared with traditional controllers in simulation; the results show that the trained SADP controller performs very well in all scenarios, so it provides an effective approach to the full-range ACC problem. © 2011 IEEE.

Keywords: adaptive cruise control


An approximate dynamic programming strategy for responsive traffic signal control

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Author: Cai, Chen. Affiliation: Univ Coll London, Ctr Transport Studies, London WC1E 6BT, England

ISBN (print): 9781424407064

This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximated value function progressively during operation while preserving the structural properties of the control problem. The convergence and performance of the algorithm have been tested in a range of experiments. It is concluded that the new strategy is as good as the best existing control strategies while being efficient and simple to compute. It also has the potential to be extended to multi-phase signal control at isolated junctions and to decentralized network operation.

Keywords: dynamic programming, traffic control, function approximation, communication system traffic control, adaptive control, roads, learning, testing, delay, vehicle safety


Iterative Local Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Todorov, Emanuel; Tassa, Yuval. Affiliations: Univ Calif San Diego, Dept Cognit Sci, La Jolla, CA 92093, USA; Hebrew Univ Jerusalem, Ctr Neural Computat, IL-91905 Jerusalem, Israel

ISBN (print): 9781424427611

We develop an iterative local dynamic programming method (iLDP) applicable to stochastic optimal control problems in continuous high-dimensional state and action spaces. Such problems are common in the control of biological movement but cannot be handled by existing methods. iLDP can be considered a generalization of differential dynamic programming, inasmuch as: (a) we use general basis functions rather than quadratics to approximate the optimal value function; (b) we introduce a collocation method that dispenses with explicit differentiation of the cost and dynamics and ties iLDP to the unscented Kalman filter; (c) we adapt the local function approximator to the propagated state covariance, thus increasing accuracy at more likely states. Convergence is similar to that of quasi-Newton methods. We illustrate iLDP on several problems, including the "swimmer" dynamical system, which has 14 state and 4 control variables.

Keywords: dynamical systems


Multi-Objective reinforcement learning for AUV Thruster Failure Recovery

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ahmadzadeh, Seyed Reza; Kormushev, Petar; Caldwell, Darwin G. Affiliation: Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy

ISBN (print): 9781479945528

This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in autonomous underwater vehicles (AUVs). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated, the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions, a multi-objective reinforcement learning approach is employed that can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that navigates the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in closed loop using the AUV's state feedback. Unlike most existing methods, which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable whether the AUV becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.

Keywords: autonomous underwater vehicles, closed loop systems, control engineering computing, fault diagnosis, learning (artificial intelligence), mobile robots, optimal control, state feedback, AUV state feedback, AUV thruster failure recovery, Girona500 AUV, closed-loop, conflicting objective, fault detection, fault-tolerant control policy, faulty thruster, model-based direct policy search, multiobjective reinforcement learning approach, on-board simulated model, optimal solution, optimization, sociology, statistics, trajectory, vectors, vehicle dynamics, vehicles, vehicle, defect detection

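The abstract describes discovering "a set of optimal solutions" under multiple conflicting objectives. Assuming "optimal" here means Pareto-optimal (the usual reading in multi-objective optimization), the Python sketch below shows only the non-dominance filter that defines such a set; it does not implement the paper's policy search. The objective pair (time to target, energy used) and the scores are hypothetical, and both objectives are minimized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one. All objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (time to target, energy used) scores for five candidate policies.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0), (4.0, 4.0), (2.0, 3.0)]
print(pareto_front(candidates))
```

A point never dominates itself (no strict improvement), so the filter does not need to skip the point being tested.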

Beyond Exponential Utility Functions: A Variance-Adjusted Approach for Risk-Averse Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Gosavi, Abhijit A.; Das, Sajal K.; Murray, Susan L. Affiliations: Missouri Univ Sci & Technol, Dept Engn Management & Syst Engn, Rolla, MO 65409, USA; Missouri Univ Sci & Technol, Dept Comp Sci, Rolla, MO 65409, USA

ISBN (print): 9781479945528

Utility theory has served as a bedrock for modeling risk in economics. For risk-sensitive decision-making in Markov decision processes (MDPs) solved via utility theory, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior. The EU framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large and the computation typically overflows. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU objective for reasonable values of the RAC, can be used in the RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies of a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.

Keywords: Markov processes, decision making, economics, learning (artificial intelligence), mathematical analysis, risk analysis, utility theory, EU function, MDP, Markov decision process, RAC, VA approach, exponential utility functions, mathematical proof, risk-averse reinforcement learning, risk-averseness coefficient, variance-adjusted approach, computers, equations, linear programming, mathematical model, measurement, Markov chain, formal proof, AKT1 gene, risk management

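The overflow problem and the variance-adjusted approximation can be illustrated numerically. This Python sketch is framed in terms of costs (an assumption; for rewards the VA form becomes mean minus a variance penalty) and uses the standard second-order expansion (1/λ) log E[exp(λC)] ≈ E[C] + (λ/2) Var[C] for a small coefficient λ. It is not the paper's RL algorithm, and the sample costs and coefficients are hypothetical.

```python
import math
import statistics


def eu_certainty_equivalent(costs, rac):
    """Certainty equivalent (1/rac) * log(mean(exp(rac * C))) of sampled costs.

    math.exp raises OverflowError once rac * C exceeds roughly 709.
    """
    mean_exp = sum(math.exp(rac * c) for c in costs) / len(costs)
    return math.log(mean_exp) / rac


def va_objective(costs, rac):
    """Variance-adjusted approximation: mean(C) + (rac / 2) * Var(C)."""
    return statistics.fmean(costs) + 0.5 * rac * statistics.pvariance(costs)


small = [1.0, 2.0, 3.0]
print(eu_certainty_equivalent(small, 0.1), va_objective(small, 0.1))  # both about 2.033

large = [1000.0, 1200.0]
try:
    eu_certainty_equivalent(large, 1.0)
except OverflowError:
    print("EU objective overflowed")
print(va_objective(large, 1.0))  # 6100.0
```

For small costs the two objectives nearly agree, while for large costs only the VA form remains computable in double precision.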

Using reward-weighted imitations for robot reinforcement learning

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009

Authors: Peters, Jan; Kober, Jens. Affiliation: Department of Empirical Inference and Machine Learning, Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany

ISBN (print): 9781424427611

Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. To improve efficiency, the problem is reduced to reward-weighted imitation. By doing so, we obtain a framework for policy learning that both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications, both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm) and for learning task-space control. © 2009 IEEE.

Keywords: reinforcement learning

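The idea of reducing policy learning to reward-weighted imitation can be illustrated in its simplest episodic form: sample perturbed policy parameters, score each rollout, and set the new parameters to the reward-weighted average of the samples. This Python sketch is a simplified assumption, not the authors' algorithm. The toy task (reach a target point), Gaussian exploration, and exponential reward are hypothetical.

```python
import math
import random


def reward_weighted_update(samples):
    """Return the reward-weighted mean of sampled parameter vectors.

    `samples` is a list of (theta, reward) pairs with non-negative rewards.
    """
    total = sum(r for _, r in samples)
    if total <= 0.0:
        raise ValueError("at least one reward must be positive")
    dim = len(samples[0][0])
    return [sum(r * theta[j] for theta, r in samples) / total for j in range(dim)]


def search(target, iterations=50, rollouts=50, sigma=0.5, seed=0):
    """Toy episodic search: perturb theta, score each rollout, imitate by reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(target)
    for _ in range(iterations):
        samples = []
        for _ in range(rollouts):
            candidate = [t + rng.gauss(0.0, sigma) for t in theta]
            dist2 = sum((c - g) ** 2 for c, g in zip(candidate, target))
            samples.append((candidate, math.exp(-dist2)))  # reward in (0, 1]
        theta = reward_weighted_update(samples)
    return theta


print(search([1.0, -0.5]))
```

Because better rollouts receive larger weights, each update "imitates" mostly the successful samples; no gradient or learning rate is needed in this form.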

Efficient Data Reuse in Value Function Approximation.

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Hachiya, Hirotaka; Akiyama, Takayuki; Sugiyama, Masashi; Peters, Jan. Affiliations: Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, 2-12-1 O Okayama, Tokyo 1528552, Japan; Max Planck Inst Biol Cybernet, Dept Scholkopf, D-72076 Tubingen, Germany

ISBN (print): 9781424427611

Off-policy reinforcement learning aims to make efficient use of data samples gathered from a policy that differs from the policy currently being optimized. A common approach is to use importance sampling techniques to compensate for the bias of value function estimators caused by the difference between the data-sampling policy and the target policy. However, existing off-policy methods often do not take the variance of the value function estimators explicitly into account, and therefore their performance tends to be unstable. To cope with this problem, we propose an adaptive importance sampling technique that allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. The usefulness of the proposed approach is demonstrated on a simulated swing-up inverted-pendulum problem.

Keywords: reinforcement learning

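One way to make the bias-variance trade-off adjustable, and the form assumed in this Python sketch, is to raise each importance weight to a flattening exponent ν in [0, 1]. For simplicity the weights are applied to a plain average of returns rather than inside value-function fitting, and the cross-validation step for choosing ν is not shown. The sample returns and probabilities are hypothetical.

```python
def flattened_is_estimate(returns, p_target, p_behavior, nu):
    """Importance-sampling estimate of the target policy's expected return
    using flattened weights (p_target / p_behavior) ** nu.

    nu = 0 ignores the policy mismatch (biased, low variance);
    nu = 1 is ordinary importance sampling (unbiased, possibly high variance).
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    if not (len(returns) == len(p_target) == len(p_behavior)) or not returns:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for g, pt, pb in zip(returns, p_target, p_behavior):
        total += (pt / pb) ** nu * g
    return total / len(returns)


returns = [1.0, 4.0]
p_target = [0.5, 0.5]      # probability of each sample under the target policy
p_behavior = [0.25, 0.75]  # probability under the data-collecting policy
for nu in (0.0, 0.5, 1.0):
    print(nu, flattened_is_estimate(returns, p_target, p_behavior, nu))
```

Intermediate values of ν shrink extreme weights toward 1, trading some bias for lower variance.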

Bayesian active learning with basis functions

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Ryzhov, Ilya O.; Powell, Warren B. Affiliation: Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, United States

ISBN (print): 9781424498888

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem. © 2011 IEEE.

Keywords: dynamic programming

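The parametric approximation the abstract starts from, V(s) ≈ Σᵢ θᵢ φᵢ(s), can be sketched with ordinary least squares in Python. The polynomial basis, the synthetic value observations, and the least-squares fit are assumptions for illustration. The sketch does not implement the Bayesian knowledge-gradient exploration policy that the paper proposes.

```python
import numpy as np


def features(s: float) -> np.ndarray:
    """Hypothetical basis functions for a scalar state: 1, s, s^2."""
    return np.array([1.0, s, s * s])


def fit_linear_vfa(states, values) -> np.ndarray:
    """Least-squares weights theta so that V(s) ~= theta . phi(s)."""
    phi = np.vstack([features(s) for s in states])
    theta, *_ = np.linalg.lstsq(phi, np.asarray(values, dtype=float), rcond=None)
    return theta


def value(theta: np.ndarray, s: float) -> float:
    """Evaluate the approximate value of state s."""
    return float(features(s) @ theta)


states = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [2.0 - s + 0.5 * s * s for s in states]  # synthetic value observations
theta = fit_linear_vfa(states, observed)
print(theta, value(theta, 5.0))
```

Only the three weights are learned, however many states are visited; this reduction in unknowns is what makes the approximation tractable.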

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region, postal code 010021
