Search Results - Inner Mongolia University Library

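Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification code, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016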
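
The field codes and operators above can also be assembled programmatically. The following is a minimal sketch in Python and is not part of the library's own tooling: the helper names `term`, `any_of`, and `all_of` are hypothetical, and the code only builds a query string in the documented syntax (uppercase operators, one space on each side). It rebuilds the first search example.

```python
# Hypothetical helpers for assembling search strings in the syntax
# described in the help text. They only build text; nothing is sent
# to the catalog.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value clause after checking the field code."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (uppercase operator, single spaces)."""
    return " AND ".join(clauses)


# Rebuild the first search example from the help text.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Validating the field code up front is a design choice of this sketch; how the catalog itself handles an unknown code is not stated in the help text.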

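Refine search results

Document type

228 records: Conference
4 records: Journal articles

Holdings

232 records: Electronic documents
0 titles: Print holdings

Discipline classification

98 records: Engineering
- 93 records: Computer Science and Technology...
- 40 records: Software Engineering
- 25 records: Electrical Engineering
- 14 records: Control Science and Engineering
- 4 records: Mechanical Engineering
- 1 record: Mechanics (degrees may be conferred in Engineering, Sci...
- 1 record: Information and Communication Engineering
- 1 record: Architecture
- 1 record: Chemical Engineering and Technology
- 1 record: Transportation Engineering
23 records: Science
- 23 records: Mathematics
- 6 records: Statistics (degrees may be conferred in Science, ...
- 4 records: Systems Science
- 1 record: Chemistry
- 1 record: Atmospheric Science
9 records: Management
- 7 records: Management Science and Engineering (degrees may...
- 3 records: Business Administration
- 2 records: Library, Information and Archives Manage...
2 records: Economics
- 2 records: Applied Economics
1 record: Law
- 1 record: Sociology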

Subject

95 records: dynamic programm...
52 records: learning
46 records: optimal control
37 records: reinforcement le...
34 records: learning (artifi...
27 records: equations
22 records: heuristic algori...
21 records: control systems
20 records: convergence
19 records: neural networks
18 records: function approxi...
17 records: mathematical mod...
16 records: approximation al...
15 records: vectors
14 records: markov processes
14 records: artificial neura...
14 records: cost function
13 records: stochastic proce...
12 records: algorithm design...
12 records: adaptive control

Institution

5 records: school of inform...
4 records: northeastern uni...
4 records: department of el...
4 records: department of in...
3 records: department of el...
3 records: automation and r...
3 records: northeastern uni...
3 records: robotics institu...
3 records: key laboratory o...
3 records: univ illinois de...
2 records: department of ar...
2 records: school of electr...
2 records: univ groningen i...
2 records: univ texas autom...
2 records: colorado state u...
2 records: guangxi univ sch...
2 records: national science...
2 records: informatics inst...
2 records: college of infor...
2 records: school of automa...

Author

7 records: hado van hasselt
7 records: lewis frank l.
7 records: marco a. wiering
7 records: dongbin zhao
6 records: liu derong
5 records: huaguang zhang
5 records: zhang huaguang
5 records: derong liu
5 records: warren b. powell
4 records: xu xin
4 records: vrabie draguna
4 records: jagannathan s.
4 records: frank l. lewis
4 records: yanhong luo
4 records: damien ernst
4 records: jan peters
4 records: peters jan
4 records: zhao dongbin
3 records: xu hao
3 records: martin riedmille...

Language

232 records: English

Search query: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

232 records in total; showing records 31-40

Multi-Objective Reinforcement Learning for AUV Thruster Failure Recovery

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ahmadzadeh, Seyed Reza; Kormushev, Petar; Caldwell, Darwin G. (Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy)

ISBN: (print) 9781479945528

This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUV). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions a multi-objective reinforcement learning approach is employed which can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that is able to navigate the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in a closed-loop using AUV's state feedback. Unlike most existing methods which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable when the AUV either becomes under-actuated or remains redundant in the presence of a fault. We validate the proposed approach on the model of the Girona500 AUV.

Keywords: autonomous underwater vehicles closed loop systems control engineering computing fault diagnosis learning (artificial intelligence) mobile robots optimal control state feedback AUV state feedback AUV thruster failure recovery Girona500 AUV closed-loop conflicting objective fault detection fault-tolerant control policy faulty thruster model-based direct policy search multiobjective reinforcement learning approach on-board simulated model optimal solution Optimization Sociology Statistics Trajectory Vectors Vehicle dynamics Vehicles Autonomous underwater vehicles control engineering computing Closed loop systems State feedback optimal solution trajectory Sociology vehicle Vehicle dynamics Mobile robots Defect detection Fault diagnosis learning (artificial intelligence) Optimal control CLOSED LOOP

Policy Gradient Approaches for Multi-Objective Sequential Decision Making: A Comparison

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Parisi, Simone; Pirotta, Matteo; Smacchia, Nicola; Bascetta, Luca; Restelli, Marcello (Politecn Milan, Dept Elect Informat & Bioengn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy)

ISBN: (print) 9781479945528

This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact that gradient-ascent algorithms have been already proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, so far little attention has been paid to the use of gradient information to face multi-objective sequential decision problems. Three different Multi-Objective reinforcement-learning (MORL) approaches are here presented. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. Differently, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of some performance criterion so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three different approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.

Keywords: Pareto optimisation approximation theory decision making evolutionary computation gradient methods learning (artificial intelligence) MOMDPs MORL approaches Pareto following Pareto frontier approximation gradient-ascent algorithms gradient-based policy-search procedures multiobjective Markov decision processes multiobjective evolutionary algorithms multiobjective optimization problems multiobjective reinforcement-learning approaches multiobjective sequential decision making nondominated policies performance criterion policy gradient approaches policy-gradient algorithms radial following Algorithm design and analysis Approximation algorithms Approximation methods Manifolds Measurement Optimization Water resources evolutionary algorithm Performance metrics Pareto optimisation Algorithm design and analysis Manifolds Approximation method gradient methods Approximation Theory Approximation algorithms Water Resources Policies decision making

Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yahyaa, Saba Q.; Drugan, Madalina M.; Manderick, Bernard (Vrije Univ Brussel, Dept Comp Sci, Pl Laan 2, B-1050 Brussels, Belgium)

ISBN: (print) 9781479945528

In the stochastic multi-objective multi-armed bandit (or MOMAB), arms generate a vector of stochastic rewards, one per objective, instead of a single scalar reward. As a result, there is not a single optimal arm but a set of optimal arms (the Pareto front), determined by applying the Pareto dominance relation to the reward vectors, and there is a trade-off between finding the optimal arm set (exploration) and selecting the optimal arms fairly or evenly (exploitation). To trade off between exploration and exploitation, either Pareto knowledge gradient (Pareto-KG for short) or Pareto upper confidence bound (Pareto-UCB1 for short) can be used. They combine the KG-policy and the UCB1-policy, respectively, with the Pareto dominance relation. In this paper, we propose Pareto Thompson sampling, which uses the Pareto dominance relation to find the Pareto front. We also propose an annealing-Pareto algorithm that trades off between exploration and exploitation by using a decaying parameter epsilon(t) in combination with the Pareto dominance relation. The annealing-Pareto algorithm uses the decaying parameter to explore the Pareto optimal arms and uses the Pareto dominance relation to exploit the Pareto front. We experimentally compare the Pareto-KG, Pareto-UCB1, Pareto Thompson sampling, and annealing-Pareto algorithms on multi-objective Bernoulli distribution problems and conclude that annealing-Pareto is the best-performing algorithm.

Keywords: Pareto optimisation sampling methods simulated annealing stochastic programming KG-policy MOMAB Pareto Thompson sampling Pareto dominance relation Pareto front Pareto knowledge gradient Pareto optimal arms Pareto upper confidence bound Pareto-KG Pareto-UCB1 UCB1-policy annealing-Pareto multiobjective multiarmed bandit algorithm decaying parameter multiobjective Bernoulli distribution problems multiobjective multiarmed bandit reward vectors stochastic rewards Annealing Entropy Heuristic algorithms Nickel Pareto optimization Probability distribution Vectors Pareto optimisation Heuristic algorithms Probability distribution sampling methods simulated annealing Arm spiral arm entropy Exploration Nickel annealing Arms stochastic programming Stochastic models Cloning Vectors
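
The abstract of this record relies on the Pareto dominance relation over reward vectors. As a minimal illustration only, assuming larger rewards are better in every objective, the sketch below computes the Pareto front of a set of mean reward vectors. It is not the authors' annealing-Pareto algorithm; the function names and the `arm_means` values are hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean reward vector no other arm dominates."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


# Made-up mean reward vectors for five arms with two objectives each.
arm_means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4), (0.1, 0.1)]
front = pareto_front(arm_means)
print(front)
```

With these made-up values, the last two arms are dominated by the arm at index 1, so only the first three arms remain on the front.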

Theoretical analysis of a reinforcement learning based switching scheme

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ali Heydari (Mechanical Engineering Department, South Dakota School of Mines and Technology, Rapid City, SD)

A reinforcement learning based scheme for optimal switching with an infinite-horizon cost function is briefly proposed in this paper. Several theoretical questions are shown to arise regarding its convergence, the optimality of the result, and the continuity of the limit function, which is to be uniformly approximated using parametric function approximators. The main contribution of the paper is to provide rigorous answers to these questions in the form of sufficient conditions for convergence, optimality, and continuity.

Keywords: Switches; Cost function; Convergence; Artificial neural networks; Schedules; Approximation methods; Optimal control

An adaptive dynamic programming algorithm to solve optimal control of uncertain nonlinear systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaohong Cui; Yanhong Luo; Huaguang Zhang (School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China)

ISBN: (print) 9781479945511

In this paper, an approximate optimal control method based on adaptive dynamic programming (ADP) is discussed for completely unknown nonlinear systems. An online critic-action-identifier algorithm is developed using neural network systems, where the critic-action networks approximate the optimal value function and optimal control and the other two neural networks approximate the unknown system. Furthermore, the adaptive tuning laws are given based on the Lyapunov approach, which ensures the uniform ultimate bounded stability of the closed-loop system. Finally, the effectiveness is demonstrated by a simulation example.

Keywords: Optimal control; Artificial neural networks; Mathematical model; Equations; Heuristic algorithms; Function approximation

Near-optimality bounds for greedy periodic policies with application to grid-level storage

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuhai Hu; Boris Defourny (Department of Industrial & Systems Engineering, Lehigh University, USA)

This paper is concerned with periodic Markov Decision Processes, as a simplified but already rich model for nonstationary infinite-horizon problems involving seasonal effects. Considering the class of policies greedy for periodic approximate value functions, we establish improved near-optimality bounds for such policies, and derive a corresponding value-iteration algorithm suitable for periodic problems. The effectiveness of a parallel implementation of the algorithm is demonstrated on a grid-level storage control problem that involves stochastic electricity prices following a daily cycle.

Keywords: Silicon; Markov processes; Approximation algorithms; Approximation methods; Modeling; Electricity; dynamic programming

Adaptive dynamic programming-based optimal tracking control for nonlinear systems using general value iteration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaofeng Lin; Qiang Ding; Weikai Kong; Chunning Song; Qingbao Huang (School of Electrical Engineering, Guangxi University, Nanning, China)

ISBN: (print) 9781479945511

For the optimal tracking control problem of affine nonlinear systems, a general value iteration algorithm based on adaptive dynamic programming is proposed in this paper. By system transformation, the optimal tracking problem is converted into the optimal regulating problem for the tracking error dynamics. Then, a general value iteration algorithm is developed to obtain the optimal control, with convergence analysis. Considering the advantages of the echo state network, we use three echo state networks with the Levenberg-Marquardt (LM) adjusting algorithm to approximate the system, the cost function, and the control law. A simulation example is given to demonstrate the effectiveness of the presented scheme.

Keywords: Cost function; Nonlinear systems; Optimal control; Trajectory; dynamic programming; Approximation algorithms

A data-based online reinforcement learning algorithm with high-efficient exploration

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yuanheng Zhu; Dongbin Zhao (The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences, Beijing, China)

ISBN: (print) 9781479945511

An online reinforcement learning algorithm is proposed in this paper that directly and efficiently utilizes online data for continuous deterministic systems without system parameters. Dependence on specific approximation structures is a key factor limiting the wide application of online reinforcement learning algorithms. We utilize the online data directly with the kd-tree technique to remove this limitation. Moreover, we design the algorithm under the Probably Approximately Correct (PAC) principle. Two examples are simulated to verify its good performance.

Keywords: Approximation algorithms; learning (artificial intelligence); Approximation methods; Optimal control; Upper bound; Partitioning algorithms; DC motors

ADP-based optimal control for a class of nonlinear discrete-time systems with inequality constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yanhong Luo; Geyang Xiao (College of Information Science and Engineering, Northeastern University)

ISBN: (print) 9781479945511

In this paper, the adaptive dynamic programming (ADP) approach is utilized to design a neural-network-based optimal controller for a class of nonlinear discrete-time (DT) systems with inequality constraints. To begin with, the initial constrained optimal control problem is transformed into an infinite horizon optimal control problem by introducing the penalty function. Then, the iterative ADP algorithm is developed to handle the nonlinear optimal control problem with two neural networks. The two neural networks are aimed at generating the optimal cost and the optimal control policy respectively. Finally, the numerical results and analysis are presented to illustrate the performance of the developed method.

Keywords: Optimal control; Biological neural networks; Nonlinear systems; dynamic programming; Cost function

Using supervised training signals of observable state dynamics to speed-up and improve reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Daniel L Elliott; Charles Anderson (Dept of Computer Science, Colorado State University)

A common complaint about reinforcement learning (RL) is that it is too slow to learn a value function which gives good performance. This issue is exacerbated in continuous state spaces. This paper presents a straightforward approach to speeding up and even improving RL solutions by reusing features learned during a pre-training phase prior to Q-learning. During pre-training, the agent is taught to predict state change given a state/action pair. The effect of pre-training is examined using the model-free Q-learning approach but could readily be applied to a number of RL approaches including model-based RL. The analysis of the results provides ample evidence that the features learned during pre-training are the reason behind the improved RL performance.

Keywords: Artificial neural networks; Data models; Training; learning (artificial intelligence); Heuristic algorithms; Supervised learning; Computational modeling

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
