检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

228 篇 会议
4 篇 期刊文献

馆藏范围

232 篇 电子文献
0 种 纸本馆藏

日期分布

学科分类号

98 篇 工学
- 93 篇 计算机科学与技术...
- 40 篇 软件工程
- 25 篇 电气工程
- 14 篇 控制科学与工程
- 4 篇 机械工程
- 1 篇 力学（可授工学、理...
- 1 篇 信息与通信工程
- 1 篇 建筑学
- 1 篇 化学工程与技术
- 1 篇 交通运输工程
23 篇 理学
- 23 篇 数学
- 6 篇 统计学（可授理学、...
- 4 篇 系统科学
- 1 篇 化学
- 1 篇 大气科学
9 篇 管理学
- 7 篇 管理科学与工程(可...
- 3 篇 工商管理
- 2 篇 图书情报与档案管...
2 篇 经济学
- 2 篇 应用经济学
1 篇 法学
- 1 篇 社会学

主题

95 篇 dynamic programm...
52 篇 learning
46 篇 optimal control
37 篇 reinforcement le...
34 篇 learning (artifi...
27 篇 equations
22 篇 heuristic algori...
21 篇 control systems
20 篇 convergence
19 篇 neural networks
18 篇 function approxi...
17 篇 mathematical mod...
16 篇 approximation al...
15 篇 vectors
14 篇 markov processes
14 篇 artificial neura...
14 篇 cost function
13 篇 stochastic proce...
12 篇 algorithm design...
12 篇 adaptive control

机构

5 篇 school of inform...
4 篇 northeastern uni...
4 篇 department of el...
4 篇 department of in...
3 篇 department of el...
3 篇 automation and r...
3 篇 northeastern uni...
3 篇 robotics institu...
3 篇 key laboratory o...
3 篇 univ illinois de...
2 篇 department of ar...
2 篇 school of electr...
2 篇 univ groningen i...
2 篇 univ texas autom...
2 篇 colorado state u...
2 篇 guangxi univ sch...
2 篇 national science...
2 篇 informatics inst...
2 篇 college of infor...
2 篇 school of automa...

作者

7 篇 hado van hasselt
7 篇 lewis frank l.
7 篇 marco a. wiering
7 篇 dongbin zhao
6 篇 liu derong
5 篇 huaguang zhang
5 篇 zhang huaguang
5 篇 derong liu
5 篇 warren b. powell
4 篇 xu xin
4 篇 vrabie draguna
4 篇 jagannathan s.
4 篇 frank l. lewis
4 篇 yanhong luo
4 篇 damien ernst
4 篇 jan peters
4 篇 peters jan
4 篇 zhao dongbin
3 篇 xu hao
3 篇 martin riedmille...

语言

232 篇 英文

检索条件"任意字段=2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

共 232 条记录，以下是151-160 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

Bias-corrected Q-learning to control max-operator bias in Q-learning

Bias-corrected Q-learning to control max-operator bias in Q-...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Donghun Lee Boris Defourny Warren B. Powell Department of Computer Science Princeton University Princeton NJ USA Operations Research and Financial Engineering Princeton University Princeton NJ USA

We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.

关键词： Convergence Reactive power Random variables Standards dynamic programming learning (artificial intelligence) Educational institutions

来源：评论

学校读者我要写书评

暂无评论

A novel approach for constructing basis functions in approximate dynamic programming for feedback control

A novel approach for constructing basis functions in approxi...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Jian Wang Zhenhua Huang Xin Xu College of Mechatronics and Automation National University of Defense Tech Changsha P. R. China

This paper presents a novel approach for constructing basis functions in approximate dynamic programming (ADP) through the locally linear embedding (LLE) process. It considers the experience (sample) data as a high-dimensional space and the basis functions to be solved as a low-dimensional space. Through mapping the high-dimensional data into a single global coordinate system of lower dimensionality, the solved basis functions in low-dimensional space have the property that nearby experience data in the high dimensional space remain nearby and similarly co-located with respect to one in the low dimensional space. Thus, the obtained basis functions can precisely approximate the real value/action-value function. The simulation results show that the basis functions obtained by LLE can represent the final policy with a higher precision.

关键词： learning (artificial intelligence) Function approximation dynamic programming Equations Linear approximation Vectors

来源：评论

学校读者我要写书评

暂无评论

Online adaptive learning of optimal control solutions using integral reinforcement learning

Online adaptive learning of optimal control solutions using ...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Kyriakos G. Vamvoudakis Draguna Vrabie Frank L. Lewis Automation and Robotics Research Institute University of Texas Arlington Fort Worth TX USA

In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data based approach to the solution of the Hamilton-Jacobi-Bellman equation and it does not require explicit knowledge on the system's drift dynamics. The adaptive algorithm is based on policy iteration, and it is implemented on an actor/critic structure. Both actor and critic neural networks are adapted simultaneously a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Scalarized multi-objective reinforcement learning: Novel design techniques

Scalarized multi-objective reinforcement learning: Novel des...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Kristof Van Moffaert Madalina M. Drugan Ann Nowé Department of Computer Science Vrije Universiteit Brussel Brussels Belgium

In multi-objective problems, it is key to find compromising solutions that balance different objectives. The linear scalarization function is often utilized to translate the multi-objective nature of a problem into a standard, single-objective problem. Generally, it is noted that such as linear combination can only find solutions in convex areas of the Pareto front, therefore making the method inapplicable in situations where the shape of the front is not known beforehand, as is often the case. We propose a non-linear scalarization function, called the Chebyshev scalarization function, as a basis for action selection strategies in multi-objective reinforcement learning. The Chebyshev scalarization method overcomes the flaws of the linear scalarization function as it can (i) discover Pareto optimal solutions regardless of the shape of the front, i.e. convex as well as non-convex , (ii) obtain a better spread amongst the set of Pareto optimal solutions and (iii) is not particularly dependent on the actual weights used.

关键词： Pareto optimization learning (artificial intelligence) Chebyshev approximation Benchmark testing Measurement Shape

来源：评论

学校读者我要写书评

暂无评论

Heuristics for multiagent reinforcement learning in decentralized decision problems

Heuristics for multiagent reinforcement learning in decentra...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Martin W. Allen David Hahn Douglas C. MacFarland Computer Science Department University of Wisconsin-La Crosse La Crosse Wisconsin Computer Science Department Worcester Polytechnic Institute Worcester Massachusetts

Decentralized partially observable Markov decision processes (Dec-POMDPs) model cooperative multiagent scenarios, providing a powerful general framework for team-based artificial intelligence. While optimal algorithms exist for Dec-POMDPs, theoretical and empirical results demonstrate that they are impractical for many problems of real interest. We examine the use of reinforcement learning (RL) as a means to generate adequate, if not optimal, joint policies for Dec-POMDPs. It is easily demonstrated (and expected) that single-agent RL produces results of little joint utility. We therefore investigate heuristic methods, based upon the dynamics of the Dec-POMDP formulation, that bias the learning process to produce coordinated action. Empirical tests on a benchmark problem show that these heuristics significantly enhance learning performance, even out-performing a hand-crafted heuristic in cases where the learning process converges quickly.

关键词： Joints Heuristic algorithms learning (artificial intelligence) Equations Markov processes Benchmark testing Complexity theory

来源：评论

学校读者我要写书评

暂无评论

A combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments

A combined hierarchical reinforcement learning based approac...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Yifan Cai Simon X. Yang Xin Xu The School of Engineering University of Guelph Guelph Ontario Canada The College of Mechatronics and Automation National University of Defense Technology Changsha Hunan Province China

Effective cooperation of multi-robots in unknown environments is essential in many robotic applications, such as environment exploration and target searching. In this paper, a combined hierarchical reinforcement learning approach, together with a designed cooperation strategy, is proposed for the real-time cooperation of multi-robots in completely unknown environments. Unlike other algorithms that need an explicit environment model or select parameters by trial and error, the proposed cooperation method obtains all the required parameters automatically through learning. By integrating segmental options with the traditional MAXQ algorithm, the cooperation hierarchy is built. In new tasks, the designed cooperation method can control the multi-robot system to complete the task effectively. The simulation results demonstrate that the proposed scheme is able to effectively and efficiently lead a team of robots to cooperatively accomplish target searching tasks in completely unknown environments.

关键词： Robot kinematics learning (artificial intelligence) Real-time systems Algorithm design and analysis dynamic programming Robot sensing systems

来源：评论

学校读者我要写书评

暂无评论

adaptive optimal control for nonlinear discrete-time systems

Adaptive optimal control for nonlinear discrete-time systems

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Chunbin Qin Huaguang Zhang Yanhong Luo School of Information Science and Engineering Northeastern University Shenyang China Basic Experiment Teaching Center Henan University Kaifeng China

This paper proposes an on-line near-optimal control scheme based on capabilities of neural networks (NNs), in function approximation, to attain the on-line solution of optimal control problem for nonlinear discrete-time systems. First, to solve the Hamilton-Jacobi-Bellman (HJB) equation forward-in-time appearing in the optimal control problem, two neural networks are used to approximate the cost function and to compute the optimal control policy, respectively. And then, according to the Bellman's optimality principle and the adaptive technology, the on-line weight updating laws for the critic network and action network are derived, respectively. Further, considering NNs approximative errors, the stability analysis of the closed-loop system is demonstrated by Lyapunov theory. At last, a numerical example is provided to demonstrate the effectiveness of the proposed method.

关键词： Artificial neural networks Equations Optimal control Mathematical model dynamic programming Approximation methods Discrete-time systems

来源：评论

学校读者我要写书评

暂无评论

Approximate reinforcement learning: An overview

Approximate reinforcement learning: An overview

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Lucian Buşoniu Damien Ernst Bart De Schutter Robert Babuška Delft Center of Systems & Control Delft University of Technnology Netherlands FRS-FNRS Systems and Modeling Unit University of Liège Belgium

reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.

关键词： Approximation algorithms Equations Function approximation Trajectory Markov processes Mathematical model

来源：评论

学校读者我要写书评

暂无评论

Algorithm and stability of ATC receding horizon control

Algorithm and stability of ATC receding horizon control

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Hongwei Zhang Jie Huang Frank L. Lewis Department of Mechanical and Automation Engineering Chinese University of Hong Kong New Territories Hong Kong China Automation and Robotics Research Institute University of Texas Arlington Fort Worth TX USA

Receding horizon control (RHC), also known as model predictive control (MPC), is a suboptimal control scheme that solves a finite horizon open-loop optimal control problem in an infinite horizon context and yields a measured state feedback control law. A lot of efforts have been made to study the closed-loop stability, leading to various stability conditions involving constraints on either the terminal state, or the terminal cost, or the horizon size, or their different combinations. In this paper, we propose a modified RHC scheme, called adaptive terminal cost RHC (ATC-RHC). The control law generated by ATC-RHC algorithm converges to the solution of the infinite horizon optimal control problem. Moreover, it ensures the closed-loop system to be uniformly ultimately exponentially stable without imposing any constraints on the terminal state, the horizon size, or the terminal cost. Finally we show that when the horizon size is one, the underlying problems of ATC-RHC and heuristic dynamic programming (HDP) are the same. Thus, ATC-RHC can be implemented using HDP techniques without knowing the system matrix A.

关键词： Stability Optimal control Open loop systems Costs Infinite horizon Context modeling Predictive models Predictive control State feedback dynamic programming

来源：评论

学校读者我要写书评

暂无评论

Real-time tracking on adaptive critic design with uniformly ultimately bounded condition

Real-time tracking on adaptive critic design with uniformly ...

引用

ieee symposium on adaptive dynamic programming and reinforcement learning, (adprl)

作者： Zhen Ni Xiao Fang Haibo He Dongbin Zhao Xin Xu Department of Electrical University of Rhode Island Kingston RI USA Institute of Automation Chinese Academy of Sciences Beijing China Institute of Automation National University of Defense Technology Changsha China

In this paper, we proposed a new nonlinear tracking controller based on heuristic dynamic programming (HDP) with the tracking filter. Specifically, we integrate a goal network into the regular HDP design and provide the critic network with detailed internal reward signal to help the value function approximation. The architecture is explicitly explained with the tracking filter, goal network, critic network and action network, respectively. We provide the stability analysis of our proposed controller with Lyapunov approach. It is shown that the filtered tracking errors and the weights estimation errors in neural networks are all uniformly ultimately bounded (UUB) under certain conditions. Finally, we compare our proposed approach with regular HDP approach in virtual reality (VR)/Simulink environment to justify the improved control performance.

关键词： dynamic programming Stability analysis Function approximation Nonlinear systems Vectors Computer architecture Neural networks

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共24页 << < 12 13 14 15 16 17 18 19 20 21 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：