Search Results - Inner Mongolia University Library

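Help

Field codes:

T = title (book title, article title); A = author (responsible party); K = subject term; P = publication name; PU = publisher name; O = organization (author affiliation, degree-granting institution, patent applicant); L = Chinese Library Classification number; C = subject classification number; U = all fields; Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016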
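
The field codes and operator rules above can be sketched as a small query builder. This is only an illustration: the helper names (`term`, `combine`, `group`) are hypothetical, the code just assembles the query string, and it assumes nothing about how the catalog parses or escapes values beyond what the help text states.

```python
# Field codes listed in the help text; anything else is rejected.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
# Operators must be uppercase and surrounded by single spaces.
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Format one field-restricted term, e.g. A=范并思 (hypothetical helper)."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join sub-expressions with one uppercase operator, one space each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be written in uppercase.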

Search condition: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

232 records in total; showing records 41-50

Algorithm and Stability of ATC Receding Horizon Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Zhang, Hongwei; Huang, Jie; Lewis, Frank L. Affiliations: Chinese Univ Hong Kong, Dept Mech & Automat Engn, Shatin, Hong Kong, Peoples R China; Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118, USA

ISBN: (print) 9781424427611

Receding horizon control (RHC), also known as model predictive control (MPC), is a suboptimal control scheme that solves a finite horizon open-loop optimal control problem in an infinite horizon context and yields a measured state feedback control law. Considerable effort has been made to study the closed-loop stability, leading to various stability conditions involving constraints on the terminal state, the terminal cost, the horizon size, or combinations of these. In this paper, we propose a modified RHC scheme, called adaptive terminal cost RHC (ATC-RHC). The control law generated by the ATC-RHC algorithm converges to the solution of the infinite horizon optimal control problem. Moreover, it ensures that the closed-loop system is uniformly ultimately exponentially stable without imposing any constraints on the terminal state, the horizon size, or the terminal cost. Finally, we show that when the horizon size is one, the underlying problems of ATC-RHC and heuristic dynamic programming (HDP) are the same. Thus, ATC-RHC can be implemented using HDP techniques without knowing the system matrix A.

Keywords: Receding horizon control; Adaptive terminal cost receding horizon control; Stability; Heuristic dynamic programming

Data-Driven Partially Observable Dynamic Processes Using Adaptive Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhong, Xiangnan; Ni, Zhen; Tang, Yufei; He, Haibo. Affiliation: Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA

ISBN: (print) 9781479945528

Adaptive dynamic programming (ADP) has been widely recognized as one of the "core methodologies" to achieve optimal control for intelligent systems in Markov decision processes (MDP). Generally, ADP control design requires all the information of the system dynamics. However, in many practical situations, the measured input and output data can only represent part of the system states. This means that complete information about the system is unavailable in many real-world cases, which narrows the range of application of the ADP design. In this paper, we propose a data-driven ADP method to stabilize the system with partially observable dynamics based on neural network techniques. A state network is integrated into the typical actor-critic architecture to provide an estimated state from the measured input/output sequences. The theoretical analysis and the stability discussion of this data-driven ADP method are also provided. Two examples are studied to verify our proposed method.

Keywords: Markov processes

Model-Based Multi-Objective Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wiering, Marco A.; Withagen, Maikel; Drugan, Madalina M. Affiliations: Univ Groningen, Inst Artificial Intelligence, NL-9700 AB Groningen, Netherlands; Vrije Univ Brussel, Artificial Intelligence Lab, Ixelles, Belgium

ISBN: (print) 9781479945528

This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. Therefore it is important that the agent explores the environment in an intelligent way by using a good exploration strategy. In this paper we have supplied the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.

Keywords: Pareto optimisation; decision making; dynamic programming; learning (artificial intelligence); Pareto optimal policies; deep sea treasure problem; model-based multiobjective reinforcement learning; multiobjective dynamic programming method; multiobjective sequential decision making problem; Computational modeling; Heuristic algorithms; Markov processes; Pareto optimization; Vectors; exploration strategy; Markov chain; Agents; optimal strategy

Adaptive Dynamic Programming for Discrete-time LQR Optimal Tracking Control Problems with Unknown Dynamics

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Liu, Yang; Luo, Yanhong; Zhang, Huaguang. Affiliation: Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China

ISBN: (print) 9781479945528

In this paper, an optimal tracking control approach based on the adaptive dynamic programming (ADP) algorithm is proposed to solve linear quadratic regulation (LQR) problems for unknown discrete-time systems in an online fashion. First, we convert the optimal tracking problem into designing an infinite-horizon optimal regulator for the tracking error dynamics based on the system transformation. Then we expand the error state equation using the history data of control and state. The iterative ADP algorithms of policy iteration (PI) and value iteration (VI) are introduced to solve for the value function of the controlled system. It is shown that the proposed ADP algorithm solves the LQR problem without requiring any knowledge of the system dynamics. The simulation results show the convergence and effectiveness of the proposed control scheme.

Keywords: Digital control systems

Structure search of probabilistic models and data correction for EDA-RL

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Handa, Hisashi. Affiliation: Graduate School of Natural Science and Technology, Okayama University, Tsushima-naka 3-1-1, Okayama 700-8530, Japan

ISBN: (print) 9781424498888

We have proposed a novel Estimation of Distribution Algorithm for solving reinforcement learning problems: EDA-RL. The EDA-RL can perform well if the complexity of the structure of the probabilistic model is adapted to the difficulty of the given problems. Therefore, this paper proposes a structure search method for the probabilistic model in the EDA-RL that, as in conventional EDA, takes multivariate dependencies into account. Moreover, a data correction method that eliminates loops of state transitions is also proposed. Computational simulations on maze problems, which have several perceptual aliasing states, show the effectiveness of the proposed method. © 2011 IEEE.

Keywords: Reinforcement learning

Continuous-Time Differential Dynamic Programming with Terminal Constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sun, Wei; Theodorou, Evangelos A.; Tsiotras, Panagiotis

ISBN: (print) 9781479945528

In this work, we revisit the continuous-time Differential Dynamic Programming (DDP) approach for solving optimal control problems with terminal state constraints. We derive two algorithms, each for a different order of expansion of the system dynamics, and we investigate their performance in terms of convergence speed. Compared to previous work, we provide a set of backward differential equations for the value function expansion by relaxing the assumption that the initial nominal control must be very close to the optimal control solution. We apply the derived algorithms to two classical optimal control problems, namely the inverted pendulum and the Dreyfus rocket problem, and show the benefit of second order expansion.

Keywords: Inverted pendulum

Cognitive Control in Cognitive Dynamic Systems: A New Way of Thinking Inspired by The Brain

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Haykin, Simon; Amiri, Ashkan; Fatemi, Mehdi. Affiliation: McMaster Univ, Cognit Syst Lab, Hamilton, ON L8S 4K1, Canada

ISBN: (print) 9781479945528

Briefly, the main purpose of the paper is fourfold: a) Cognitive perception, which consists of two functional blocks: improved sparse-coding under the influence of perceptual attention for extracting relevant information from the observables and ignoring irrelevant information, followed by a Bayesian algorithm for state estimation. b) Entropic state of the perceptor, which provides feedback information to the controller. c) Cognitive control, which also consists of two functional blocks: an executive learning algorithm computed by processing the entropic state, followed by predictive planning to set the stage for the policy to act on the environment, thereby establishing the global perception-action cycle. d) Experimental results for exploiting the perceptual as well as executive attention in a co-operative manner, which is aimed at the first demonstration of risk control in the presence of a severe disturbance in the environment.

Keywords: Cognition; Cognitive Dynamic Systems; Cognitive perception; Cognitive Control; Perceptual attention; Executive attention; Predictive planning; Pre-adaptation

Using Approximate Dynamic Programming for Estimating the Revenues of a Hydrogen-based High-Capacity Storage Device

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Francois-Lavet, Vincent; Fonteneau, Raphael; Ernst, Damien. Affiliation: Univ Liege, Dept Elect Engn & Comp Sci, B-4000 Liege, Belgium

ISBN: (print) 9781479945528

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.

Keywords: dynamic programming; electrolysis; fuel cells; hydrogen storage; power markets; Belgian market; day-ahead electricity market; dynamic programming principle; high-capacity storage device; hydrogen-based storage devices; interseasonal price fluctuations; maximum revenue estimation; optimal revenue; Electricity; Electrochemical processes; Hydrogen

Higher order Q-Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Edwards, Ashley; Pottenger, William M. Affiliations: Department of Computer Science, University of Georgia, Athens, GA 30606, United States; Department of Computer Science and DIMACS, Rutgers University, Piscataway, NJ 08854, United States

ISBN: (print) 9781424498888

Higher order learning is a statistical relational learning framework in which relationships between different instances of the same class are leveraged (Ganiz, Lytkin and Pottenger, 2009). Learning can be supervised or unsupervised. In contrast, reinforcement learning (Q-learning) is a technique for learning in an unknown state space. Action selection is often based on a greedy or epsilon-greedy approach. The problem with this approach is that there is often a large amount of initial exploration before convergence. In this article we introduce a novel approach to this problem that treats a state space as a collection of data from which latent information can be extrapolated. From this data, we classify actions as leading to a high reward or low reward, and formulate behaviors based on this information. We provide experimental evidence that this technique drastically reduces the amount of exploration required in the initial stages of learning. We evaluate our algorithm in a well-known reinforcement learning domain, grid-world. © 2011 IEEE.

Keywords: Reinforcement learning

Supervised adaptive dynamic programming based adaptive cruise control

Symposium Series on Computational Intelligence, IEEE SSCI2011 - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2011

Authors: Zhao, Dongbin; Hu, Zhaohui. Affiliation: Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

ISBN: (print) 9781424498888

This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full range adaptive cruise control (ACC) system. The full range ACC system considers both the ACC situation in the highway system and the stop and go (SG) situation in the urban street system. It can autonomously drive the host vehicle with the desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited for this problem, but it suffers from low learning efficiency. We propose the concept of inducing range to construct the supervisor and finally formulate the SADP algorithm, which greatly improves the learning efficiency. Several driving scenarios are designed and tested in simulation, comparing the trained controller with traditional ones. The results show that the trained SADP performs very well in all the scenarios, so it provides an effective approach for the full range ACC problem. © 2011 IEEE.

Keywords: Adaptive cruise control
