Search results - Inner Mongolia University Library

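Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field C++ or Basic, subject term not Visual, years 2011 to 2016)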
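
The rules above can be sketched as a small helper that assembles a query string. This is only an illustration: the function names `term` and `combine` are hypothetical and not part of the catalogue, and the sketch checks nothing beyond the field codes and the operator spelling stated in the help text.

```python
# Field codes listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field, value):
    """Build one FIELD=value term, e.g. term("A", "范并思") -> "A=范并思"."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator, *parts, group=False):
    """Join terms with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    expression = f" {operator} ".join(parts)
    return f"({expression})" if group else expression


# Rebuild Example 1 from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
example_one = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(example_one)
```

Grouping is opt-in because the help text only parenthesizes the OR sub-expressions in its examples.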

Refine search results

Document type

228 conference papers
4 journal articles

Collection

232 electronic documents
0 print holdings

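Discipline classification

98 papers: Engineering
- 93 papers: Computer Science and Technology...
- 40 papers: Software Engineering
- 25 papers: Electrical Engineering
- 14 papers: Control Science and Engineering
- 4 papers: Mechanical Engineering
- 1 paper: Mechanics (degrees awardable in Engineering, ...
- 1 paper: Information and Communication Engineering
- 1 paper: Architecture
- 1 paper: Chemical Engineering and Technology
- 1 paper: Transportation Engineering
23 papers: Science
- 23 papers: Mathematics
- 6 papers: Statistics (degrees awardable in Science, ...
- 4 papers: Systems Science
- 1 paper: Chemistry
- 1 paper: Atmospheric Science
9 papers: Management
- 7 papers: Management Science and Engineering (degrees awardable in ...
- 3 papers: Business Administration
- 2 papers: Library, Information and Archival Manag...
2 papers: Economics
- 2 papers: Applied Economics
1 paper: Law
- 1 paper: Sociology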

Topic

95 papers: dynamic programm...
52 papers: learning
46 papers: optimal control
37 papers: reinforcement le...
34 papers: learning (artifi...
27 papers: equations
22 papers: heuristic algori...
21 papers: control systems
20 papers: convergence
19 papers: neural networks
18 papers: function approxi...
17 papers: mathematical mod...
16 papers: approximation al...
15 papers: vectors
14 papers: markov processes
14 papers: artificial neura...
14 papers: cost function
13 papers: stochastic proce...
12 papers: algorithm design...
12 papers: adaptive control

Institution

5 papers: school of inform...
4 papers: northeastern uni...
4 papers: department of el...
4 papers: department of in...
3 papers: department of el...
3 papers: automation and r...
3 papers: northeastern uni...
3 papers: robotics institu...
3 papers: key laboratory o...
3 papers: univ illinois de...
2 papers: department of ar...
2 papers: school of electr...
2 papers: univ groningen i...
2 papers: univ texas autom...
2 papers: colorado state u...
2 papers: guangxi univ sch...
2 papers: national science...
2 papers: informatics inst...
2 papers: college of infor...
2 papers: school of automa...

Author

7 papers: hado van hasselt
7 papers: lewis frank l.
7 papers: marco a. wiering
7 papers: dongbin zhao
6 papers: liu derong
5 papers: huaguang zhang
5 papers: zhang huaguang
5 papers: derong liu
5 papers: warren b. powell
4 papers: xu xin
4 papers: vrabie draguna
4 papers: jagannathan s.
4 papers: frank l. lewis
4 papers: yanhong luo
4 papers: damien ernst
4 papers: jan peters
4 papers: peters jan
4 papers: zhao dongbin
3 papers: xu hao
3 papers: martin riedmille...

Language

232 papers: English

Search condition: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

232 records in total; records 11-20 are shown below.

Neural-Network-Based Reinforcement Learning Controller for Nonlinear Systems with Non-symmetric Dead-zone Inputs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Zhang, Xin; Zhang, Huaguang; Liu, Derong; Kim, Yongsu
Affiliations: Northeastern Univ Sch Informat Sci & Engn Shenyang 110004 Liaoning Peoples R China; Univ Illinois Dept Elect & Comp Engn Chicago IL 60607 USA

ISBN: (print) 9781424427611

A novel adaptive-critic-based NN controller using reinforcement learning is developed for a class of nonlinear systems with non-symmetric dead-zone inputs. The adaptive critic NN controller uses two NNs: the critic NN approximates the strategic utility function, and the output of the action NN is used to approximate the unknown nonlinear function and to minimize the strategic utility function. The tuning of the NNs is performed online without an explicit offline learning phase. The uniform ultimate boundedness of the closed-loop tracking error is derived using the Lyapunov method. Finally, a numerical example is included to show the effectiveness of the theoretical results.

Keywords: Nonlinear systems

Multiagent reinforcement learning in extensive form games with complete information

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Akramizadeh, Ali; Menhaj, Mohammad-B.; Afshar, Ahmad
Affiliations: Polytech Univ Tehran EE Dept Ctr Computat Intelligence & Large Scale Syst Tehran Iran

ISBN: (print) 9781424427611

Recent developments in multiagent reinforcement learning mostly concentrate on normal form games or restrictive hierarchical form games. In this paper, we use the well-known Q-learning in extensive form games in which agents have a fixed priority in action selection. We also introduce a new concept called associative Q-values, which not only can be used in action selection, leading to a subgame perfect equilibrium, but also can be used in an update rule that is proved to be convergent. Associative Q-values are the expected utility of an agent in a game situation, which is an estimate of the value of the subgame perfect equilibrium point.

Keywords: Multiagent reinforcement learning extensive form game game theory backward induction subgame perfect equilibrium exploration strategies

Using reward-weighted imitations for robot reinforcement learning

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009

Authors: Peters, Jan; Kober, Jens
Affiliations: Department of Empirical Inference and Machine Learning Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tübingen Germany

ISBN: (print) 9781424427611

Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. In order to improve in terms of efficiency, the problem is reduced to reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm) and learning task-space control. © 2009 IEEE.

Keywords: reinforcement learning

Iterative Local Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Todorov, Emanuel; Tassa, Yuval
Affiliations: Univ Calif San Diego Dept Cognit Sci La Jolla CA 92093 USA; Hebrew Univ Jerusalem Ctr Neural Computat IL-91905 Jerusalem Israel

ISBN: (print) 9781424427611

We develop an iterative local dynamic programming method (iLDP) applicable to stochastic optimal control problems in continuous high-dimensional state and action spaces. Such problems are common in the control of biological movement, but cannot be handled by existing methods. iLDP can be considered a generalization of Differential Dynamic Programming, inasmuch as: (a) we use general basis functions rather than quadratics to approximate the optimal value function; (b) we introduce a collocation method that dispenses with explicit differentiation of the cost and dynamics and ties iLDP to the Unscented Kalman filter; (c) we adapt the local function approximator to the propagated state covariance, thus increasing accuracy at more likely states. Convergence is similar to quasi-Newton methods. We illustrate iLDP on several problems including the "swimmer" dynamical system, which has 14 state and 4 control variables.

Keywords: dynamical systems

The QV Family Compared to Other Reinforcement Learning Algorithms

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Wiering, Marco A.; van Hasselt, Hado
Affiliations: Univ Groningen Dept Artificial Intelligence NL-9700 AB Groningen Netherlands; Univ Utrecht Intelligent Syst Grp NL-3508 TC Utrecht Netherlands

ISBN: (print) 9781424427611

This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new reinforcement algorithms, namely QV2, QVMAX, and QVMAX2, that are all based on the QV-learning algorithm, but in contrast to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms and QV2 is a new on-policy RL algorithm. We experimentally compare these algorithms to a large number of different RL algorithms, namely Q-learning, Sarsa, R-learning, Actor-Critic, QV-learning, and ACLA. We show experiments on five maze problems of varying complexity. Furthermore, we show experimental results on the cart pole balancing problem. The results show that for different problems, there can be large performance differences between the different algorithms, and that there is not a single RL algorithm that always performs best, although on average QV-learning scores highest.

Keywords: learning algorithms
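
For orientation, here is a minimal sketch of a tabular QV-learning step. The abstract names the algorithm but gives no update rules, so the form used here (state values learned by temporal-difference updates, Q-values bootstrapped from the state value of the next state) is an assumption based on how QV-learning is commonly described, not text from this record. The function name `qv_update` and the step sizes `alpha`, `beta` and discount `gamma` are illustrative.

```python
from collections import defaultdict


def qv_update(V, Q, s, a, r, s_next, alpha=0.5, beta=0.2, gamma=0.9, done=False):
    """Apply one assumed QV-learning step for the transition (s, a, r, s_next).

    Both tables move toward the same target r + gamma * V(s_next).
    """
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += beta * (target - V[s])              # TD(0) update of the state value
    Q[(s, a)] += alpha * (target - Q[(s, a)])   # Q bootstraps from V, not from Q
    return target


# Hypothetical single transition with all values initialised to zero.
V = defaultdict(float)
Q = defaultdict(float)
qv_update(V, Q, s="s0", a="right", r=1.0, s_next="s1")
print(V["s0"], Q[("s0", "right")])
```

The variants QV2, QVMAX and QVMAX2 mentioned in the abstract are not sketched, since the record does not describe how they differ.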

Optimal control for a class of nonlinear systems with state delay based on adaptive dynamic programming with ε-error bound

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lin, Xiaofeng; Cao, Nuyun; Lin, Yuzhang
Affiliations: Guangxi Univ Sch Elect Engn Nanning 530004 Peoples R China; Tsinghua Univ Dept Elect Engn Beijing Peoples R China

ISBN: (print) 9781467359252

In this paper, a finite-horizon epsilon-optimal control for a class of nonlinear systems with state delay is proposed using an adaptive dynamic programming (ADP) algorithm. First, the performance index function is defined and the Hamilton-Jacobi-Bellman (HJB) equation is obtained for the problem; the convergence of the iterative algorithm is also presented. Then, an ADP algorithm for finite-horizon optimal control is introduced with an epsilon-error bound so as to obtain the epsilon-optimal control, and a BP neural network is used to implement the ADP algorithm. Finally, an example is given to demonstrate the effectiveness of the proposed algorithm.

Keywords: adaptive dynamic programming state delay epsilon-optimal control finite time nonlinear systems

Adaptive Optimal Control for Nonlinear Discrete-Time Systems

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Qin, Chunbin; Zhang, Huaguang; Luo, Yanhong
Affiliations: Northeastern Univ Sch Informat Sci & Engn Shenyang 110004 Peoples R China

ISBN: (print) 9781467359252

This paper proposes an on-line near-optimal control scheme based on the function approximation capabilities of neural networks (NNs) to attain the on-line solution of the optimal control problem for nonlinear discrete-time systems. First, to solve forward-in-time the Hamilton-Jacobi-Bellman (HJB) equation appearing in the optimal control problem, two neural networks are used to approximate the cost function and to compute the optimal control policy, respectively. Then, according to Bellman's optimality principle and the adaptive technology, the on-line weight updating laws for the critic network and action network are derived, respectively. Further, considering NN approximation errors, the stability of the closed-loop system is demonstrated by Lyapunov theory. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed method.

Keywords: adaptive optimal control Hamilton-Jacobi-Bellman equation neural network adaptive dynamic programming

Scalarized Multi-Objective Reinforcement Learning: Novel Design Techniques

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Van Moffaert, Kristof; Drugan, Madalina M.; Nowe, Ann
Affiliations: Vrije Univ Brussel Dept Comp Sci B-1050 Brussels Belgium

ISBN: (print) 9781467359252

In multi-objective problems, it is key to find compromising solutions that balance different objectives. The linear scalarization function is often utilized to translate the multi-objective nature of a problem into a standard, single-objective problem. Generally, it is noted that such a linear combination can only find solutions in convex areas of the Pareto front, therefore making the method inapplicable in situations where the shape of the front is not known beforehand, as is often the case. We propose a non-linear scalarization function, called the Chebyshev scalarization function, as a basis for action selection strategies in multi-objective reinforcement learning. The Chebyshev scalarization method overcomes the flaws of the linear scalarization function as it (i) can discover Pareto optimal solutions regardless of the shape of the front, i.e. convex as well as non-convex, (ii) obtains a better spread amongst the set of Pareto optimal solutions and (iii) is not particularly dependent on the actual weights used.

Keywords: design techniques learning (artificial intelligence) optimal solution Convexity flaws learning Linear Combination Solutions
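
The contrast drawn in this abstract can be illustrated with a small sketch. The abstract gives no formulas, so the sketch assumes the standard weighted Chebyshev metric: the largest weighted distance between an action's per-objective values and a utopian reference point, with the greedy action minimising that distance. The function names, the three-action table and the reference point `utopia` are hypothetical.

```python
def linear_scalarize(q_values, weights):
    """Weighted sum of one action's per-objective values (higher is better)."""
    return sum(w * q for w, q in zip(weights, q_values))


def chebyshev_scalarize(q_values, weights, utopia):
    """Largest weighted distance to the utopian point (lower is better)."""
    return max(w * abs(q - z) for w, q, z in zip(weights, q_values, utopia))


def greedy_linear(q_table, weights):
    """Index of the action with the highest weighted sum."""
    return max(range(len(q_table)),
               key=lambda a: linear_scalarize(q_table[a], weights))


def greedy_chebyshev(q_table, weights, utopia):
    """Index of the action closest to the utopian point."""
    return min(range(len(q_table)),
               key=lambda a: chebyshev_scalarize(q_table[a], weights, utopia))


# Three Pareto-optimal actions; the third lies in a non-convex part of the front.
q_table = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
weights = (0.5, 0.5)
utopia = (1.0, 1.0)  # assumed reference point, at least as good as any value

print(greedy_linear(q_table, weights))
print(greedy_chebyshev(q_table, weights, utopia))
```

In this toy table the weighted sum of the third action is 0.4 for any weights that sum to one, while the better of the two extreme actions always scores at least 0.5, so linear scalarization can never select it. The Chebyshev rule selects it with equal weights.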

Finite Horizon Stochastic Optimal Control of Uncertain Linear Networked Control System

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xu, Hao; Jagannathan, S.
Affiliations: Missouri Univ Sci & Technol Dept Elect & Comp Engn Rolla MO 65409 USA

ISBN: (print) 9781467359252

In this paper, the finite horizon stochastic optimal control problem is studied for a linear networked control system (LNCS) in the presence of network imperfections such as network-induced delays and packet losses, using an adaptive dynamic programming (ADP) approach. Due to the uncertainty in system dynamics resulting from network imperfections, the stochastic optimal control design uses a novel adaptive estimator (AE) to solve the optimal regulation of the uncertain LNCS in a forward-in-time manner, in contrast with backward-in-time Riccati equation-based optimal control with known system dynamics. A tuning law for the unknown parameters of the AE is derived. Lyapunov theory is used to show that all the signals are uniformly ultimately bounded (UUB), with the ultimate bounds being a function of the initial values and final time. In addition, the estimated control input converges to the optimal control input within the finite horizon. Simulation results are included to show the effectiveness of the proposed scheme.

Keywords: Networked Control System adaptive dynamics programming and reinforcement learning Finite horizon Stochastic Optimal Control adaptive Estimator

A reinforcement learning algorithm developed to model GenCo strategic bidding behavior in multidimensional and continuous state and action spaces

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lau, Alfred Yong Fu; Srinivasan, Dipti; Reindl, Thomas
Affiliations: Natl Univ Singapore Dept Elect Comp Engn 4 Engn Dr 3 Singapore 117576 Singapore; Natl Univ Singapore Solar Energy Res Inst Singapore 117574 Singapore

ISBN: (print) 9781467359252

The electricity market has provided a complex economic environment, and consequently has increased the requirement for advancement of learning methods. In the agent-based modeling and simulation framework of this economic system, the generation company's decision-making is modeled using reinforcement learning. Existing learning methods that model the generation company's strategic bidding behavior are not adapted to the non-stationary and non-Markovian environment involving multidimensional and continuous state and action spaces. This paper proposes a reinforcement learning method to overcome these limitations. The proposed method discovers the input space structure through the self-organizing map, exploits learned experience through Roth-Erev reinforcement learning and explores through the actor-critic map. Simulation results from experiments show that the proposed method outperforms Simulated Annealing Q-learning and Variant Roth-Erev reinforcement learning. The proposed method is a step towards more realistic agent learning in Agent-based Computational Economics.

Keywords: reinforcement learning strategic bidding behavior agent-based modeling electricity market

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
