
Refine Results

Document Type

  • 228 conference papers
  • 4 journal articles

Holdings

  • 232 electronic full text
  • 0 print holdings

Date Distribution

Subject Classification

  • 98 papers: Engineering
    • 93 papers: Computer Science and Technology...
    • 40 papers: Software Engineering
    • 25 papers: Electrical Engineering
    • 14 papers: Control Science and Engineering
    • 4 papers: Mechanical Engineering
    • 1 paper: Mechanics (may confer Engineering, Sci...
    • 1 paper: Information and Communication Engineering
    • 1 paper: Architecture
    • 1 paper: Chemical Engineering and Technology
    • 1 paper: Transportation Engineering
  • 23 papers: Science
    • 23 papers: Mathematics
    • 6 papers: Statistics (may confer Science, ...
    • 4 papers: Systems Science
    • 1 paper: Chemistry
    • 1 paper: Atmospheric Science
  • 9 papers: Management
    • 7 papers: Management Science and Engineering (may...
    • 3 papers: Business Administration
    • 2 papers: Library, Information and Archives Manage...
  • 2 papers: Economics
    • 2 papers: Applied Economics
  • 1 paper: Law
    • 1 paper: Sociology

Topics

  • 95 papers: dynamic programm...
  • 52 papers: learning
  • 46 papers: optimal control
  • 37 papers: reinforcement le...
  • 34 papers: learning (artifi...
  • 27 papers: equations
  • 22 papers: heuristic algori...
  • 21 papers: control systems
  • 20 papers: convergence
  • 19 papers: neural networks
  • 18 papers: function approxi...
  • 17 papers: mathematical mod...
  • 16 papers: approximation al...
  • 15 papers: vectors
  • 14 papers: markov processes
  • 14 papers: artificial neura...
  • 14 papers: cost function
  • 13 papers: stochastic proce...
  • 12 papers: algorithm design...
  • 12 papers: adaptive control

Institutions

  • 5 papers: school of inform...
  • 4 papers: northeastern uni...
  • 4 papers: department of el...
  • 4 papers: department of in...
  • 3 papers: department of el...
  • 3 papers: automation and r...
  • 3 papers: northeastern uni...
  • 3 papers: robotics institu...
  • 3 papers: key laboratory o...
  • 3 papers: univ illinois de...
  • 2 papers: department of ar...
  • 2 papers: school of electr...
  • 2 papers: univ groningen i...
  • 2 papers: univ texas autom...
  • 2 papers: colorado state u...
  • 2 papers: guangxi univ sch...
  • 2 papers: national science...
  • 2 papers: informatics inst...
  • 2 papers: college of infor...
  • 2 papers: school of automa...

Authors

  • 7 papers: hado van hasselt
  • 7 papers: lewis frank l.
  • 7 papers: marco a. wiering
  • 7 papers: dongbin zhao
  • 6 papers: liu derong
  • 5 papers: huaguang zhang
  • 5 papers: zhang huaguang
  • 5 papers: derong liu
  • 5 papers: warren b. powell
  • 4 papers: xu xin
  • 4 papers: vrabie draguna
  • 4 papers: jagannathan s.
  • 4 papers: frank l. lewis
  • 4 papers: yanhong luo
  • 4 papers: damien ernst
  • 4 papers: jan peters
  • 4 papers: peters jan
  • 4 papers: zhao dongbin
  • 3 papers: xu hao
  • 3 papers: martin riedmille...

Language

  • 232 papers: English
Search query: "Any field = 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"
232 records; results 71-80 shown below.
Discrete-time adaptive dynamic programming using wavelet basis function neural networks
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Jin, Ning; Liu, Derong; Huang, Ting; Pang, Zhongyu (Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA)
Dynamic programming for discrete-time systems is difficult due to the "curse of dimensionality": one has to find a series of control actions that must be taken in sequence, hoping that this sequence will lea...
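The "curse of dimensionality" the abstract refers to can be made concrete with a small sketch: a tabular backward value recursion on a discretized one-dimensional system is easy, but the table grows exponentially in the state dimension. This is an illustrative toy with invented dynamics and costs, not the paper's wavelet-network method.

```python
import numpy as np

# Toy finite-horizon DP on a discretized 1-D system (hypothetical example):
#   x_{k+1} = x_k + u_k,  stage cost x^2 + u^2,  terminal value V_N = 0.
xs = np.linspace(-1.0, 1.0, 21)   # 21 grid points for the state
us = np.linspace(-0.5, 0.5, 11)   # 11 candidate controls
N = 10                            # horizon
V = np.zeros(len(xs))             # V_N(x) = 0
for _ in range(N):                # backward recursion: V_k from V_{k+1}
    newV = np.empty_like(V)
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:
            xn = np.clip(x + u, xs[0], xs[-1])
            j = int(round((xn - xs[0]) / (xs[1] - xs[0])))  # nearest grid index
            best = min(best, x * x + u * u + V[j])
        newV[i] = best
    V = newV

# The same table in d dimensions needs 21**d entries -- the curse of
# dimensionality that motivates compact approximators such as neural networks.
```

With 21 points per dimension, a 6-dimensional state already needs roughly 85 million table entries, which is why the paper turns to a function approximator instead.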
Using ADP to understand and replicate brain intelligence: the next level design
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Werbos, Paul J. (Natl Sci Fdn, Arlington, VA 22203, USA)
Since the 1960s I have proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (ADP) - li...
Protecting against evaluation overfitting in empirical reinforcement learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Authors: Whiteson, Shimon; Tanner, Brian; Taylor, Matthew E.; Stone, Peter (Informatics Institute, University of Amsterdam, Netherlands; Department of Computing Science, University of Alberta, Canada; Department of Computer Science, Lafayette College, United States; Department of Computer Science, University of Texas at Austin, United States)
Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part be...
Convergence of Value Iterations for Total-Cost MDPs and POMDPs with General State and Action Sets
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z. (SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794, USA; Natl Tech Univ Ukraine, Kyiv Polytech Inst, Inst Appl Syst Anal, UA-03056 Kiev, Ukraine; Natl Tech Univ Ukraine, Kyiv Polytech Inst, UA-03056 Kiev, Ukraine)
This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbound...
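The value iteration whose convergence the abstract studies can be sketched on a finite toy MDP. The paper's contribution concerns general Borel state and action sets and possibly unbounded costs, which this two-state instance with invented transition and cost numbers does not capture; it only shows the iteration itself.

```python
import numpy as np

# Value iteration for a total-cost MDP:
#   V_{n+1}(s) = min_a [ c(s, a) + sum_{s'} P(s' | s, a) V_n(s') ]
# Toy instance: state 1 is absorbing with zero cost; from state 0,
# action 0 is cheap but slow to absorb, action 1 costs more but absorbs faster.
P = np.array([                # P[a, s, s']
    [[0.9, 0.1], [0.0, 1.0]],
    [[0.5, 0.5], [0.0, 1.0]],
])
c = np.array([                # c[s, a]
    [1.0, 2.0],
    [0.0, 0.0],
])
V = np.zeros(2)               # start from V_0 = 0 and iterate upward
for _ in range(200):
    Q = c + np.einsum('ast,t->sa', P, V)   # Q[s, a] = c(s,a) + E[V(s')]
    V = Q.min(axis=1)
# Fixed point: V(1) = 0 and V(0) = 4, achieved by action 1 in state 0.
```

Starting from V = 0 the iterates increase monotonically toward the optimal total cost, the behavior whose general-state-space conditions the paper characterizes.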
Reinforcement learning algorithms for solving classification problems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Authors: Wiering, Marco A.; Van Hasselt, Hado; Pietersma, Auke-Dirk; Schomaker, Lambert (Dept. of Artificial Intelligence, University of Groningen, Netherlands; Multi-agent and Adaptive Computation, Centrum Wiskunde en Informatica, Netherlands)
We describe a new framework for applying reinforcement learning (RL) algorithms to solve classification tasks by letting an agent act on the inputs and learn value functions. This paper describes how classification pr...
Using reward-weighted regression for reinforcement learning of task space control
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Peters, Jan; Schaal, Stefan (Univ So Calif, Los Angeles, CA 90089, USA)
Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or rein...
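The computational core of reward-weighted regression can be sketched as a weighted least-squares fit in which each explored action is weighted by its immediate reward, so high-reward samples dominate the parameter update. This is a minimal illustration on an invented linear task, not the authors' operational-space controller.

```python
import numpy as np

# Sketch: fit linear policy parameters theta by reward-weighted regression.
# Given exploration data (X, a) with immediate rewards r, solve
#   theta = argmin sum_i r_i * (a_i - x_i^T theta)^2
# i.e. weighted least squares with the rewards as sample weights.
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))                    # observed states (invented task)
theta_star = np.array([1.0, -2.0, 0.5])        # unknown "good" policy
a = X @ theta_star + rng.normal(scale=0.3, size=n)   # noisy explored actions
r = np.exp(-(a - X @ theta_star) ** 2)         # reward: high near the optimum
# Weighted least squares: theta = (X^T diag(r) X)^{-1} X^T diag(r) a
XtW = X.T * r                                  # scales each sample by its reward
theta = np.linalg.solve(XtW @ X, XtW @ a)
```

Because low-reward (high-error) actions receive small weights, the recovered `theta` concentrates on the actions that worked, which is the mechanism the EM-style derivation in this line of work formalizes.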
Dynamic lead time promising
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Authors: Reindorp, Matthew J.; Fu, Michael C. (Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Netherlands; Robert H. Smith School of Business, Institute for Systems Research, University of Maryland, United States)
We consider a make-to-order business that serves customers in multiple priority classes. Orders from customers in higher classes bring greater revenue, but they expect shorter lead times than customers in lower classe...
On learning with imperfect representations
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Authors: Kalyanakrishnan, Shivaram; Stone, Peter (Department of Computer Science, University of Texas at Austin, 1616 Guadalupe St, Austin, TX 78701, United States)
In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates t...
Reinforcement Learning-based Optimal Control Considering L Computation Time Delay of Linear Discrete-time Systems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Fujita, Taishi; Ushio, Toshimitsu
In embedded control systems, the control input is computed from sensing data of a plant in a processor, and there is a delay, called the computation time delay, due to the computation and the data transmission. Whe...
Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
Authors: Gabel, Thomas; Lutz, Christian; Riedmiller, Martin (Machine Learning Lab, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany)
Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of comp...
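Fitted Q iteration, the batch-RL scheme that neural fitted Q iteration builds on, alternates between computing Bellman targets from a fixed batch of transitions and refitting a Q-function approximator on them. The sketch below uses a plain table as the "regressor" so the loop structure is visible; NFQ would instead train a multilayer perceptron on the same (state, action) -> target pairs. The 5-state chain is an invented toy, not the paper's gaming benchmark.

```python
import numpy as np

# Fitted Q iteration on a toy 5-state chain: moving right from state 3
# into terminal state 4 yields reward 1; every other transition yields 0.
n_states, gamma = 5, 0.9
batch = []                                   # fixed batch of (s, a, r, s', done)
for s in range(n_states - 1):                # no transitions out of terminal 4
    for ai, move in enumerate((-1, 1)):
        s2 = min(max(s + move, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        batch.append((s, ai, r, s2, s2 == n_states - 1))

Q = np.zeros((n_states, 2))                  # columns: action -1, action +1
for _ in range(50):                          # FQI sweeps
    # 1) compute Bellman targets from the frozen Q of the previous sweep
    targets = [(s, ai, r if done else r + gamma * Q[s2].max())
               for s, ai, r, s2, done in batch]
    # 2) "fit" step: a table just stores the targets; NFQ trains a net here
    for s, ai, y in targets:
        Q[s, ai] = y
# Optimal values propagate back through the chain: Q[0, right] -> gamma**3.
```

The point of the batch formulation is that step 2 is an ordinary supervised regression problem, which is what lets NFQ bring neural network training machinery to bear.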