Search results - Inner Mongolia University Library

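Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016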


Search condition: "Any field=2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014"

247 records in total; records 171-180 are shown below.


Dynamic lead time promising

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Matthew J. Reindorp; Michael C. Fu. Affiliations: Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Netherlands; Robert H. Smith School of Business and Institute of Systems Research, University of Maryland, USA

We consider a make-to-order business that serves customers in multiple priority classes. Orders from customers in higher classes bring greater revenue, but they expect shorter lead times than customers in lower classes. In making lead time promises, the firm must recognize preexisting order commitments, uncertainty over future demand from each class, and the possibility of supply chain disruptions. We model this scenario as a Markov decision problem and use reinforcement learning to determine the firm's lead time policy. In order to achieve tractability on large problems, we utilize a sequential decision-making approach that effectively allows us to eliminate one dimension from the state space of the system. Initial numerical results from the sequential dynamic approach suggest that the resulting policies more closely approximate optimal policies than static optimization approaches.

Keywords: Q factor; Markov processes; Schedules; Nickel; Learning; Supply chains


Analyzing collective behavior in evolutionary swarm robotic systems based on an ethological approach

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Toshiyuki Yasuda; Nanami Wada; Kazuhiro Ohkura; Yoshiyuki Matsumura. Affiliations: Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima, Japan; Faculty of Textile Science and Technology, Shinshu University, Ueda, Nagano, Japan

Swarm robotic systems are a type of multi-robot systems which generally consist of many homogeneous autonomous robots without any type of global controllers. Swarm robotics aims at designing desired collective behaviors through many interactions with other robots or their environment. Since a robotic swarm is controlled by an emergent way such as a result of self-organization by using robot learning or artificial evolution, no method has been known to grasp the macroscopic collective behavior in a practical sense, according to the best of our knowledge. In this paper, we propose a novel method for analyzing the collective behavior by introducing the concept of behavioral sequence, which stems from ethology. Analysis about behavioral sequence reveals the transition of robot's action from the viewpoint of specialization and helps us to understand the role of subgroups in a robotic swarm. Applying this method, we observe collective behavior in a foraging task of autonomous mobile robots.

Keywords: Robot kinematics; Robot sensing systems; Mobile robots; Vectors; Resource management; Dynamic programming


Using reward-weighted imitation for robot reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jan Peters; Jens Kober. Affiliation: Department of Empirical Inference and Machine Learning, Max-Planck Institute of Biological Cybernetics, Tübingen, Germany

Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. In order to improve in terms of efficiency, the problem is reduced onto reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM robot arm) and learning task-space control.

Keywords: Robots; Learning


Safe reinforcement learning in high-risk tasks through policy improvement

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Francisco Javier Garcia Polo; Fernando Fernandez Rebollo. Affiliation: Computer Science Department, Universidad Carlos III de Madrid, Madrid, Spain

Reinforcement learning (RL) methods are widely used for dynamic control tasks. In many cases, these are high risk tasks where the trial and error process may select actions which execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, making the learning problem harder and unapproachable with conventional RL algorithms. So, when the agent begins to interact with a risky and large state-action space environment, an important question arises: how can we avoid that the exploration of the state-action space causes damages in the learning (or other) systems. In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition.

Keywords: Helicopters; Computer crashes; Trajectory; Robots; Safety; Robustness; Mathematical model


Event-Triggered Reinforcement Learning Approach for Unknown Nonlinear Continuous-Time System

International Joint Conference on Neural Networks (IJCNN)

Authors: Zhong, Xiangnan; Ni, Zhen; He, Haibo; Xu, Xin; Zhao, Dongbin. Affiliations: Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

ISBN: (print) 9781479914845

This paper provides an adaptive event-triggered method using adaptive dynamic programming (ADP) for the nonlinear continuous-time system. Comparing to the traditional method with fixed sampling period, the event-triggered method samples the state only when an event is triggered and therefore the computational cost is reduced. We demonstrate the theoretical analysis on the stability of the event-triggered method, and integrate it with the ADP approach. The system dynamics are assumed unknown. The corresponding ADP algorithm is given and the neural network techniques are applied to implement this method. The simulation results verify the theoretical analysis and justify the efficiency of the proposed event-triggered technique using the ADP approach.

Keywords: Dynamic programming


Feature discovery in approximate dynamic programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Philippe Preux; Sertan Girgin; Manuel Loth. Affiliation: Laboratoire d'Informatique Fondamentale de Lille (Computer Science Laboratory associated to the CNRS) and INRIA, Université de Lille, France

Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Based on our recent work on feature discovery in the context of reinforcement learning to discover a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference with the usual approach is that we use a non parametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of the value function approximation are just the two faces of the same coin, and that using a non parametric approach provides an elegant solution to both problems at once.

Keywords: Dynamic programming; Function approximation; Games; Machine learning; Acceleration; Computer science; Software tools; Artificial intelligence; Velocity control; Control systems


The Knowledge Gradient Policy for Offline Learning with Independent Normal Rewards

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Peter Frazier; Warren Powell. Affiliation: Department of Operations Research and Financial Engineering, Princeton University Engineering, Princeton, NJ, USA

We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simula...

Keywords: Mirrors; Knowledge engineering; Bandwidth; Time measurement; Response surface methodology; Operations research; Bayesian methods; Performance evaluation; Dynamic programming; Learning


Toward effective combination of off-line and on-line training in ADP framework

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Danil Prokhorov. Affiliation: Toyota Technical Center, Ann Arbor, MI, USA

We are interested in finding the most effective combination between off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method (Feldkamp et al., 1998), whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate importance of using recurrent neural networks for both controller/actor and critic.

Keywords: Neurocontrollers; Robustness; Recurrent neural networks; Neural networks; Adaptive control; Dynamic programming; Robust control; Programmable control; Learning; Uncertainty


Inferring bounds on the performance of a control policy from a sample of trajectories

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Raphael Fonteneau; Susan Murphy; Louis Wehenkel; Damien Ernst. Affiliations: Department of Electrical Engineering and Computer Science, University of Liège, Belgium; University of Michigan, USA

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are supposed to be deterministic and Lipschitz continuous. Under these assumptions, a polynomial algorithm, in terms of the sample size and length of the optimization horizon, is derived to compute these bounds, and their tightness is characterized in terms of the sample density.

Keywords: Control systems; Computational modeling; Polynomials; Optimal control; Dynamic programming; Predictive models; Fingers; Upper bound; Biomedical engineering; Artificial intelligence


Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Paul J. Werbos. Affiliation: National Science Foundation, Arlington, VA, USA

Since the 1960's the author proposed that we could understand and replicate the highest level of intelligence seen in the brain, by building ever more capable and general systems for adaptive dynamic programming (ADP) - like "reinforcement learning" but based on approximating the Bellman equation and allowing the controller to know its utility function. Growing empirical evidence on the brain supports this approach. Adaptive critic systems now meet tough engineering challenges and provide a kind of first-generation model of the brain. Lewis, Prokhorov and myself have early second-generation work. Mammal brains possess three core capabilities - creativity/imagination and ways to manage spatial and temporal complexity - even beyond the second generation. This paper reviews previous progress, and describes new tools and approaches to overcome the spatial complexity gap.

Keywords: Adaptive systems; Intelligent structures; Buildings; Programmable control; Adaptive control; Dynamic programming; Learning; Equations; Control systems; Brain modeling


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
