Reinforcement learning for control in stochastic processes has received significant attention in the last few years. Several data-efficient methods, even for continuous state spaces, have been proposed; however, most o...
This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledg...
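For illustration, a minimal sketch of the tabular Expected Sarsa update under an epsilon-greedy policy (the array layout and hyperparameters here are our assumptions, not drawn from the paper):

import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    """One tabular Expected Sarsa update.

    Unlike Sarsa, which bootstraps on the sampled next action,
    Expected Sarsa bootstraps on the expectation of Q over the
    epsilon-greedy policy, removing that source of sampling variance.
    """
    n_actions = Q.shape[1]
    # Probability of each action under the epsilon-greedy policy.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    expected_q = np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])

Compared with Sarsa's sampled target r + gamma * Q[s_next, a_next], the expectation over the policy is computed exactly, at the cost of one extra pass over the action values.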
Ramp metering has been developed as a traffic management strategy to alleviate congestion on freeways. Most ramp metering control algorithms do not take queuing into consideration, because it is still a tough ...
ISBN (Print): 9781424435494
Some three decades ago, certain computational intelligence methods of reinforcement learning were recognized as implementing an approximation of Bellman's dynamic programming method, which is known in the controls community as an important tool for designing optimal control policies for nonlinear plants and sequential decision making. Significant theoretical and practical developments have occurred within this arena, mostly in the past decade, with the methodology now usually referred to as adaptive dynamic programming (ADP). The objective of this paper is to provide a retrospective of selected threads of such developments. In addition, a commentary is offered concerning the present status of ADP, and threads for future research and development within the controls field are suggested.
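To make the dynamic programming connection concrete, a minimal value iteration sketch for a finite MDP (the arrays P and R are illustrative assumptions; ADP methods approximate this recursion when the model is unknown or the state space is too large to enumerate):

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Bellman's value iteration for a finite MDP.

    P: transition probabilities, shape (n_states, n_actions, n_states)
    R: expected rewards, shape (n_states, n_actions)
    Returns the optimal value function and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new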
ISBN (Print): 9781424447947
Production scheduling is critical for manufacturing systems. Dispatching rules are usually applied dynamically to schedule jobs in a dynamic job shop. The paper presents an adaptive iterative scheduling algorithm that operates dynamically to schedule jobs in the dynamic job shop. To obtain adaptive behavior, the reinforcement learning system is built on phased Q-learning by defining intermediate state patterns. We convert the scheduling problem into a reinforcement learning problem by constructing a multi-phase dynamic programming process, including the definition of the state representation, the actions, and the reward function. We use five heuristic rules, CNP-CR, CNP-FCFS, CNP-EFT, CNP-EDD, and CNP-SPT, as actions, with the scheduling objective of minimizing the maximum completion time. A complex dynamic scheduling problem can thus be divided into a sequence of sub-problems that are easier to solve. We also analyze the running time and the solutions, and present some experimental results.
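A minimal sketch of the rule-selection idea, assuming a tabular Q over (state pattern, rule) pairs; the state encoding, reward, and rule names here are placeholders rather than the paper's exact setup:

import random

# Heuristic dispatching rules used as the action set (the paper's
# CNP-* variants pair the contract net protocol with such rules).
ACTIONS = ["CR", "FCFS", "EFT", "EDD", "SPT"]

def choose_rule(Q, state, epsilon=0.1):
    """Epsilon-greedy selection of a dispatching rule for the
    current intermediate state pattern of the job shop."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, rule, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the reward would be derived from
    the makespan objective, e.g. negative phase completion time."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, rule), 0.0)
    Q[(state, rule)] = old + alpha * (reward + gamma * best_next - old)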
ISBN (Print): 9781424454402
Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming (ADP). These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior. Relations are shown between ADP and adaptive control.
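In the discrete-time setting, the mathematical formulation underlying ADP is the Bellman optimality equation (standard form, not quoted from the paper):

V^*(s) = \max_{a}\left[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]

ADP approximates the value function (the critic) and the maximizing policy (the actor) with trainable function approximators instead of solving this recursion exactly.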
ISBN (Print): 9781424427673
In this paper we propose a novel strategy for converging dynamic policies generated by adaptive agents, which receive and accumulate rewards for their actions. The goal of the proposed strategy is to speed up the convergence of such agents to a good policy in dynamic environments. Since it is difficult to maintain a good value estimate for a state when the environment is continuously changing, previous policies are kept in memory for reuse in future policies, avoiding delays or unexpected speedups in the agent's learning. Experimental results on dynamic environments with different policies have shown that the proposed strategy is able to speed up the convergence of the agent while achieving good action policies.
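A minimal sketch of the policy-memory idea described above; the signature/distance mechanism for matching environments is our illustrative assumption, not the paper's mechanism:

class PolicyLibrary:
    """Keeps previously learned Q-tables so an agent in a dynamic
    environment can warm-start from the closest stored policy
    instead of relearning from scratch."""

    def __init__(self):
        self.entries = []  # list of (environment signature, Q-table)

    def store(self, signature, Q):
        self.entries.append((signature, dict(Q)))

    def closest(self, signature, distance):
        """Return a copy of the stored Q-table whose environment
        signature is nearest to the current one, or None."""
        if not self.entries:
            return None
        _, Q = min(self.entries, key=lambda e: distance(signature, e[0]))
        return dict(Q)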
This paper deals with the computation of optimal nonrandomized nonstationary policies and mixed stationary policies for average-reward Markov decision processes with multiple criteria and constraints. We consider problems with finite state and action sets satisfying the unichain condition. The described procedure for computing optimal nonrandomized policies can also be used for adaptive control problems.
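For reference, the standard linear program over stationary state-action frequencies for constrained average-reward unichain MDPs (a common computational tool for this problem class; the paper's specific procedure may differ in detail):

\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) = \sum_{s,a} P(s' \mid s,a)\, x(s,a) \quad \forall s', \\
& \sum_{s,a} x(s,a) = 1, \\
& \sum_{s,a} c_k(s,a)\, x(s,a) \le d_k \quad k = 1,\dots,K,
\end{aligned}

where x(s,a) are stationary state-action frequencies; an optimal mixed stationary policy is then recovered as \pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a').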
Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Based on our recent work on feature discovery in the context of reinforcement learning, aimed at discovering a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference from the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of value function approximation are two faces of the same coin, and that using a nonparametric approach provides an elegant solution to both problems at once.
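A minimal sketch of what a nonparametric value-function approximator can look like; k-nearest-neighbor averaging is chosen purely for illustration and is not necessarily the estimator used in the paper:

import numpy as np

class KNNValueFunction:
    """Nonparametric value-function approximator: the value of a
    query state is the mean value of its k nearest stored states,
    so the representation grows with the data instead of being
    fixed in advance by a chosen feature vector."""

    def __init__(self, k=5):
        self.k = k
        self.states = []   # visited states as feature vectors
        self.values = []   # bootstrapped value targets

    def add(self, state, value):
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def __call__(self, state):
        if not self.states:
            return 0.0
        q = np.asarray(state, dtype=float)
        dists = [np.linalg.norm(s - q) for s in self.states]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([self.values[i] for i in nearest]))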
Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale to the domain of anthropomorphic robotics. To improve efficiency, the problem is reduced to reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications: motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM robot arm) and learning task-space control.
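A minimal sketch of the reward-weighted update idea: policy parameters are re-estimated as a reward-weighted average of sampled perturbations. This is illustrative only; the paper's exact update for motor primitives differs in detail.

import numpy as np

def reward_weighted_update(theta, n_samples, rollout, sigma=0.1):
    """One iteration of reward-weighted policy search.

    theta: current policy parameters (e.g., motor-primitive weights)
    rollout: function mapping a parameter vector to a scalar return
    Perturbations with higher returns receive proportionally more
    weight, turning policy search into weighted imitation of the
    agent's own successful trials.
    """
    eps = sigma * np.random.randn(n_samples, theta.size)
    returns = np.array([rollout(theta + e) for e in eps])
    w = np.exp(returns - returns.max())  # softmax-style weights
    return theta + (w[:, None] * eps).sum(axis=0) / w.sum()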