Search Results - Inner Mongolia University Library

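Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1 (subject "library science" or "information science", author Fan Bingsi, years 1982-2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication "Computer Applications and Software", years 2011-2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016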
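The field codes and operators above compose mechanically, so a query string can be assembled in code. The following is a minimal sketch that rebuilds the first example; the helper names `term`, `any_of`, and `all_of` are hypothetical and are not part of the library's system.

```python
def term(field: str, value: str) -> str:
    """Format one field-restricted term, e.g. K=图书馆学."""
    return f"{field}={value}"


def any_of(*terms: str) -> str:
    """Join terms with OR, parenthesized so the group binds as a unit."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*parts: str) -> str:
    """Join parts with AND (uppercase, one space on each side)."""
    return " AND ".join(parts)


# Rebuild the first search example from its parts.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Keeping the OR group parenthesized avoids relying on the catalog's operator precedence, which the help text does not specify.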

Refine search results

Document type

748 conference papers
271 journal articles
4 books

Collection scope

1,023 electronic documents
1 print holding

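Subject classification

712 papers: Engineering
- 520 papers: Computer Science and Technology...
- 381 papers: Electrical Engineering
- 278 papers: Control Science and Engineering
- 153 papers: Software Engineering
- 79 papers: Information and Communication Engineering
- 40 papers: Transportation Engineering
- 23 papers: Instrument Science and Technology
- 20 papers: Mechanical Engineering
- 9 papers: Bioengineering
- 8 papers: Electronic Science and Technology (may...
- 7 papers: Mechanics (may be awarded in Engineering, Sci...
- 7 papers: Civil Engineering
- 6 papers: Power Engineering and Engineering Thermo...
- 6 papers: Petroleum and Natural Gas Engineering
- 4 papers: Biomedical Engineering (may be awarded...
- 3 papers: Materials Science and Engineering (may...
- 3 papers: Chemical Engineering and Technology
- 3 papers: Aerospace Science and Techn...
- 3 papers: Safety Science and Engineering
118 papers: Science
- 98 papers: Mathematics
- 32 papers: Systems Science
- 22 papers: Statistics (may be awarded in Science, ...
- 10 papers: Biology
- 8 papers: Physics
- 4 papers: Chemistry
66 papers: Management
- 63 papers: Management Science and Engineering (may...
- 14 papers: Business Administration
- 5 papers: Library, Information and Archives Manage...
5 papers: Economics
- 4 papers: Applied Economics
3 papers: Law
- 3 papers: Sociology
2 papers: Medicine
1 paper: Education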

Topic

313 papers: reinforcement le...
216 papers: dynamic programm...
206 papers: optimal control
107 papers: adaptive dynamic...
104 papers: adaptive dynamic...
97 papers: learning
88 papers: neural networks
78 papers: heuristic algori...
68 papers: reinforcement le...
58 papers: learning (artifi...
54 papers: nonlinear system...
53 papers: convergence
51 papers: control systems
51 papers: mathematical mod...
48 papers: approximate dyna...
44 papers: approximation al...
43 papers: equations
42 papers: adaptive control
41 papers: artificial neura...
41 papers: cost function

Institution

41 papers: chinese acad sci...
27 papers: univ rhode isl d...
17 papers: tianjin univ sch...
16 papers: univ sci & techn...
16 papers: univ illinois de...
15 papers: northeastern uni...
14 papers: beijing normal u...
13 papers: northeastern uni...
13 papers: guangdong univ t...
12 papers: northeastern uni...
9 papers: natl univ def te...
8 papers: ieee
8 papers: univ chinese aca...
7 papers: univ chinese aca...
7 papers: cent south univ ...
7 papers: southern univ sc...
7 papers: beijing univ tec...
6 papers: chinese acad sci...
6 papers: missouri univ sc...
5 papers: nanjing univ pos...

Author

54 papers: liu derong
37 papers: wei qinglai
29 papers: he haibo
22 papers: wang ding
21 papers: xu xin
19 papers: jiang zhong-ping
17 papers: lewis frank l.
17 papers: yang xiong
17 papers: zhang huaguang
17 papers: ni zhen
16 papers: zhao bo
15 papers: gao weinan
14 papers: zhao dongbin
13 papers: derong liu
13 papers: zhong xiangnan
12 papers: si jennie
10 papers: jagannathan s.
10 papers: dongbin zhao
10 papers: song ruizhuo
9 papers: abouheaf mohamme...

Language

992 papers: English
25 papers: Other
6 papers: Chinese

Search criteria: "Any field = IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning"

1,023 records in total; showing records 821-830


Efficient data reuse in value function approximation


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hirotaka Hachiya, Takayuki Akiyama, Masashi Sugiyama, Jan Peters. Affiliations: Department of Computer Science, Tokyo Institute of Technology, Meguro, Tokyo, Japan; Department Schölkopf, Max-Planck Institute of Biological Cybernetics, Tubingen, Germany

Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a policy that is different from the currently optimized policy. A common approach is to use importance sampling techniques to compensate for the bias of value function estimators caused by the difference between the data-sampling policy and the target policy. However, existing off-policy methods often do not take the variance of the value function estimators explicitly into account, and therefore their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. The usefulness of the proposed approach is demonstrated on a simulated swing-up inverted-pendulum problem.

Keywords: Function approximation; Monte Carlo methods; Learning; Sampling methods; Approximation error; Programmable control; Adaptive control; Performance evaluation; Costs; Computer science
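To make the bias-variance trade-off in this abstract concrete, one way to control it is to raise each importance weight to a power ν in [0, 1]: ν = 0 ignores the policy mismatch and ν = 1 is ordinary importance sampling. The sketch below assumes that flattening scheme. The abstract itself does not spell out the estimator, and the function name and toy numbers are hypothetical.

```python
def flattened_is_estimate(returns, target_probs, behavior_probs, nu):
    """Importance-weighted mean of returns with weights flattened by nu.

    nu = 0.0 gives a plain average (low variance, biased toward the
    behavior policy); nu = 1.0 gives ordinary importance sampling
    (unbiased, higher variance). Assumed scheme, not taken from the paper.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    weights = [(p / q) ** nu for p, q in zip(target_probs, behavior_probs)]
    return sum(w * g for w, g in zip(weights, returns)) / len(returns)


# Toy data: three sampled returns and the probability each policy
# assigned to the sampled action (hypothetical numbers).
returns = [1.0, 0.0, 2.0]
target_probs = [0.9, 0.1, 0.5]
behavior_probs = [0.5, 0.5, 0.5]

for nu in (0.0, 0.5, 1.0):
    print(nu, flattened_is_estimate(returns, target_probs, behavior_probs, nu))
```

In practice ν would be chosen from data, which is the role the abstract assigns to its cross-validation variant.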


Learning continuous-action control policies


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jason Pazis, Michail G. Lagoudakis. Affiliation: Department of Electronic and Computer Engineering, Technical University of Crete, Crete, Greece

Reinforcement learning for control in stochastic processes has received significant attention in the last few years. Several data-efficient methods, even for continuous state spaces, have been proposed; however, most of them assume a small and discrete action space. While continuous action spaces are quite common in real-world problems, the most common approach still employed in practice is coarse discretization of the action space. This paper presents a novel, computationally efficient method, called adaptive action modification, for realizing continuous-action policies, using binary decisions corresponding to adaptive increment or decrement changes in the values of the continuous action variables. The proposed approach essentially approximates any continuous action space to arbitrary resolution and can be combined with any discrete-action reinforcement learning algorithm for learning continuous-action policies. Our approach is coupled with three well-known reinforcement learning algorithms (Q-learning, fitted Q-iteration, and least-squares policy iteration), and its use and properties are thoroughly investigated and demonstrated on the continuous state-action inverted pendulum and bicycle balancing and riding domains.

Keywords: Learning; Stochastic processes; State-space methods; Process control; Bicycles; Vents; Torque; Muscles; Vehicles; Current supplies
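The idea of reaching a continuous action through binary increment/decrement decisions can be illustrated with a small sketch. The adaptation rule used here (double the step when the direction repeats, halve it when the direction reverses) is an assumption made for illustration only; the abstract does not state the paper's actual rule, and the function name and bounds are hypothetical.

```python
def apply_binary_decisions(decisions, action=0.0, step=1.0, low=-2.0, high=2.0):
    """Turn a sequence of +1/-1 decisions into a continuous action.

    Assumed adaptation rule (hypothetical): repeating a direction doubles
    the step, reversing direction halves it. The action is clipped to
    the interval [low, high] after every decision.
    """
    previous = None
    for d in decisions:
        if d not in (+1, -1):
            raise ValueError("decisions must be +1 or -1")
        if previous is not None:
            step = step * 2.0 if d == previous else step / 2.0
        action = min(high, max(low, action + d * step))
        previous = d
    return action


# Alternating decisions shrink the step and home in on a value.
print(apply_binary_decisions([+1, -1, +1, -1]))
```

Because the step can shrink without bound, a discrete two-action learner driving this loop can in principle approach any value in the range, which mirrors the arbitrary-resolution claim in the abstract.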


A theoretical and empirical analysis of Expected Sarsa


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Harm van Seijen, Hado van Hasselt, Shimon Whiteson, Marco Wiering. Affiliations: Integrated Systems group, TNO Defence, Safety and Security, The Hague, Netherlands; Intelligent Systems Group, University of Utrecht, Utrecht, Netherlands; Intelligent Autonomous Systems Group, University of Amsterdam, Amsterdam, Netherlands; Department of Artificial Intelligence, University of Groningen, Groningen, Netherlands

This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods.

Keywords: Artificial intelligence; Convergence; Intelligent systems; Optimal control; Supervised learning; Robot control; Probability distribution; Dynamic programming; State feedback; State estimation
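The update this abstract describes replaces Sarsa's sampled next-action value with its expectation under the policy. Below is a minimal tabular sketch assuming an epsilon-greedy policy; the Q-table layout, function names, and toy numbers are illustrative choices rather than details from the paper.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def expected_sarsa_update(q, state, action, reward, next_state,
                          alpha, gamma, epsilon):
    """Apply one in-place Expected Sarsa update to a dict-of-lists Q-table."""
    probs = epsilon_greedy_probs(q[next_state], epsilon)
    # Expectation over next actions instead of a single sampled action.
    expected_next = sum(p * v for p, v in zip(probs, q[next_state]))
    target = reward + gamma * expected_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q = {"s0": [0.0, 0.0], "s1": [1.0, 3.0]}
new_value = expected_sarsa_update(q, "s0", 0, reward=1.0, next_state="s1",
                                  alpha=1.0, gamma=0.9, epsilon=0.2)
print(new_value)
```

The example uses a learning rate of 1, the setting the abstract says becomes viable in deterministic environments because the target no longer depends on which next action happened to be sampled.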


Reinforcement Learning Control of a Real Mobile Robot Using Approximate Policy Iteration


6th International Symposium on Neural Networks

Authors: Zhang, Pengchen; Xu, Xin; Liu, Chunming; Yuan, Qiping. Affiliation: Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China

ISBN: (print) 9783642015120

Machine learning for mobile robots has attracted considerable research interest in recent years. However, there are still many challenges in applying learning techniques to real mobile robots, e.g., generalization in continuous spaces, learning efficiency and convergence, etc. In this paper, a reinforcement learning path-following control strategy based on approximate policy iteration (API) is developed for a real mobile robot. One advantage is that optimized control policies can be obtained without much a priori knowledge of the dynamic models of the mobile robot. Two kinds of API-based control methods, i.e., API with linear approximation and API with kernel machines, are implemented in the path-following control task, and the efficiency of the proposed control strategy is illustrated in experimental studies on a real mobile robot based on the Pioneer3-AT platform. Experimental results verify that the API-based learning controller has better convergence and path-following accuracy than conventional PD control methods. Finally, the learning control performance of the two API methods is also evaluated and compared.

Keywords: Mobile robots; Approximate policy iteration; Reinforcement learning; Path following; Approximate dynamic programming


Kalman Temporal Differences: The deterministic case


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Matthieu Geist, Olivier Pietquin, Gabriel Fricout. Affiliations: IMS Research Group, Supélec, Metz, France; IMS Research Group, Metz, France; MC cluster, ArcelorMittal Research, Maizieres-Les-Metz, France

This paper deals with value function and Q-function approximation in deterministic Markovian decision processes. A general statistical framework based on the Kalman filtering paradigm is introduced. Its principle is to adopt a parametric representation of the value function, to model the associated parameter vector as a random variable and to minimize the mean-squared error of the parameters conditioned on past observed transitions. From this general framework, which will be called Kalman Temporal Differences (KTD), and using an approximation scheme called the unscented transform, a family of algorithms is derived, namely KTD-V, KTD-SARSA and KTD-Q, which aim respectively at estimating the value function of a given policy, the Q-function of a given policy and the optimal Q-function. The proposed approach holds for linear and nonlinear parameterization. This framework is discussed and potential advantages and shortcomings are highlighted.

Keywords: Kalman filters; Equations; State-space methods; Learning; Error correction; Filtering; Random variables; Approximation algorithms; Dynamic programming; Vectors
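The principle in this abstract (treat the value-function parameters as a random variable and update them with a Kalman filter from observed transitions) can be sketched in its simplest form. The code below assumes a scalar parameter and a linear value function V(s) = θ·φ(s), so the observation model is r = (φ(s) − γφ(s'))·θ + noise. It deliberately omits the unscented transform that the paper uses for nonlinear parameterizations, and all names and noise settings are hypothetical.

```python
def ktd_v_scalar_step(theta, p, phi_s, phi_next, reward, gamma,
                      process_noise=0.0, obs_noise=0.01):
    """One Kalman update of a scalar value-function parameter.

    Assumes V(s) = theta * phi(s); a simplified linear sketch, not the
    paper's full algorithm.
    """
    p_pred = p + process_noise                 # predict: random-walk parameter
    h = phi_s - gamma * phi_next               # observation coefficient
    innovation = reward - h * theta            # temporal-difference error
    gain = p_pred * h / (h * h * p_pred + obs_noise)
    theta = theta + gain * innovation          # correct the mean
    p = (1.0 - gain * h) * p_pred              # correct the variance
    return theta, p


# Single self-looping state with reward 1 and gamma 0.5: true value is 2.
theta, p = 0.0, 10.0
for _ in range(20):
    theta, p = ktd_v_scalar_step(theta, p, phi_s=1.0, phi_next=1.0,
                                 reward=1.0, gamma=0.5)
print(theta, p)
```

The variance `p` shrinks as transitions accumulate, so the filter also carries an uncertainty estimate alongside the value estimate.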


ADHDP(λ) strategies based coordinated ramps metering with queuing consideration


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xuerui Bai, Dongbin Zhao, Jianqiang Yi. Affiliation: Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Ramp metering has been developed as a traffic management strategy to alleviate congestion on freeways. Most ramp metering control algorithms do not take queuing into consideration, because coordinating multiple ramp meters while accounting for queues remains a difficult problem. In this paper, on the basis of our previous studies, we use action-dependent heuristic dynamic programming based on eligibility traces (ADHDP(λ)) to solve local ramp metering and multiple ramps metering problems with queuing consideration. First, for the local ramp metering problem, we establish a comprehensive performance index which considers both traffic density and on-ramp queue length. Second, for the multiple ramps metering problem, based on ADHDP(λ), coordinated ramp metering and regulation of queue lengths are achieved at the same time. Simulation studies on a hypothetical freeway are reported. It is shown that the proposed control scheme is efficient.

Keywords: Traffic control; Telecommunication traffic; Dynamic programming; Communication system traffic control; Performance analysis; Laboratories; Intelligent systems; Automation; Automatic control; Feedback control


Algorithm and stability of ATC receding horizon control


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hongwei Zhang, Jie Huang, Frank L. Lewis. Affiliations: Department of Mechanical and Automation Engineering, Chinese University of Hong Kong, New Territories, Hong Kong, China; Automation and Robotics Research Institute, University of Texas Arlington, Fort Worth, TX, USA

Receding horizon control (RHC), also known as model predictive control (MPC), is a suboptimal control scheme that solves a finite horizon open-loop optimal control problem in an infinite horizon context and yields a measured state feedback control law. Considerable effort has been devoted to studying closed-loop stability, leading to various stability conditions involving constraints on the terminal state, the terminal cost, the horizon size, or combinations of these. In this paper, we propose a modified RHC scheme, called adaptive terminal cost RHC (ATC-RHC). The control law generated by the ATC-RHC algorithm converges to the solution of the infinite horizon optimal control problem. Moreover, it ensures that the closed-loop system is uniformly ultimately exponentially stable without imposing any constraints on the terminal state, the horizon size, or the terminal cost. Finally, we show that when the horizon size is one, the underlying problems of ATC-RHC and heuristic dynamic programming (HDP) are the same. Thus, ATC-RHC can be implemented using HDP techniques without knowing the system matrix A.

Keywords: Stability; Optimal control; Open loop systems; Costs; Infinite horizon; Context modeling; Predictive models; Predictive control; State feedback; Dynamic programming
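For intuition about the horizon-one case, consider a scalar linear-quadratic problem, where repeatedly solving a one-step problem and feeding the result back as the terminal cost becomes a Riccati value iteration. The sketch below assumes the system x_{k+1} = a·x_k + b·u_k with stage cost q·x² + r·u². It is model-based and uses a and b explicitly, unlike the model-free HDP implementation the abstract mentions, and the function name is hypothetical.

```python
def riccati_value_iteration(a, b, q, r, iterations=100, p=0.0):
    """Iterate the scalar Riccati recursion from terminal weight p.

    Each pass solves a one-step optimal control problem whose terminal
    cost p * x**2 is the value function from the previous pass.
    """
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)  # resulting feedback law: u = -gain * x
    return p, gain


# a = b = q = r = 1 has the closed-form fixed point p = (1 + sqrt(5)) / 2.
p_inf, gain = riccati_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
print(p_inf, gain)
```

Starting from p = 0 imposes no terminal constraint, and the iterates still converge to the infinite-horizon solution in this example.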


Neuro-controller of cement rotary kiln temperature with adaptive critic designs


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaofeng Lin, Tangbo Liu, Shaojian Song, Chunning Song. Affiliations: College of Electrical Engineering, Guangxi University, Nanning, China; College of Electrical Engineering, Guangxi University, China

The production process of the cement rotary kiln is a typical thermodynamic engineering process with large inertia, lag, and nonlinearity, so it is very difficult to control accurately using traditional control theory. To keep the process stable and to produce high-grade cement clinker, it is important to keep the temperature of the sintering zone stable. Artificial neural networks offer a solution to this problem due to their advantages, such as self-organization, self-adaptivity and fault tolerance. This paper introduces a novel nonlinear optimal neuro-controller which is based on adaptive critic design (ACD) and uses the structure of action-dependent heuristic dynamic programming (ADHDP). The principle of ADHDP is presented. An action network and a critic network are set up in such a way that they basically learn from interactions based on local measurement to optimize the neuro-controller. The ADHDP neuro-controller has a simple framework and is independent of the system model. A simulation of the cement rotary kiln is carried out using Matlab/Simulink. The simulation results show that with the ADHDP neuro-controller it is possible to keep the temperature of the sintering zone stable within a certain range, and the temperature can meet the requirements of cement clinker production. The simulation results also show that the neuro-controller with the ACD has the potential to control the cement rotary kiln.

Keywords: Kilns; Production; Temperature distribution; Thermodynamics; Process control; Control theory; Artificial neural networks; Fault tolerance; Dynamic programming; Mathematical model


Algorithms for variance reduction in a policy-gradient based actor-critic framework


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Author: Yogesh P. Awate. Affiliation: Department of Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, India

We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement-learning (RL) using the long-run average-reward criterion and linear feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference algorithms, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation but suffer due to high variance of the policy-gradient estimator. To minimize this variance for an existing algorithm, we derive a stochastic-gradient-based novel critic update. We propose a novel baseline structure for variance minimization of an estimator and derive an optimal baseline which makes the covariance matrix a zero matrix - the best achievable. We derive a novel actor update based on the optimal baseline deduced for an existing algorithm. We derive another novel actor update using the optimal baseline for an unbiased policy-gradient estimator which we deduce from the policy-gradient theorem with function approximation. We obtain a novel variance-minimization-based interpretation for an existing algorithm. The computational results demonstrate that the proposed algorithms outperform the state-of-the-art on Garnet problems.

Keywords: Function approximation; Approximation algorithms; Convergence; State-space methods; Covariance matrix; Garnets; Learning; Linear approximation; Stochastic processes; Table lookup
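The effect of a baseline on a policy-gradient estimator can be checked exactly on a tiny problem. The sketch below assumes a two-action policy with a single sigmoid parameter and deterministic rewards, and uses the textbook scalar variance-minimizing baseline b* = E[s²R] / E[s²], where s is the score. That baseline is a standard result and is not necessarily the baseline structure derived in the paper; all names and numbers are hypothetical.

```python
def gradient_stats(p, rewards, baseline):
    """Exact mean and variance of the score-function gradient estimate
    for a two-action policy with pi(action 1) = p."""
    probs = [1.0 - p, p]
    scores = [-p, 1.0 - p]  # d/dtheta log pi(a) for a sigmoid policy
    samples = [s * (r - baseline) for s, r in zip(scores, rewards)]
    mean = sum(w * g for w, g in zip(probs, samples))
    var = sum(w * (g - mean) ** 2 for w, g in zip(probs, samples))
    return mean, var


def optimal_baseline(p, rewards):
    """Scalar baseline minimizing the estimator's variance."""
    probs = [1.0 - p, p]
    scores = [-p, 1.0 - p]
    num = sum(w * s * s * r for w, s, r in zip(probs, scores, rewards))
    den = sum(w * s * s for w, s in zip(probs, scores))
    return num / den


p, rewards = 0.3, [1.0, 5.0]
mean0, var0 = gradient_stats(p, rewards, baseline=0.0)
b_star = optimal_baseline(p, rewards)
mean1, var1 = gradient_stats(p, rewards, baseline=b_star)
print(mean0, var0)
print(b_star, mean1, var1)
```

The baseline leaves the expected gradient unchanged while shrinking the variance; in this two-action case the variance drops to zero up to rounding.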


Path integral-based stochastic optimal control for rigid body dynamics


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: E. A. Theodorou, J. Buchli, S. Schaal. Affiliations: Computer Science, Neuroscience & Biomedical Engineering, University of Southern California, CA, USA; ATR Computational Neuroscience Laboratories, Kyoto, Japan

Recent advances in path integral stochastic optimal control [1],[2] provide new insights into the optimal control of nonlinear stochastic systems which are linear in the controls, with a state-independent and time-invariant control transition matrix. Under these assumptions, the Hamilton-Jacobi-Bellman (HJB) equation is formulated and linearized with the use of the logarithmic transformation of the optimal value function. The resulting HJB is a linear second-order partial differential equation which is solved by an approximation based on the Feynman-Kac formula [3]. In this work, we review the theory of path integral control and derive the linearized HJB equation for systems with a state-dependent control transition matrix. In addition, we derive the path integral formulation for the general class of systems with state dimensionality that is higher than the dimensionality of the controls. Furthermore, by means of a modified inverse dynamics controller, we apply path integral stochastic optimal control over the new control space. Simulations illustrate the theoretical results. Future developments and extensions are discussed.

Keywords: Stochastic processes; Optimal control; Cost function; Control systems; Integral equations; Stochastic systems; Nonlinear control systems; Learning; Humanoid robots; Sampling methods
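In sampling-based path integral control, a Feynman-Kac style expectation is commonly approximated by weighting sampled rollouts with exp(−S/λ), where S is the path cost and λ a temperature. The sketch below shows only that weighting step under this general assumption. It is not the specific derivation for state-dependent control transition matrices described in the abstract, and the function names are hypothetical.

```python
import math


def path_integral_weights(path_costs, temperature):
    """Normalized exp(-S / lambda) weights over sampled rollouts."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    s_min = min(path_costs)  # shift by the minimum for numerical stability
    unnorm = [math.exp(-(s - s_min) / temperature) for s in path_costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_control_update(noise_samples, path_costs, temperature):
    """Cost-weighted average of the exploration noise used in each rollout."""
    weights = path_integral_weights(path_costs, temperature)
    return sum(w * e for w, e in zip(weights, noise_samples))


# Two rollouts: the cheaper one dominates the update.
weights = path_integral_weights([1.0, 3.0], temperature=1.0)
update = weighted_control_update([0.5, -0.5], [1.0, 3.0], temperature=1.0)
print(weights, update)
```

A lower temperature concentrates the weights on the cheapest rollout, while a higher one approaches a plain average of the sampled noise.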

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
