Refine Search Results

Document Type

  • 748 conference papers
  • 271 journal articles
  • 4 books

Collection

  • 1,023 electronic documents
  • 1 print holding

Subject Classification

  • 712 Engineering
    • 520 Computer Science and Technology...
    • 381 Electrical Engineering
    • 278 Control Science and Engineering
    • 153 Software Engineering
    • 79 Information and Communication Engineering
    • 40 Transportation Engineering
    • 23 Instrument Science and Technology
    • 20 Mechanical Engineering
    • 9 Biological Engineering
    • 8 Electronic Science and Technology (...
    • 7 Mechanics (...
    • 7 Civil Engineering
    • 6 Power Engineering and Engineering Therm...
    • 6 Petroleum and Natural Gas Engineering
    • 4 Biomedical Engineering (...
    • 3 Materials Science and Engineering (...
    • 3 Chemical Engineering and Technology
    • 3 Aeronautical and Astronautical Science and Tech...
    • 3 Safety Science and Engineering
  • 118 Science
    • 98 Mathematics
    • 32 Systems Science
    • 22 Statistics (...
    • 10 Biology
    • 8 Physics
    • 4 Chemistry
  • 66 Management
    • 63 Management Science and Engineering (...
    • 14 Business Administration
    • 5 Library, Information and Archives Man...
  • 5 Economics
    • 4 Applied Economics
  • 3 Law
    • 3 Sociology
  • 2 Medicine
  • 1 Education

Topics

  • 313 reinforcement le...
  • 216 dynamic programm...
  • 206 optimal control
  • 107 adaptive dynamic...
  • 104 adaptive dynamic...
  • 97 learning
  • 88 neural networks
  • 78 heuristic algori...
  • 68 reinforcement le...
  • 58 learning (artifi...
  • 54 nonlinear system...
  • 53 convergence
  • 51 control systems
  • 51 mathematical mod...
  • 48 approximate dyna...
  • 44 approximation al...
  • 43 equations
  • 42 adaptive control
  • 41 artificial neura...
  • 41 cost function

Institutions

  • 41 chinese acad sci...
  • 27 univ rhode isl d...
  • 17 tianjin univ sch...
  • 16 univ sci & techn...
  • 16 univ illinois de...
  • 15 northeastern uni...
  • 14 beijing normal u...
  • 13 northeastern uni...
  • 13 guangdong univ t...
  • 12 northeastern uni...
  • 9 natl univ def te...
  • 8 ieee
  • 8 univ chinese aca...
  • 7 univ chinese aca...
  • 7 cent south univ ...
  • 7 southern univ sc...
  • 7 beijing univ tec...
  • 6 chinese acad sci...
  • 6 missouri univ sc...
  • 5 nanjing univ pos...

Authors

  • 54 liu derong
  • 37 wei qinglai
  • 29 he haibo
  • 22 wang ding
  • 21 xu xin
  • 19 jiang zhong-ping
  • 17 lewis frank l.
  • 17 yang xiong
  • 17 zhang huaguang
  • 17 ni zhen
  • 16 zhao bo
  • 15 gao weinan
  • 14 zhao dongbin
  • 13 derong liu
  • 13 zhong xiangnan
  • 12 si jennie
  • 10 jagannathan s.
  • 10 dongbin zhao
  • 10 song ruizhuo
  • 9 abouheaf mohamme...

Language

  • 992 English
  • 25 Other
  • 6 Chinese

Search query: Any field = "IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning"
1,023 records; showing 661-670
Exponential moving average Q-learning algorithm
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Mostafa D. Awheda, Howard M. Schwartz (Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada)
A multi-agent policy iteration learning algorithm is proposed in this work. The Exponential Moving Average (EMA) mechanism is used to update the policy for a Q-learning agent so that it converges to an optimal policy ...
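The EMA mechanism mentioned in the abstract above can be illustrated on a toy problem: after each Q-value update, the policy is nudged toward the current greedy action by an exponential moving average. This is a minimal sketch of the general EMA idea only, not the paper's multi-agent policy iteration algorithm; the bandit rewards, step sizes, and epsilon-exploration below are invented for the example.

```python
import numpy as np

def ema_q_bandit(rewards, steps=500, alpha=0.1, eta=0.05, eps=0.2, seed=0):
    """Q-learning on a deterministic multi-armed bandit where the policy is
    pulled toward the current greedy action by an exponential moving average
    (EMA). Illustrative sketch only -- not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n = len(rewards)
    Q = np.zeros(n)                            # action-value estimates
    pi = np.full(n, 1.0 / n)                   # stochastic policy over actions
    for _ in range(steps):
        behavior = (1 - eps) * pi + eps / n    # keep exploring every arm
        a = rng.choice(n, p=behavior / behavior.sum())
        Q[a] += alpha * (rewards[a] - Q[a])    # one-step Q update
        greedy = np.zeros(n)
        greedy[np.argmax(Q)] = 1.0
        pi = (1 - eta) * pi + eta * greedy     # EMA policy update
        pi /= pi.sum()
    return pi

pi = ema_q_bandit([1.0, 0.2, 0.0])
print(pi.round(3))    # probability mass concentrates on the best arm (index 0)
```

With a fixed EMA rate eta, the probability of non-greedy actions decays geometrically, so the policy converges to the greedy one once the value estimates stabilize.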
An integrated design for intensified direct heuristic dynamic programming
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Xiong Luo, Jennie Si, Yuchao Zhou (School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, China; Arizona State University, Tempe, AZ, USA)
There has been a growing interest in the study of adaptive/approximate dynamic programming (ADP) in recent years. The ADP technique provides a powerful tool to understand and improve the principled technologies of mac...
A novel approach for constructing basis functions in approximate dynamic programming for feedback control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Jian Wang, Zhenhua Huang, Xin Xu (College of Mechatronics and Automation, National University of Defense Tech, Changsha, P. R. China)
This paper presents a novel approach for constructing basis functions in approximate dynamic programming (ADP) through the locally linear embedding (LLE) process. It considers the experience (sample) data as a high-di...
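The core step of locally linear embedding, which the abstract above builds on, is a small constrained least-squares solve per sample: reconstruct each point from its neighbors with weights that sum to one. A minimal sketch of that step (the regularization constant and the test points are arbitrary choices for the example, and this is standard LLE, not the paper's full basis-construction scheme):

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """LLE reconstruction weights of point x from its neighbors:
    minimize ||x - sum_j w_j n_j||^2 subject to sum_j w_j = 1,
    solved via the regularized local Gram system as in standard LLE."""
    Z = neighbors - x                            # neighbors in x's local frame
    G = Z @ Z.T                                  # local Gram matrix
    G = G + reg * np.trace(G) * np.eye(len(G))   # regularize (G is often singular)
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                           # enforce the sum-to-one constraint

# A point on a segment is reconstructed by its endpoints with barycentric weights.
x = np.array([0.25, 0.25])
nbrs = np.array([[0.0, 0.0], [1.0, 1.0]])
w = lle_weights(x, nbrs)
print(w.round(3))    # ~ [0.75, 0.25]
```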
Bias-corrected Q-learning to control max-operator bias in Q-learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Donghun Lee, Boris Defourny, Warren B. Powell (Department of Computer Science, Princeton University, Princeton, NJ, USA; Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA)
We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant le...
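The max-operator bias targeted above is easy to demonstrate: when every action has the same true value, taking the max over noisy estimates is biased upward, since E[max_a Q̂(a)] ≥ max_a E[Q̂(a)] by Jensen's inequality. A quick Monte Carlo check (the noise scale and action count are arbitrary; this shows the bias, not the paper's correction):

```python
import numpy as np

# All actions have true value 0; each estimate carries zero-mean noise.
# The max over noisy estimates is nevertheless positive on average.
rng = np.random.default_rng(1)
n_actions, n_trials = 10, 20000
noisy_q = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
bias = noisy_q.max(axis=1).mean()    # empirical E[max_a Qhat(a)]
print(f"estimated max-operator bias: {bias:.3f}")   # positive, though all true values are 0
```

The bias grows with both the number of actions and the noise level, which is why high-variance rewards and a large discount factor compound the overestimation in plain Q-learning.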
Free energy based policy gradients
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Evangelos A. Theodorou, Jiri Najemnik, Emo Todorov (Department of Computer Science and Engineering; Departments of Computer Science and Engineering and Applied Math, University of Washington, Seattle)
Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of the work in this area relies on discrete time formulations of stochastic dynamics. In this work we present a n...
Proceedings of the 2013 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
2013 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
The proceedings contain 20 papers. The topics discussed include: resolving global and local drifts in data stream regression using evolving rule-based models; fuzzy decision trees for dynamic data; dynamic and evolving ...
Adaptive optimal control for nonlinear discrete-time systems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Chunbin Qin, Huaguang Zhang, Yanhong Luo (School of Information Science and Engineering, Northeastern University, Shenyang, China; Basic Experiment Teaching Center, Henan University, Kaifeng, China)
This paper proposes an on-line near-optimal control scheme based on the capabilities of neural networks (NNs) in function approximation, to attain the on-line solution of the optimal control problem for nonlinear discrete-ti...
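Stripped of the neural-network approximation, the value-iteration idea behind such ADP schemes can be shown exactly on a scalar linear-quadratic problem, where repeated Bellman backups on a quadratic value function V(x) = P·x² converge to the discrete Riccati fixed point. A sketch under invented system parameters (the paper itself treats general nonlinear discrete-time systems with NN approximators):

```python
# Scalar discrete-time LQR: x_{k+1} = a*x + b*u, stage cost q*x^2 + r*u^2.
# ADP-style value iteration: iterate the Bellman backup on P until it
# reaches the discrete algebraic Riccati fixed point.
a, b, q, r = 1.2, 1.0, 1.0, 1.0   # unstable open loop (|a| > 1); invented values

P = 0.0
for _ in range(200):
    # Bellman backup: P <- q + a^2*P - (a*b*P)^2 / (r + b^2*P)
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)

K = a * b * P / (r + b * b * P)   # resulting state-feedback gain, u = -K*x
print(round(P, 4), round(K, 4))   # closed loop a - b*K is stable
```

The same backup, with P replaced by an NN critic and K by an NN actor trained from sampled data, is the pattern the on-line ADP literature follows for nonlinear systems.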
Optimistic planning for belief-augmented Markov Decision Processes
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Raphael Fonteneau, Lucian Buşoniu, Rémi Munos (Department of Electrical Engineering and Computer Science, University of Liège, Belgium; Université de Lorraine, CRAN, France; SequeL Team, Inria Lille, France)
This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Proc...
A combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Yifan Cai, Simon X. Yang, Xin Xu (School of Engineering, University of Guelph, Guelph, Ontario, Canada; College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan Province, China)
Effective cooperation of multi-robots in unknown environments is essential in many robotic applications, such as environment exploration and target searching. In this paper, a combined hierarchical reinforcement learn...
The second order temporal difference error for Sarsa(λ)
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Qiming Fu, Quan Liu, Fei Xiao, Guixin Chen (Department of Computer Science and Technology, Soochow University, Suzhou, China)
Traditional reinforcement learning algorithms, such as Q-learning, Q(λ), Sarsa, and Sarsa(λ), update the action value function using the temporal difference (TD) error, which is computed from the last action value functio...
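For reference, the first-order TD error and eligibility-trace update of classical Sarsa(λ), which the second-order error above extends, look like this on a toy chain MDP. The chain, step sizes, and exploration scheme are invented for illustration; this is the textbook algorithm, not the paper's variant.

```python
import numpy as np

def sarsa_lambda_chain(n_states=5, episodes=300, alpha=0.2, gamma=0.9,
                       lam=0.8, eps=0.1, seed=0):
    """Classical Sarsa(lambda) with accumulating eligibility traces on a chain:
    actions 0/1 move left/right, reward 1 on reaching the rightmost state."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))

    def policy(s):
        if rng.random() < eps:
            return int(rng.integers(2))
        q = Q[s]
        return int(rng.choice(np.flatnonzero(q == q.max())))  # random tie-break

    for _ in range(episodes):
        e = np.zeros_like(Q)                  # eligibility traces
        s = 0
        a = policy(s)
        for _step in range(1000):             # cap episode length
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s2 == n_states - 1)
            r = 1.0 if done else 0.0
            a2 = policy(s2)
            # first-order temporal-difference error
            delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
            e[s, a] += 1.0                    # accumulating trace
            Q += alpha * delta * e            # credit all recently visited pairs
            e *= gamma * lam                  # decay traces
            if done:
                break
            s, a = s2, a2
    return Q

Q = sarsa_lambda_chain()
print(np.argmax(Q, axis=1))   # greedy action per non-terminal state
```

The single TD error delta drives every trace-weighted update; a second-order scheme would additionally use the error of the preceding step when forming the target.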