We describe an approach towards reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
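Below is a minimal sketch of the sampling idea on a toy one-dimensional problem: the value function lives on a discretized state grid, but at each sweep the candidate actions are drawn at random from the continuous range rather than enumerated on an action grid. The dynamics, cost, grid resolution, and interpolation scheme are illustrative assumptions, not the paper's swing-up setup.

```python
import numpy as np

rng = np.random.default_rng(0)

states = np.linspace(-1.0, 1.0, 101)   # discretized state grid
V = np.zeros_like(states)              # value-function estimate on the grid
gamma = 0.95
n_action_samples = 16                  # random continuous actions tried per state per sweep

def step(x, u):
    """Assumed deterministic dynamics: a damped integrator."""
    return np.clip(0.9 * x + 0.1 * u, -1.0, 1.0)

def cost(x, u):
    """Assumed quadratic stage cost."""
    return x ** 2 + 0.1 * u ** 2

def interp_value(x):
    """Evaluate V at a continuous state by linear interpolation on the grid."""
    return np.interp(x, states, V)

for sweep in range(200):
    new_V = np.empty_like(V)
    for i, x in enumerate(states):
        # Sample continuous actions uniformly; keep the best backed-up value.
        u_candidates = rng.uniform(-1.0, 1.0, size=n_action_samples)
        backups = [cost(x, u) + gamma * interp_value(step(x, u)) for u in u_candidates]
        new_V[i] = min(backups)
    V = new_V
```

Because fresh actions are drawn every sweep, the effective action resolution grows with the number of sweeps instead of being fixed by an action grid.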
Learning automata are shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage. In this end-stage the rewards are given to the learning automata (i.e., Monte Carlo updating). This is, however, infeasible in sequential decision problems with an infinite horizon, where no such end-stage exists. In this paper we propose a new algorithm based on one-step returns that uses bootstrapping to find good equilibrium paths in multi-stage games.
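A minimal sketch of the bootstrapping idea for a single automaton is shown below: instead of waiting for an explicit end-stage reward, the one-step return r + γV(s') serves as the reinforcement signal for a learning-automaton-style probability update. The linear reward-inaction scheme, the clipping of the signal to [0, 1], and the tabular value function are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 5, 3
probs = np.full((n_states, n_actions), 1.0 / n_actions)  # automaton action probabilities per state
V = np.zeros(n_states)                                    # bootstrapped state values
alpha, beta, gamma = 0.1, 0.05, 0.95

def lri_update(p, a, signal):
    """Linear reward-inaction style update: shift probability mass toward
    action a in proportion to the (clipped) reinforcement signal."""
    s = np.clip(signal, 0.0, 1.0)
    p = p + beta * s * (np.eye(len(p))[a] - p)
    return p / p.sum()

def automaton_step(s, a, r, s_next):
    """One learning step driven by the one-step return instead of an end-stage reward."""
    global V, probs
    target = r + gamma * V[s_next]        # bootstrapped one-step return
    V[s] += alpha * (target - V[s])       # TD(0) update of the state value
    probs[s] = lri_update(probs[s], a, target)
```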
A considerable amount of research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named continuous actor-critic learning automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
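The core CACLA update is compact enough to sketch with linear function approximation: the critic learns V(s) by TD(0), exploration is Gaussian around the actor's output, and the actor is moved toward the executed action only when the TD error is positive. The featurization, learning rates, and exploration scale below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(s):
    """Assumed state featurization: raw state plus a bias term."""
    return np.array([s, 1.0])

w_v = np.zeros(2)   # critic weights: V(s) = w_v . phi(s)
w_a = np.zeros(2)   # actor weights:  A(s) = w_a . phi(s), a continuous action
alpha_v, alpha_a, gamma, sigma = 0.1, 0.05, 0.95, 0.3

def select_action(s):
    """Gaussian exploration around the actor's current output."""
    return w_a @ features(s) + sigma * rng.standard_normal()

def cacla_update(s, a, r, s_next):
    """Critic: TD(0). Actor: move toward the taken action only if the TD error is positive."""
    global w_v, w_a
    phi, phi_next = features(s), features(s_next)
    delta = r + gamma * (w_v @ phi_next) - (w_v @ phi)
    w_v += alpha_v * delta * phi
    if delta > 0:
        w_a += alpha_a * (a - w_a @ phi) * phi
    return delta
```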
We present a method for reducing the effort required to compute policies for tasks based on solutions to previously solved tasks. The key idea is to use a learned intermediate policy based on local features to create an initial policy for the new task. To further improve this initial policy, we develop a form of generalized policy iteration. We achieve a substantial reduction in the computation needed to find policies when previous experience is available.
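A hedged sketch of the transfer idea: fit a feature-based policy to state-action pairs harvested from previously solved tasks and use it to warm-start policy improvement on the new task. The linear fit and the local_features() helper are illustrative stand-ins, not the paper's representation.

```python
import numpy as np

def local_features(state):
    """Assumed local featurization of a state (e.g. nearby goal/obstacle information)."""
    return np.array([state, state ** 2, 1.0])

def fit_intermediate_policy(solved_states, solved_actions):
    """Least-squares map from local features to the actions taken in old solutions."""
    X = np.stack([local_features(s) for s in solved_states])
    w, *_ = np.linalg.lstsq(X, np.asarray(solved_actions, dtype=float), rcond=None)
    return lambda s: local_features(s) @ w

def warm_start_policy(new_task_states, intermediate_policy):
    """Initial policy for the new task, to be refined by (generalized) policy iteration."""
    return {s: intermediate_policy(s) for s in new_task_states}
```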
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor-critic learning automaton (ACLA). Both algorithms learn a state value function using TD(λ) methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning-automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.
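The QV(λ) rule is easy to state in tabular form, sketched below: the state values are learned with TD(λ), and the Q-value of the taken action is moved toward r + γV(s') rather than toward a bootstrapped Q target. The accumulating traces and the specific learning rates are assumptions for illustration.

```python
import numpy as np

n_states, n_actions = 10, 4
V = np.zeros(n_states)
Q = np.zeros((n_states, n_actions))
e = np.zeros(n_states)                  # eligibility traces for the state values
alpha, beta, gamma, lam = 0.1, 0.1, 0.95, 0.8

def qv_update(s, a, r, s_next):
    """Tabular QV(lambda) step: V via TD(lambda), Q via the learned V."""
    global V, Q, e
    delta = r + gamma * V[s_next] - V[s]
    e *= gamma * lam
    e[s] += 1.0                                            # accumulating trace
    V += alpha * delta * e                                 # TD(lambda) update of V
    Q[s, a] += beta * (r + gamma * V[s_next] - Q[s, a])    # Q target uses V, not max_a Q
```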
In this paper, a novel reinforcement learning neural network (NN)-based controller, referred to as the adaptive critic controller, is proposed for affine nonlinear discrete-time systems with applications to nanomanipulation. In the online NN reinforcement learning method, one NN is designated as the critic NN, which approximates the long-term cost function under the assumption that the states of the nonlinear system are available for measurement. An action NN is employed to derive an optimal control signal to track a desired system trajectory while minimizing the cost function. Online weight-tuning schemes for these two NNs are also derived. Using the Lyapunov approach, the uniform ultimate boundedness (UUB) of the tracking error and weight estimates is shown. Nanomanipulation means manipulating objects of nanometer size; performing even a simple task in the nanoscale world can take several hours. To accomplish such tasks automatically, the proposed online learning control design is evaluated for a nanomanipulation task and verified in a simulation environment.
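A heavily hedged sketch of the critic/action NN pair is given below: the critic approximates the long-term cost from the measured state, and the action network's weights are nudged to reduce the predicted cost. The one-hidden-layer radial-basis networks and the plain gradient-style tuning are illustrative assumptions only; the paper derives its own Lyapunov-based weight-update laws with UUB guarantees.

```python
import numpy as np

def rbf(x, centers, width=1.0):
    """Shared radial-basis hidden layer (an assumption of this sketch)."""
    return np.exp(-((x - centers) ** 2) / width)

centers = np.linspace(-2.0, 2.0, 9)
Wc = np.zeros(9)    # critic output weights: J_hat(x) = Wc . rbf(x)
Wa = np.zeros(9)    # action output weights: u(x)     = Wa . rbf(x)
lr_c, lr_a, gamma = 0.05, 0.01, 0.9

def control(x):
    """Control signal produced by the action network."""
    return Wa @ rbf(x, centers)

def tune(x, stage_cost, x_next):
    """One tuning step: a temporal-difference-like critic error, then a crude
    action update that lowers weights in proportion to the predicted next-state
    cost (the true gradient would also need the plant's input sensitivity)."""
    global Wc, Wa
    phi, phi_next = rbf(x, centers), rbf(x_next, centers)
    ec = stage_cost + gamma * (Wc @ phi_next) - (Wc @ phi)   # critic error
    Wc += lr_c * ec * phi                                    # critic weight tuning
    Wa -= lr_a * (Wc @ phi_next) * phi                       # heuristic action weight tuning
```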
We propose the use of kernel-based methods as the underlying function approximator in the least-squares-based policy evaluation frameworks of LSPE(λ) and LSTD(λ). In particular, we present the 'kernelization' of model-free LSPE(λ). The 'kernelization' is made computationally feasible by using the subset-of-regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.
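A simplified sketch of the idea appears below: the feature vector of a state is its kernel evaluation against a small, fixed dictionary of basis states (a subset-of-regressors style approximation), and policy evaluation is done with batch LSTD(0) over those features. The Gaussian kernel, the fixed dictionary, λ = 0, and the batch solve are simplifications; the paper's contribution is a recursive implementation with automatic supervised selection of the basis functions.

```python
import numpy as np

basis = np.linspace(-1.0, 1.0, 7)    # small dictionary of basis states
gamma = 0.95

def kernel_features(s, width=0.5):
    """Feature vector: Gaussian kernel evaluations against the dictionary states."""
    return np.exp(-((s - basis) ** 2) / width)

def lstd(transitions):
    """Batch LSTD(0) over (s, r, s') transitions collected under a fixed policy."""
    d = len(basis)
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        phi, phi_next = kernel_features(s), kernel_features(s_next)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    w = np.linalg.solve(A + 1e-6 * np.eye(d), b)     # small ridge term for stability
    return lambda s: kernel_features(s) @ w          # approximate value function
```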
ISBN (print): 9781424405220
This paper addresses the call admission control (CAC) problem for multiple services in the uplink of a cellular system using direct-sequence code division multiple access (DS-CDMA), taking into account the physical-layer channel and the receiver structure at the base station. The problem is formulated as a semi-Markov decision process (SMDP) with constraints on the blocking probabilities and the signal-to-interference ratio (SIR). The objective is to find a CAC policy which maximizes throughput while still satisfying these quality-of-service (QoS) constraints. To solve for a near-optimal CAC policy, an online decision-making algorithm based on actor-critic temporal-difference learning from a recent paper is modified by parameterizing the reward signal to deal with the QoS constraints. The proposed algorithm circumvents the computational complexity experienced in conventional dynamic programming techniques.
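A hedged sketch of the reward parameterization is given below: the learner's reward is the carried throughput minus penalty terms for QoS violations, and the penalty weights are adapted online so the blocking and SIR constraints are approximately respected. The penalty form, the constraint limits, and the multiplier update are illustrative assumptions, not the paper's exact construction.

```python
def parameterized_reward(throughput, blocking_prob, sir, params,
                         blocking_limit=0.02, sir_target=7.0):
    """Reward fed to the actor-critic learner for one admission decision."""
    lam_block, lam_sir = params
    return (throughput
            - lam_block * max(0.0, blocking_prob - blocking_limit)
            - lam_sir * max(0.0, sir_target - sir))

def update_multipliers(params, blocking_prob, sir, step=0.01,
                       blocking_limit=0.02, sir_target=7.0):
    """Raise a penalty weight while its constraint is violated, relax it otherwise."""
    lam_block, lam_sir = params
    lam_block = max(0.0, lam_block + step * (blocking_prob - blocking_limit))
    lam_sir = max(0.0, lam_sir + step * (sir_target - sir))
    return (lam_block, lam_sir)
```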
We consider the problem of learning in a factored-state Markov decision process that is structured to allow a compact representation. We show that the well-known algorithm factored Rmax performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This is equivalent to the result obtained by Kearns and Koller for their DBN-E3 algorithm, except that we conduct the analysis in a more general setting. We also extend the results to a new algorithm, factored IE, that uses the interval estimation approach to exploration and can be expected to outperform factored Rmax on most domains.
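The bookkeeping that makes factored Rmax tractable can be sketched as follows: experience counts are kept per factor and per (parent-configuration, action) rather than per full state, so the number of counters grows with the compact DBN representation instead of with the exponential state space. The 'known' threshold and the parents() structure below are illustrative assumptions.

```python
from collections import defaultdict

M_KNOWN = 20     # visits required before a local transition model is trusted
R_MAX = 1.0      # optimistic reward used wherever the model is still unknown

counts = defaultdict(int)   # (factor_index, parent_values, action) -> visit count

def parents(state, i):
    """Assumed DBN structure: each factor depends on itself and its left neighbour."""
    return (state[i], state[i - 1]) if i else (state[0],)

def record(state, action):
    """Update the local counts after observing a transition from (state, action)."""
    for i in range(len(state)):
        counts[(i, parents(state, i), action)] += 1

def is_known(state, action):
    """A state-action pair is 'known' once every factor's local count is large enough."""
    return all(counts[(i, parents(state, i), action)] >= M_KNOWN for i in range(len(state)))

def planning_reward(state, action, estimated_reward):
    """Plan with R_MAX wherever the factored model is not yet known (optimism)."""
    return estimated_reward if is_known(state, action) else R_MAX
```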
Opposition-based learning (OBL) is a new scheme in machine intelligence. In this paper, an OBL version of Q-learning, which exploits opposite quantities to accelerate learning, is used for the management of single-reservoir operations. In this method, an agent takes an action, receives a reward, and updates its knowledge in terms of action-value functions. Furthermore, the transition function, which is the balance equation in the optimization model, determines the next state and updates the action-value function pertaining to the opposite action. Two types of opposite actions are defined. It is demonstrated that using OBL can significantly improve the efficiency of the operating policy within a limited number of iterations. It is also shown that this technique is more robust than Q-learning.
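A minimal sketch of the opposition-based step for a single reservoir is shown below: after the ordinary Q-learning update for the chosen release, the Q-value of the 'opposite' release (mirrored within the feasible release range) is also updated, with its next storage obtained from the water-balance equation. The discretization, reward interface, and mirroring rule are illustrative assumptions.

```python
import numpy as np

n_storage_levels, n_releases = 20, 10
Q = np.zeros((n_storage_levels, n_releases))
alpha, gamma = 0.1, 0.95
S_MAX = 100.0     # reservoir capacity

def balance(storage, release, inflow):
    """Water-balance (transition) equation with clipping to the capacity."""
    return float(np.clip(storage - release + inflow, 0.0, S_MAX))

def to_level(storage):
    """Map a continuous storage volume to its discretized level index."""
    return int(storage / S_MAX * (n_storage_levels - 1))

def obq_update(storage, release_idx, inflow, releases, reward_fn):
    """One Q-learning step for the chosen release plus one for its opposite."""
    s = to_level(storage)
    for a in (release_idx, n_releases - 1 - release_idx):    # chosen and opposite action
        r = reward_fn(storage, releases[a])
        s_next = to_level(balance(storage, releases[a], inflow))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```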