Search Results - Inner Mongolia University Library

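Help

Field codes:

T=Title (book title, article title), A=Author (responsible party), K=Subject term, P=Publication name, PU=Publisher name, O=Organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=Subject classification number, U=All fields, Y=Year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016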
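
The field codes and operator rules in this help text can be illustrated with a minimal Python sketch that assembles a query string. The helper names `term` and `combine` are hypothetical and are not part of the library's system; only the field codes, the uppercase operators, and the single-space rule come from the help text. The sketch only builds the string and does not submit it anywhere.

```python
# Field codes and operators as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Format one FIELD=value term (hypothetical helper)."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join terms with an uppercase operator, one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    # Parentheses group a sub-expression, as in the help text's examples.
    return f"({joined})" if group else joined


# Rebuild the first example from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Rejecting lowercase operators up front mirrors the stated rule that operators must be uppercase; how the catalog itself treats a lowercase `and` is not stated in the help text.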

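Refine results

Document type

228 Conference papers
4 Journal articles

Collection

232 Electronic documents
0 Print holdings

Subject classification

98 Engineering
- 93 Computer Science and Technology...
- 40 Software Engineering
- 25 Electrical Engineering
- 14 Control Science and Engineering
- 4 Mechanical Engineering
- 1 Mechanics (degree may be awarded in Engineering, ...
- 1 Information and Communication Engineering
- 1 Architecture
- 1 Chemical Engineering and Technology
- 1 Transportation Engineering
23 Science
- 23 Mathematics
- 6 Statistics (degree may be awarded in Science, ...
- 4 Systems Science
- 1 Chemistry
- 1 Atmospheric Sciences
9 Management
- 7 Management Science and Engineering (...
- 3 Business Administration
- 2 Library, Information and Archives Manage...
2 Economics
- 2 Applied Economics
1 Law
- 1 Sociology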

Topic

95 dynamic programm...
52 learning
46 optimal control
37 reinforcement le...
34 learning (artifi...
27 equations
22 heuristic algori...
21 control systems
20 convergence
19 neural networks
18 function approxi...
17 mathematical mod...
16 approximation al...
15 vectors
14 markov processes
14 artificial neura...
14 cost function
13 stochastic proce...
12 algorithm design...
12 adaptive control

Institution

5 school of inform...
4 northeastern uni...
4 department of el...
4 department of in...
3 department of el...
3 automation and r...
3 northeastern uni...
3 robotics institu...
3 key laboratory o...
3 univ illinois de...
2 department of ar...
2 school of electr...
2 univ groningen i...
2 univ texas autom...
2 colorado state u...
2 guangxi univ sch...
2 national science...
2 informatics inst...
2 college of infor...
2 school of automa...

Author

7 hado van hasselt
7 lewis frank l.
7 marco a. wiering
7 dongbin zhao
6 liu derong
5 huaguang zhang
5 zhang huaguang
5 derong liu
5 warren b. powell
4 xu xin
4 vrabie draguna
4 jagannathan s.
4 frank l. lewis
4 yanhong luo
4 damien ernst
4 jan peters
4 peters jan
4 zhao dongbin
3 xu hao
3 martin riedmille...

Language

232 English

Search query: "Any field=2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009"

232 records in total; showing records 161-170

Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Michiel van der Ree, Marco Wiering
Affiliation: Faculty of Mathematics and Natural Sciences, University of Groningen, Institute of Artificial Intelligence and Cognitive Engineering, The Netherlands

This paper compares three strategies in using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies that are compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while learning from the opponent's moves as well. These issues are considered for the algorithms Q-learning, Sarsa and TD-learning. These three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best strategy of learning differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents, when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results compared to learning only from the learning agent's own moves.

Keywords: Games; Training; Learning (artificial intelligence); Artificial neural networks; Heuristic algorithms; Testing


Reinforcement learning algorithms for solving classification problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Hado van Hasselt, Auke-Dirk Pietersma, Lambert Schomaker
Affiliations: Department of Artificial Intelligence, University of Groningen, Netherlands; Multi-agent and Adaptive Computation, Centrum Wiskunde and Informatica, Netherlands

We describe a new framework for applying reinforcement learning (RL) algorithms to solve classification tasks by letting an agent act on the inputs and learn value functions. This paper describes how classification problems can be modeled using classification Markov decision processes and introduces the Max-Min ACLA algorithm, an extension of the novel RL algorithm called actor-critic learning automaton (ACLA). Experiments are performed using 8 datasets from the UCI repository, where our RL method is combined with multi-layer perceptrons that serve as function approximators. The RL method is compared to conventional multi-layer perceptrons and support vector machines and the results show that our method slightly outperforms the multi-layer perceptron and performs equally well as the support vector machine. Finally, many possible extensions are described to our basic method, so that much future research can be done to make the proposed method even better.

Keywords: Training; Markov processes; Support vector machines; Learning; Testing; Artificial neural networks; Accuracy


A reinforcement learning algorithm developed to model GenCo strategic bidding behavior in multidimensional and continuous state and action spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Alfred Yong Fu Lau, Dipti Srinivasan, Thomas Reindl
Affiliations: National University of Singapore, Singapore, SG; Department of Electrical Computer Engineering, National University of Singapore, Singapore; Solar Energy Research Institute of Singapore, National University of Singapore, Singapore

The electricity market has provided a complex economic environment, and consequently has increased the requirement for advancement of learning methods. In the agent-based modeling and simulation framework of this economic system, the generation company's decision-making is modeled using reinforcement learning. Existing learning methods that model the generation company's strategic bidding behavior are not adapted to the non-stationary and non-Markovian environment involving multidimensional and continuous state and action spaces. This paper proposes a reinforcement learning method to overcome these limitations. The proposed method discovers the input space structure through the self-organizing map, exploits learned experience through Roth-Erev reinforcement learning and explores through the actor critic map. Simulation results from experiments show that the proposed method outperforms Simulated Annealing Q-learning and Variant Roth-Erev reinforcement learning. The proposed method is a step towards more realistic agent learning in Agent-based Computational Economics.

Keywords: Learning (artificial intelligence); Electricity supply industry; Adaptation models; Vectors; Computational modeling; Simulated annealing; Schedules


Optimistic planning for continuous-action deterministic systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lucian Buşoniu, Alexander Daniels, Rémi Munos, Robert Babuška
Affiliations: Department of Automation, Technical University of Cluj-Napoca, Romania; France; DCSC, Delft University of Technology, the Netherlands; Team SequeL, INRIA Lille-Nord Europe, France

We consider the class of online planning algorithms for optimal control, which compared to dynamic programming are relatively unaffected by large state dimensionality. We introduce a novel planning algorithm called SOOP that works for deterministic systems with continuous states and actions. SOOP is the first method to explore the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. SOOP can be used parameter-free at the cost of more model calls, but we also propose a more practical variant tuned by a parameter α, which balances finer discretization with longer planning horizons. Experiments on three problems show SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.

Keywords: Planning; Upper bound; Optimization; Dynamic programming; Measurement; Heuristic algorithms; Aerospace electronics


Cognitive control in cognitive dynamic systems: A new way of thinking inspired by the brain

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Simon Haykin, Ashkan Amiri, Mehdi Fatemi
Affiliation: Cognitive Systems Laboratory, McMaster University, Hamilton, Ontario, Canada

Briefly, the main purpose of the paper is fourfold: a) Cognitive perception, which consists of two functional blocks: improved sparse-coding under the influence of perceptual attention for extracting relevant information from the observables and ignoring irrelevant information, followed by a Bayesian algorithm for state estimation. b) Entropic state of the perceptor, which provides feedback information to the controller. c) Cognitive control, which also consists of two functional blocks: an executive learning algorithm computed by processing the entropic state, followed by predictive planning to set the stage for the policy to act on the environment, thereby establishing the global perception-action cycle. d) Experimental results for exploiting the perceptual as well as executive attention in a co-operative manner, which is aimed at the first demonstration of risk control in the presence of a severe disturbance in the environment.

Keywords: Planning; Mathematical model; Bayes methods; Heuristic algorithms; Prediction algorithms; Equations; Feedforward neural networks


Adaptive fault identification for a class of nonlinear dynamic systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Li-Bing Wu, Dan Ye, Xin-Gang Zhao
Affiliations: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, P. R. China; College of Sciences, University of Science and Technology Liaoning, Anshan, Liaoning, P. R. China; State Key Laboratory of Robotics and Shenyang Institute of Automation, CAS, Shenyang, Liaoning, P. R. China

ISBN: (print) 9781479945511

This paper is concerned with the diagnosis problem of actuator faults for a class of nonlinear systems. It is assumed that the upper bound of the Lipschitz constant of the nonlinearity in the faulty system is unknown. Then, a new nonlinear observer for fault diagnosis based on an adaptive estimator is proposed. Moreover, by making use of the designed adaptive observer with an on-line update control law without the σ-modification condition to approximate the faulty system, it is proved that the estimation error of the adaptive control parameter, the output observation error and the error between the system fault and the corresponding estimated value are uniformly ultimately bounded via Lyapunov stability analysis. Finally, simulation examples are provided to illustrate the efficiency of the proposed fault identification approach.

Keywords: Fault diagnosis; Observers; Nonlinear systems; Upper bound; Fault detection; Adaptive systems; Educational institutions


Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Luuk Bom, Ruud Henken, Marco Wiering
Affiliation: Faculty of Mathematics and Natural Sciences, University of Groningen, The Netherlands

Reinforcement learning algorithms enable an agent to optimize its behavior from interacting with a specific environment. Although some very successful applications of reinforcement learning algorithms have been developed, it is still an open research question how to scale up to large dynamic environments. In this paper we will study the use of reinforcement learning on the popular arcade video game Ms. Pac-Man. In order to let Ms. Pac-Man quickly learn, we designed particular smart feature extraction algorithms that produce higher-order inputs from the game-state. These inputs are then given to a neural network that is trained using Q-learning. We constructed higher-order features which are relative to the action of Ms. Pac-Man. These relative inputs are then given to a single neural network which sequentially propagates the action-relative inputs to obtain the different Q-values of different actions. The experimental results show that this approach allows the use of only 7 input units in the neural network, while still quickly obtaining very good playing behavior. Furthermore, the experiments show that our approach enables Ms. Pac-Man to successfully transfer its learned policy to a different maze on which it was not trained before.

Keywords: Games; Learning (artificial intelligence); Biological neural networks; Neurons; Heuristic algorithms; Training


Neuro-controller of cement rotary kiln temperature with adaptive critic designs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xiaofeng Lin, Tangbo Liu, Shaojian Song, Chunning Song
Affiliations: College of Electrical Engineering, Guangxi University, Nanning, China; College of Electrical Engineering, Guangxi University, China

The production process of the cement rotary kiln is a typical engineering thermodynamics process with large inertia, lag and nonlinearity, so it is very difficult to control accurately using traditional control theory. To keep the process stable and to produce high-grade cement clinker, it is important to keep the temperature of the sintering zone stable. Artificial neural networks offer a solution to this problem due to their advantages, such as self-organization, self-adaptivity and fault tolerance. This paper introduces a novel nonlinear optimal neuro-controller which is based on adaptive critic design (ACD) and uses the structure of action-dependent heuristic dynamic programming (ADHDP). The principle of ADHDP is presented. An action network and a critic network are set up in such a way that they basically learn from interactions based on local measurement to optimize the neuro-controller. The ADHDP neuro-controller has a simple framework and is independent of the system model. A simulation of the cement rotary kiln is carried out using Matlab/Simulink. The simulation results show that the ADHDP neuro-controller can keep the temperature of the sintering zone stable within a certain range, and that the temperature can meet the requirements of cement clinker production. The simulation results also show that the neuro-controller with the ACD has the potential to control the cement rotary kiln.

Keywords: Kilns; Production; Temperature distribution; Thermodynamics; Process control; Control theory; Artificial neural networks; Fault tolerance; Dynamic programming; Mathematical model


Delayed insertion and rule effect moderation of domain knowledge for reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Teck-Hou Teng, Ah-Hwee Tan
Affiliations: School of Computer Engineering, Center for Computational Intelligence, School of Computer Engineering, Nanyang Technological University

Though not a fundamental pre-requisite to efficient machine learning, insertion of domain knowledge into adaptive virtual agent is nonetheless known to improve learning efficiency and reduce model complexity. Conventionally, domain knowledge is inserted prior to learning. Despite being effective, such approach may not always be feasible. Firstly, the effect of domain knowledge is assumed and can be inaccurate. Also, domain knowledge may not be available prior to learning. In addition, the insertion of domain knowledge can frame learning and hamper the discovery of more effective knowledge. Therefore, this work advances the use of domain knowledge by proposing to delay the insertion and moderate the effect of domain knowledge to reduce the framing effect while still benefiting from the use of domain knowledge. Using a non-trivial pursuit-evasion problem domain, experiments are first conducted to illustrate the impact of domain knowledge with different degrees of truth. The next set of experiments illustrates how delayed insertion of such domain knowledge can impact learning. The final set of experiments is conducted to illustrate how delaying the insertion and moderating the assumed effect of domain knowledge can ensure the robustness and versatility of reinforcement learning.

Keywords: Vectors; Knowledge engineering; Learning (artificial intelligence); Adaptation models; Educational institutions; Computational modeling; Neural networks


Particle Swarn Optimized Adaptive Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Dongbin Zhao, Jianqiang Yi, Derong Liu
Affiliations: Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL, USA

Particle swarm optimization is used to train the action network and critic network of the adaptive dynamic programming approach. The typical structures of adaptive dynamic programming and particle swarm optimization are adopted for comparison with other learning algorithms such as the gradient descent method. Besides simulation on the balancing of a cart-pole plant, a more complex plant, the pendulum robot (pendubot), is tested for learning performance. Compared to traditional adaptive dynamic programming approaches, the proposed evolutionary learning strategy is verified to converge faster and to be more efficient. Furthermore, the structure becomes simple because the plant model does not need to be identified beforehand.

Keywords: Dynamic programming; Particle swarm optimization; Neural networks; Robots; Backpropagation; Adaptive systems; Evolutionary computation; Learning; Cost function; Testing



Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
