Search results - Inner Mongolia University Library

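Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016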
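
The query rules above can be sketched as a small string-building helper. This is only an illustration of the stated syntax (field code, `=`, value, uppercase operators with one space on each side); the function names `clause`, `combine`, and `group` are hypothetical and are not part of the catalog, and nothing here contacts the library system.

```python
# Hypothetical helpers that format queries according to the help text.
ALLOWED_OPERATORS = {"AND", "OR", "NOT"}


def clause(field: str, value: str) -> str:
    """Format one field-restricted term, e.g. K=SARSA algorithm."""
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join sub-expressions with an uppercase operator and single spaces."""
    op = operator.upper()
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if len(parts) < 2:
        raise ValueError("an operator needs at least two operands")
    return f" {op} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first sample query from the help text.
subject = group(combine("OR", clause("K", "图书馆学"), clause("K", "情报学")))
example_query = combine(
    "AND", subject, clause("A", "范并思"), clause("Y", "1982-2016")
)
print(example_query)
```

Uppercasing the operator inside `combine` keeps the "must be uppercase" rule in one place, so callers cannot accidentally emit `and` or `or`.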


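Refine search results

Document type

15 records: conference papers
10 records: journal articles

Holdings

25 records: electronic documents
0 titles: print holdings

Subject classification

24 records: Engineering
- 10 records: Electrical Engineering
- 10 records: Computer Science and Technology...
- 9 records: Control Science and Engineering
- 4 records: Mechanical Engineering
- 4 records: Instrument Science and Technology
- 3 records: Information and Communication Engineering
- 3 records: Oil and Natural Gas Engineering
- 3 records: Software Engineering
- 2 records: Power Engineering and Engineering Thermo...
- 2 records: Electronic Science and Technology (...
- 2 records: Transportation Engineering
- 2 records: Biomedical Engineering (...
- 1 record: Safety Science and Engineering
- 1 record: Public Security Technology
- 1 record: Cyberspace Security
6 records: Science
- 3 records: Mathematics
- 2 records: Chemistry
- 2 records: Statistics (...
- 1 record: Systems Science
4 records: Management
- 4 records: Management Science and Engineering (...
2 records: Economics
- 2 records: Applied Economics
1 record: Medicine
- 1 record: Basic Medicine (...

Topic

25 records: sarsa algorithm
16 records: reinforcement le...
3 records: path planning
3 records: q-learning
2 records: function approxi...
2 records: dynamic role ass...
2 records: markov decision ...
2 records: robocup
1 record: self-learning al...
1 record: q-learning algor...
1 record: genetic algorith...
1 record: state discretiza...
1 record: adaptive control...
1 record: traffic signal
1 record: maximal control ...
1 record: diabetes
1 record: power grids
1 record: inspect
1 record: grid integration
1 record: intelligent body...

Institution

2 records: suzhou univ sci ...
2 records: college of autom...
2 records: suzhou univ sci ...
2 records: suzhou univ sci ...
1 record: guangxi univ tec...
1 record: chongqing univ t...
1 record: donghua univ eng...
1 record: soochow univ sch...
1 record: vellore institut...
1 record: vellore institut...
1 record: key laboratory o...
1 record: wuhan univ sch c...
1 record: pes university b...
1 record: key laboratory o...
1 record: college of compu...
1 record: shanghai jiao to...
1 record: swiss fed inst t...
1 record: jiangsu univ ins...
1 record: key lab smart en...
1 record: xinjiang univ sc...

Author

2 records: fu qiming
2 records: chen jianping
2 records: hu lingyao
2 records: hu wen
2 records: yang yongyi
2 records: cui xuanyu
2 records: liu haoran
2 records: liang zhiwei
2 records: wang jiawen
1 record: huang chen
1 record: jia liruizhi
1 record: fan huahao
1 record: hoda nasereddin
1 record: renjith p.n.
1 record: fan kai
1 record: lin wei
1 record: kang han
1 record: shanta rangaswam...
1 record: zhang yuxin
1 record: kai fan

Language

25 records: English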

Search condition: "Subject term = SARSA algorithm"

25 records in total; showing 21-30

Glucose Level Control Using Temporal Difference Methods

25th Iranian Conference on Electrical Engineering (ICEE)

Authors: Noori, Amin; Sadrnia, Mohammad Ali; Sistani, Mohammad Bagher Naghibi. Affiliations: Shahrood Univ Technol, Control Engn, Shahrood, Iran; Ferdowsi Univ Mashhad, Control Engn, Mashhad, Iran

ISBN: (print) 9781509059638

Control theory has been widely used in various fields; one of these areas is medicine. Diabetes is one of the new topics of interest in control. Obtaining insulin injection rates automatically has always been a concern of physicians. The purpose of the control and treatment of diabetes is to keep blood glucose in the normal range as far as possible. In this paper, we use the SARSA method, an on-policy Temporal Difference (TD) technique, to determine the insulin delivery rate. TD methods are the best-known methods for solving reinforcement learning problems. Because TD methods do not require a precise model of the environment dynamics, they have attracted interest in medical applications in recent years. Although TD methods do not require a mathematical model of the environment, we used the Palumbo mathematical model in place of real patients to simulate an environment. Since patients' medical parameters vary from person to person, controlling the disease requires a different drug schedule, in other words a different controller, for each patient. RL methods, by interacting with their environment, automatically determine suitable doses for each person. To reduce trial and error on real patients, and therefore the side effects of dose changes, we design a controller that estimates the appropriate insulin injection rate from a patient's parameters. The drug program can then be applied to other real patients. At this stage the controller (which applies the SARSA algorithm) determines the appropriate dose for a real patient with less trial and error. The simulation results demonstrate the efficiency of the proposed method.

Keywords: Drug Therapy; Diabetes; SARSA algorithm; Temporal Difference; Reinforcement Learning
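
The record above describes SARSA as an on-policy temporal difference method. As a generic illustration only, and not the authors' insulin controller or the Palumbo model, the standard tabular update Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)] can be sketched on a toy problem. The five-state corridor environment, the function names, and all parameter values below are assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # toy corridor: move left or right


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties at random so the untrained agent still explores.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy TD update: bootstrap from the action actually chosen next."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def step(state, action):
    """Hypothetical environment: states 0..4, reward 1.0 on reaching state 4."""
    next_state = max(0, min(4, state + action))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular SARSA on the toy corridor and return the Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(q, s, epsilon, rng)
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(q, s_next, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done)
            s, a = s_next, a_next
    return q
```

What makes this on-policy is that `a_next` is drawn from the same epsilon-greedy policy that is then executed; Q-learning would instead bootstrap from the maximum over next actions.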


Multi-robot collaboration based on Markov decision process in Robocup3D soccer simulation game

27th Chinese Control and Decision Conference, CCDC 2015

Authors: Cui, Xuanyu; Liang, Zhiwei; Yang, Yongyi; Ping, Shen; Wang, Jiawen; Liu, Haoran; Kai, Fan. Affiliation: College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China

ISBN: (print) 9781479970179

Close collaboration and a well-chosen strategy are indispensable for humanoid robots in the RoboCup soccer competition. To solve the problem that the convergence rate is too low when training local strategies, this paper proposes a method, based on reinforcement learning, for optimizing the decision and positioning parameters of soccer robots. First, a Markov decision process is applied as the framework for reinforcement learning. Then, we propose a relatively improved method, known as the SARSA algorithm, to overcome the low convergence rate of average-reward reinforcement learning. Meanwhile, to deal with the large state space arising in training and to improve generalization ability, this method is applied to Keepaway local training. The training results show that this algorithm converges faster than other ordinary learning algorithms. © 2015 IEEE.

Keywords: Dynamic role assignment; Markov Decision Process; Reinforcement learning; RoboCup; SARSA algorithm


Multi-robot Collaboration Based on Markov Decision Process in Robocup3D Soccer Simulation Game

The 27th Chinese Control and Decision Conference

Authors: Cui Xuanyu; Liang Zhiwei; Yang Yongyi; Shen Ping; Wang Jiawen; Liu Haoran; Fan Kai. Affiliation: College of Automation, Nanjing University of Posts and Telecommunications

ISBN: (print) 9781479970186

Close collaboration and a well-chosen strategy are indispensable for humanoid robots in the RoboCup soccer competition. To solve the problem that the convergence rate is too low when training local strategies, this paper proposes a method, based on reinforcement learning, for optimizing the decision and positioning parameters of soccer robots. First, a Markov decision process is applied as the framework for reinforcement learning. Then, we propose a relatively improved method, known as the SARSA algorithm, to overcome the low convergence rate of average-reward reinforcement learning. Meanwhile, to deal with the large state space arising in training and to improve generalization ability, this method is applied to Keepaway local training. The training results show that this algorithm converges faster than other ordinary learning algorithms.

Keywords: Markov Decision Process; SARSA algorithm; Reinforcement learning; Dynamic role assignment; RoboCup


Urban Traffic Signal Learning Control Using SARSA Algorithm Based on Adaptive RBF Network

International Conference on Measuring Technology and Mechatronics Automation

Authors: Li Chun-gui; Wang Meng; Yang Shu-Hong; Zhang Zeng-fang. Affiliation: Guangxi Univ Technol, Dept Comp Engn, Liuzhou, Peoples R China

ISBN: (print) 9780769535838

Urban traffic control is very complicated, so building a precise mathematical model for it is very difficult. In this paper, we use the SARSA reinforcement learning algorithm to control the traffic signal, so that decisions can be made dynamically according to real-time traffic state information and changes in the environment can be adapted to automatically. As the state space is too big to be stored and expressed directly, we apply a radial basis function neural network to approximate the state value function. By training self-adapting nonlinear processing units and constructing the state space online and adaptively, the approximation is improved and the control of the traffic signal at a single intersection is solved. The simulation results show that the new control algorithm is clearly more effective than traditional sliced time allocation methods.

Keywords: SARSA algorithm; function approximation; adaptive RBF neural network; traffic signal; learning control
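
The record above mentions approximating the state value function with a radial basis function network because the state space is too large to tabulate. A minimal sketch of that general idea follows, assuming fixed Gaussian centers and a linear output layer trained with a TD(0) step. This is a simplification swapped in for illustration: the paper's network is adaptive, and its structure and update rule are not given in the abstract. All names and parameters are hypothetical.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF activations of a scalar state x at fixed centers."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def value(x, weights, centers, width):
    """Approximate state value as a weighted sum of RBF activations."""
    feats = rbf_features(x, centers, width)
    return sum(w * f for w, f in zip(weights, feats))


def td0_update(weights, x, reward, x_next, centers, width, alpha, gamma, done):
    """One semi-gradient TD(0) step on the linear output weights."""
    feats = rbf_features(x, centers, width)
    v_next = 0.0 if done else value(x_next, weights, centers, width)
    delta = reward + gamma * v_next - value(x, weights, centers, width)
    # For a linear model the gradient with respect to each weight is its feature.
    return [w + alpha * delta * f for w, f in zip(weights, feats)]


# Usage: two centers, one terminal transition with reward 1.0.
centers = [0.0, 1.0]
weights = td0_update([0.0, 0.0], 0.0, 1.0, 0.0, centers, 1.0,
                     alpha=0.1, gamma=0.9, done=True)
```

With fixed centers only the output weights are learned; an adaptive network would also adjust or add centers during training.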


Blackjack as a test bed for learning strategies in neural networks

2nd IEEE World Congress on Computational Intelligence (WCCI 98)

Authors: Perez-Uribe, A; Sanchez, E. Affiliation: Swiss Fed Inst Technol, Dept Comp Sci, Log Syst Lab, CH-1015 Lausanne, Switzerland

ISBN: (print) 0780348591; 0780348605

Blackjack, or twenty-one, is a card game in which the player attempts to beat the dealer by obtaining a sum of card values that is equal to or less than 21 and higher than the dealer's total. The probabilistic nature of the game makes it an interesting testbed problem for learning algorithms, though the problem of learning a good playing strategy is not obvious. Systems that learn with a teacher are not very useful, since the target outputs for a given stage of the game are not known. Instead, the learning system has to explore different actions and develop a strategy by selectively retaining the actions that maximize the player's performance. This paper explores the use of blackjack as a test bed for learning strategies in neural networks, specifically with reinforcement learning techniques. Performance comparisons with previous related approaches are also reported.

Keywords: reinforcement learning; SARSA algorithm; Q-learning; Blackjack; learning strategies; artificial neural networks


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
