
Refine Search Results

Document Type

  • 299 conference papers
  • 8 journal articles

Collection

  • 307 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 180 papers: Engineering
    • 158 papers: Computer Science and Technology...
    • 56 papers: Electrical Engineering
    • 48 papers: Software Engineering
    • 47 papers: Control Science and Engineering
    • 13 papers: Information and Communication Engineering
    • 10 papers: Mechanical Engineering
    • 6 papers: Instrument Science and Technology
    • 4 papers: Mechanics (degrees in Engineering, Sci...
    • 4 papers: Bioengineering
    • 3 papers: Power Engineering and Engineering Therm...
    • 2 papers: Transportation Engineering
    • 2 papers: Nuclear Science and Technology
    • 2 papers: Biomedical Engineering (degrees in...
    • 1 paper: Architecture
    • 1 paper: Chemical Engineering and Technology
    • 1 paper: Aeronautical and Astronautical Science and Tech...
    • 1 paper: Food Science and Engineering (degrees in...
  • 40 papers: Science
    • 35 papers: Mathematics
    • 9 papers: Systems Science
    • 8 papers: Statistics (degrees in Science...
    • 4 papers: Physics
    • 4 papers: Biology
    • 1 paper: Chemistry
    • 1 paper: Astronomy
    • 1 paper: Atmospheric Science
    • 1 paper: Geophysics
    • 1 paper: Geology
  • 18 papers: Management
    • 17 papers: Management Science and Engineering (degrees...
    • 7 papers: Business Administration
  • 4 papers: Economics
    • 4 papers: Applied Economics
  • 1 paper: Medicine

Topics

  • 115 papers: dynamic programm...
  • 76 papers: reinforcement le...
  • 67 papers: learning
  • 47 papers: optimal control
  • 30 papers: neural networks
  • 27 papers: control systems
  • 21 papers: approximate dyna...
  • 21 papers: approximation al...
  • 20 papers: function approxi...
  • 20 papers: equations
  • 17 papers: convergence
  • 16 papers: adaptive dynamic...
  • 16 papers: state-space meth...
  • 16 papers: heuristic algori...
  • 14 papers: mathematical mod...
  • 13 papers: stochastic proce...
  • 12 papers: learning (artifi...
  • 12 papers: adaptive control
  • 12 papers: cost function
  • 11 papers: algorithm design...

Institutions

  • 5 papers: arizona state un...
  • 4 papers: department of el...
  • 4 papers: school of inform...
  • 4 papers: department of in...
  • 4 papers: univ sci & techn...
  • 4 papers: chinese acad sci...
  • 4 papers: department of el...
  • 3 papers: princeton univ d...
  • 3 papers: northeastern uni...
  • 3 papers: national science...
  • 3 papers: robotics institu...
  • 3 papers: univ illinois de...
  • 3 papers: univ utrecht dep...
  • 2 papers: univ groningen i...
  • 2 papers: sharif univ tech...
  • 2 papers: univ texas autom...
  • 2 papers: pengcheng labora...
  • 2 papers: guangxi univ sch...
  • 2 papers: chinese acad sci...
  • 2 papers: cemagref lisc au...

Authors

  • 14 papers: liu derong
  • 9 papers: wei qinglai
  • 8 papers: si jennie
  • 7 papers: xu xin
  • 5 papers: derong liu
  • 4 papers: lewis frank l.
  • 4 papers: martin riedmille...
  • 4 papers: huaguang zhang
  • 4 papers: jennie si
  • 4 papers: marco a. wiering
  • 4 papers: xin xu
  • 4 papers: zhang huaguang
  • 4 papers: dongbin zhao
  • 4 papers: lei yang
  • 4 papers: powell warren b.
  • 4 papers: riedmiller marti...
  • 3 papers: hado van hasselt
  • 3 papers: van hasselt hado
  • 3 papers: jagannathan s.
  • 3 papers: munos remi

Language

  • 305 papers: English
  • 1 paper: Other
  • 1 paper: Chinese

Search query: any field = "IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"
307 records; showing 191-200
Sparse Temporal Difference Learning Using LASSO
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Manuel Loth; Manuel Davy; Philippe Preux (SequeL INRIA-Futurs, LIFL CNRS, University of Lille (USTL), France; SequeL INRIA-Futurs, Lagis CNRS, Ecole Centrale de Lille, France; SequeL INRIA-Futurs, LIFL CNRS, University of Lille (USTL), France)
We consider the problem of on-line value function estimation in reinforcement learning. We concentrate on the function approximator to use. To try to break the curse of dimensionality, we focus on non-parametric funct...
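The core idea of the record above, using an L1 penalty to keep a linear value estimate sparse, can be illustrated in miniature. The sketch below is not the paper's algorithm; it is a toy under stated assumptions (synthetic data, invented sizes) showing how LASSO, solved here by ISTA proximal-gradient steps, drives irrelevant feature weights to exactly zero:

```python
import random

# Toy sketch, NOT the paper's algorithm: L1-regularized linear regression
# (LASSO) solved by ISTA, showing how irrelevant features of a linear value
# estimate are zeroed out. Data, sizes, and names are assumptions.
random.seed(0)
n, d = 400, 6
w_true = [2.0, -1.0, 0.0, 0.0, 0.0, 0.0]      # only 2 of 6 features matter
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [sum(x[j] * w_true[j] for j in range(d)) + random.gauss(0, 0.1) for x in X]

def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 penalty."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_ista(X, y, lam=0.2, step=0.05, iters=1000):
    w = [0.0] * d
    for _ in range(iters):
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [2.0 / n * sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        w = [soft(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w

w = lasso_ista(X, y)
# the two informative weights survive (slightly shrunk); the rest vanish
```

The soft-thresholding step is what produces exact zeros, which is the property that makes LASSO attractive for selecting features of a value function automatically.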
Distributed Deep Reinforcement Learning for Fighting Forest Fires with a Network of Aerial Robots
25th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Authors: Haksar, Ravi N.; Schwager, Mac (Stanford Univ Dept Mech Engn, Stanford CA 94305 USA; Stanford Univ Dept Aeronaut & Astronaut, Stanford CA 94305 USA)
This paper proposes a distributed deep reinforcement learning (RL) based strategy for a team of Unmanned Aerial Vehicles (UAVs) to autonomously fight forest fires. We first model the forest fire as a Markov decision p...
Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Paul J. Werbos (National Science Foundation, Arlington VA USA)
Since the 1960s, the author has proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (...
Opposition-Based Reinforcement Learning in the Management of Water Resources
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: M. Mahootchi; H. R. Tizhoosh; K. Ponnambalam (Systems Design Engineering, University of Waterloo, Waterloo ONT Canada)
Opposition-based learning (OBL) is a new scheme in machine intelligence. In this paper, an OBL version of Q-learning, which exploits opposite quantities to accelerate learning, is used for the management of single reservoi...
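The opposition idea in the record above can be sketched on a toy problem. This is not the paper's reservoir model; it is a minimal example (assumed chain task, assumed constants) of Q-learning where every real transition also triggers an update for the opposite action, computed here through a known deterministic model, which is a simplifying assumption:

```python
import random

# Toy sketch, not the paper's reservoir application: Q-learning on a 7-state
# chain where each visit also updates the OPPOSITE action's value, simulated
# via a known deterministic model (a simplifying assumption that makes the
# opposition step free). All sizes and rates are invented for illustration.
random.seed(1)
N, GOAL, ALPHA, GAMMA, EPS = 7, 6, 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def td_update(s, a, r, s2):
    best = max(Q[(s2, +1)], Q[(s2, -1)])
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

for _ in range(200):
    s = random.randrange(N - 1)            # random non-goal start state
    for _ in range(50):
        if random.random() < EPS:
            a = random.choice((-1, +1))
        else:
            a = max((+1, -1), key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        td_update(s, a, r, s2)
        s2o, ro = step(s, -a)              # opposition step: learn the
        td_update(s, -a, ro, s2o)          # opposite action from the same visit
        s = s2
        if s == GOAL:
            break
```

Because both actions are updated at every visited state, value information spreads roughly twice as fast as in plain Q-learning, which is the acceleration OBL aims for.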
A Novel Fuzzy Reinforcement Learning Approach in Two-Level Intelligent Control of 3-DOF Robot Manipulators
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Nasser Sadati; Mohammad Mollaie Emamzadeh (Electrical Engineering Department, Sharif University of Technology, Tehran Iran; Electrical Engineering Department, Sharif University of Technology, Tehran Iran)
In this paper, a fuzzy coordination method based on the interaction prediction principle (IPP) and reinforcement learning is presented for the optimal control of robot manipulators with three degrees of freedom. For this ...
Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Hailin Li; Cihan H. Dagli; David Enke (Department of Engineering Management and Systems Engineering, University of Missouri-Rolla, Rolla MO USA)
There are fundamental difficulties when using only a supervised learning philosophy to predict short-term stock movements. We present a reinforcement-oriented forecasting framework in which the solution is c...
Using Reward-weighted Regression for Reinforcement Learning of Task Space Control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Jan Peters; Stefan Schaal (University of Southern California, Los Angeles CA USA)
Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate-reward reinforcement learning problems. However, few of the known optimization or rein...
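The reward-weighted-regression idea in the record above reduces, in its simplest form, to refitting a policy by a reward-weighted average of sampled actions. The sketch below is a toy under stated assumptions (a one-dimensional immediate-reward task with an invented optimum `a_star` and reward shape), not the paper's task-space controller:

```python
import math, random

# Toy sketch of reward-weighted regression on a 1-D immediate-reward problem
# (the task, reward shape, and optimum a_star are invented for illustration):
# sample actions from a Gaussian policy, then refit the policy mean as the
# reward-weighted average of the sampled actions.
random.seed(3)
a_star = 0.7
reward = lambda a: math.exp(-(a - a_star) ** 2 / 0.1)   # peaked at a_star

mu, sigma = 0.0, 0.5        # Gaussian exploration policy N(mu, sigma^2)
for _ in range(100):
    acts = [random.gauss(mu, sigma) for _ in range(200)]
    wts = [reward(a) for a in acts]
    mu = sum(w * a for w, a in zip(wts, acts)) / sum(wts)
# mu has drifted close to the reward-maximizing action
```

Each refit is a weighted least-squares problem with a closed-form solution, which is why the approach scales to the high-dimensional policies used in robot control.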
Discrete-Time Adaptive Dynamic Programming Using Wavelet Basis Function Neural Networks
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Ning Jin; Derong Liu; Ting Huang; Zhongyu Pang (Department of Electrical and Computer Engineering, University of Illinois, Chicago IL USA)
Dynamic programming for discrete-time systems is difficult due to the "curse of dimensionality": one has to find a series of control actions that must be taken in sequence, hoping that this sequence will lea...
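The building block named in the record above, a wavelet basis function network, can be shown in isolation. The sketch below omits the paper's ADP structure entirely and only illustrates a wavelet network as a function approximator, under assumptions of my own: a Mexican-hat mother wavelet, one fixed dilation, fixed translations, linear output weights trained by LMS, and an invented target function:

```python
import math, random

# Minimal sketch of a wavelet-basis function approximator (not the paper's
# ADP controller): Mexican-hat wavelets at fixed translations and a single
# fixed dilation, with linear output weights trained by LMS. The target
# function, grid, and learning rate are assumptions for illustration.
def mexican_hat(x):
    return (1.0 - x * x) * math.exp(-x * x / 2.0)

CENTERS = [i / 4.0 for i in range(-8, 9)]   # 17 translations on [-2, 2]
SCALE = 0.5                                 # single dilation

def phi(x):
    return [mexican_hat((x - c) / SCALE) for c in CENTERS]

target = lambda x: math.sin(3.0 * x)        # invented target to approximate

random.seed(2)
w = [0.0] * len(CENTERS)
for _ in range(50000):
    x = random.uniform(-2.0, 2.0)
    f = phi(x)
    err = target(x) - sum(wi * fi for wi, fi in zip(w, f))
    w = [wi + 0.05 * err * fi for wi, fi in zip(w, f)]

def approx(x):
    return sum(wi * fi for wi, fi in zip(w, phi(x)))
```

Because the network is linear in its output weights, the same structure can slot into value-function approximation, where locality of the wavelets helps with the dimensionality issues the abstract mentions.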
Approximate Optimal Control-Based Neurocontroller with a State Observation System for Seedlings Growth in Greenhouse
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: H. D. Patino; J. A. Pucheta; C. Schugurensky; R. Fullana; B. Kuchen (Universidad Nacional de San Juan, San Juan Argentina)
In this paper, an approximate optimal control-based neurocontroller for guiding seedling growth in a greenhouse is presented. The main goal of this approach is to obtain closed-loop operation with a state neurocon...
Leader-Follower Semi-Markov Decision Problems: Theoretical Framework and Approximate Solution
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kurian Tharakunnel; Siddhartha Bhattacharyya (Department of Information and Decision Sciences, University of Illinois Chicago, Chicago IL USA)
Leader-follower problems are hierarchical decision problems in which a leader uses incentives to induce certain desired behavior among a set of self-interested followers. Dynamic leader-follower problems extend this s...