
Refine Results

Document Type

  • 140 conference papers
  • 7 journal articles

Collection Scope

  • 147 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 71 papers: Engineering
    • 66 papers: Computer Science and Technology...
    • 15 papers: Software Engineering
    • 11 papers: Electrical Engineering
    • 9 papers: Control Science and Engineering
    • 2 papers: Instrument Science and Technology
    • 2 papers: Information and Communication Engineering
    • 1 paper: Mechanics (degree may be awarded in Engineering or Sci...
    • 1 paper: Mechanical Engineering
    • 1 paper: Architecture
  • 11 papers: Science
    • 10 papers: Mathematics
    • 2 papers: Systems Science
    • 2 papers: Statistics (degree may be awarded in Science or...
  • 5 papers: Management
    • 4 papers: Management Science and Engineering (degree may...
    • 3 papers: Business Administration
    • 1 paper: Library, Information and Archives Manage...
  • 3 papers: Economics
    • 3 papers: Applied Economics

Topics

  • 76 papers: dynamic programm...
  • 39 papers: learning
  • 26 papers: optimal control
  • 25 papers: reinforcement le...
  • 15 papers: function approxi...
  • 15 papers: control systems
  • 14 papers: approximation al...
  • 14 papers: equations
  • 13 papers: neural networks
  • 13 papers: stochastic proce...
  • 12 papers: convergence
  • 10 papers: state-space meth...
  • 10 papers: cost function
  • 9 papers: mathematical mod...
  • 8 papers: trajectory
  • 8 papers: approximation me...
  • 7 papers: approximate dyna...
  • 7 papers: algorithm design...
  • 7 papers: adaptive control
  • 7 papers: heuristic algori...

Institutions

  • 4 papers: school of inform...
  • 4 papers: department of in...
  • 3 papers: department of el...
  • 3 papers: northeastern uni...
  • 3 papers: univ texas autom...
  • 3 papers: arizona state un...
  • 3 papers: robotics institu...
  • 3 papers: univ illinois de...
  • 2 papers: princeton univ d...
  • 2 papers: national science...
  • 2 papers: college of mecha...
  • 2 papers: key laboratory o...
  • 2 papers: univ utrecht dep...
  • 2 papers: department of op...
  • 1 paper: inria
  • 1 paper: computational le...
  • 1 paper: school of automa...
  • 1 paper: univ cincinnati ...
  • 1 paper: toyota technol c...
  • 1 paper: neuroinformatics...

Authors

  • 5 papers: liu derong
  • 4 papers: xu xin
  • 4 papers: martin riedmille...
  • 4 papers: huaguang zhang
  • 4 papers: marco a. wiering
  • 4 papers: zhang huaguang
  • 4 papers: si jennie
  • 4 papers: derong liu
  • 3 papers: hado van hasselt
  • 3 papers: lewis frank l.
  • 3 papers: dongbin zhao
  • 3 papers: powell warren b.
  • 3 papers: warren b. powell
  • 3 papers: riedmiller marti...
  • 2 papers: manuel loth
  • 2 papers: van hasselt hado
  • 2 papers: preux philippe
  • 2 papers: hu dewen
  • 2 papers: jennie si
  • 2 papers: philippe preux

Language

  • 142 papers: English
  • 5 papers: Other
Search query: Any field = "2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007"
147 records; showing results 121-130
Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Asma Al-Tamimi, Frank Lewis (Automation & Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA)
In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman equation (HJB...
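For orientation, a minimal sketch of this kind of HDP iteration, written for a generic affine discrete-time system x_{k+1} = f(x_k) + g(x_k) u_k with quadratic stage cost (notation assumed here, not taken from the paper): starting from V_0 \equiv 0, iterate for i = 0, 1, 2, ...

    u_i(x_k) = \arg\min_{u_k} \{ x_k^\top Q x_k + u_k^\top R u_k + V_i(f(x_k) + g(x_k) u_k) \}
    V_{i+1}(x_k) = x_k^\top Q x_k + u_i(x_k)^\top R u_i(x_k) + V_i(f(x_k) + g(x_k) u_i(x_k))

The convergence question is whether V_i tends to the value function solving the discrete-time HJB equation as i grows.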
Approximate Optimal Control-Based Neurocontroller with a State Observation System for Seedlings Growth in Greenhouse
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: H. D. Patino, J. A. Pucheta, C. Schugurensky, R. Fullana, B. Kuchen (Universidad Nacional de San Juan, San Juan, Argentina)
In this paper, an approximate optimal control-based neurocontroller for guiding seedling growth in a greenhouse is presented. The main goal of this approach is to obtain closed-loop operation with a state neurocon...
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Martin Riedmiller, Jan Peters, Stefan Schaal (NeuroInformatics Group, University of Osnabrück, Germany; Computational Learning and Motor Control, University of Southern California, USA)
In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, 'vanilla' policy gradients, and natural policy gradients. Each o...
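For background on the three families compared above, the gradient estimators are usually written as follows for a parameterized policy \pi_\theta with expected return J(\theta) (generic notation, assumed rather than quoted from the paper):

    finite difference:   \nabla_\theta J \approx (\Delta\Theta^\top \Delta\Theta)^{-1} \Delta\Theta^\top \Delta J   (from rollouts at perturbed parameters \theta + \delta\theta_i)
    'vanilla' (likelihood ratio):   \nabla_\theta J = E[ \sum_t \nabla_\theta \log \pi_\theta(a_t | s_t) (R_t - b) ]
    natural:   \tilde\nabla_\theta J = F(\theta)^{-1} \nabla_\theta J,   with F(\theta) the Fisher information matrix of \pi_\theta

The choice of baseline b and of the Fisher-matrix estimate is what mainly distinguishes the variants evaluated on the cart-pole benchmark.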
A Novel Fuzzy Reinforcement Learning Approach in Two-Level Intelligent Control of 3-DOF Robot Manipulators
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Nasser Sadati, Mohammad Mollaie Emamzadeh (Electrical Engineering Department, Sharif University of Technology, Tehran, Iran)
In this paper, a fuzzy coordination method based on the interaction prediction principle (IPP) and reinforcement learning is presented for the optimal control of robot manipulators with three degrees of freedom. For this ...
Coordinated Reinforcement Learning for Decentralized Optimal Control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Daniel Yagan, Chen-Khong Tham (Department of Electrical and Computer Engineering, National University of Singapore, Singapore)
We consider a multi-agent system where the overall performance is affected by the joint actions or policies of agents. However, each agent only observes a partial view of the global state condition. This model is know...
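The term 'coordinated reinforcement learning' usually denotes a factored decomposition of the global action-value function across agents; as a rough sketch under that assumption (the paper's own formulation is only summarized above):

    Q(s, a) \approx \sum_j Q_j(s_j, a_j)

Each agent j learns its local component from its partial view s_j, and a joint greedy action is recovered by coordinating over the shared variables (e.g., by variable elimination or message passing).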
Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Marco A. Wiering, Edwin D. de Jong (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)
This paper describes a novel algorithm called CON-MODP for computing Pareto optimal policies for deterministic multi-objective sequential decision problems. CON-MODP is a value iteration based multi-objective dynamic ...
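For reference, the multi-objective value iteration that CON-MODP builds on maintains, for each state, a set of Pareto-nondominated value vectors; a sketch under assumed notation (the paper's added consistency operator for stationary policies is not reproduced here):

    V_{i+1}(s) = ND( \bigcup_a \{ r(s, a) + \gamma v : v \in V_i(T(s, a)) \} )

where r(s, a) is the vector-valued reward, T(s, a) the deterministic successor state, and ND(\cdot) discards dominated vectors.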
Sparse Temporal Difference Learning Using LASSO
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Manuel Loth, Manuel Davy, Philippe Preux (SequeL, INRIA-Futurs, LIFL CNRS, University of Lille (USTL), France; SequeL, INRIA-Futurs, Lagis CNRS, Ecole Centrale de Lille, France)
We consider the problem of on-line value function estimation in reinforcement learning. We concentrate on the choice of function approximator. To try to break the curse of dimensionality, we focus on nonparametric funct...
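A generic way to write the L1-penalized objective such an approach targets, for a linear architecture V_\theta(s) = \theta^\top \phi(s) over a large (possibly kernel-based) feature dictionary (assumed notation, not the paper's exact formulation):

    \min_\theta \sum_t ( r_t + \gamma \theta^\top \phi(s_{t+1}) - \theta^\top \phi(s_t) )^2 + \lambda ||\theta||_1

The L1 term drives most coefficients to zero, which is what keeps the approximator sparse as candidate features are added on-line.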
Performance analysis of direct heuristic dynamic programming using control-theoretic measures
International Joint Conference on Neural Networks
Authors: Yang, Lei; Si, Jennie; Tsakalis, Konstantinos S.; Rodriguez, Armando A. (Arizona State Univ, Dept Elect Engn, Tempe, AZ 85287, USA)
Approximate dynamic programming (ADP) has been widely studied from several important perspectives: algorithm development, learning efficiency measured by success or failure statistics, convergence rate, and learning e...
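As a reminder of the structure under analysis, direct HDP is typically an on-line actor-critic scheme: the critic approximates the discounted cost-to-go J(t) \approx \sum_{k \ge 0} \alpha^k r(t + k) and is trained to reduce a temporal-difference-style error commonly written as (generic form, assumed here)

    e_c(t) = \alpha J(t) - [ J(t-1) - r(t) ]

while the actor is updated by backpropagating the critic output through both networks; the paper then evaluates the resulting closed loop with control-theoretic measures rather than success/failure statistics alone.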
Strategy Generation with Cognitive Distance in Two-Player Games
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kosuke Sekiyama, Ricardo Carnieri, Toshio Fukuda (Department of Micro-Nano Systems Engineering, University of Nagoya, Nagoya, Japan)
In game theoretical approaches to multi-agent systems, a payoff matrix is often given a priori and used by agents in action selection. By contrast, in this paper we approach the problem of decision making by use of th...
Discrete-Time Adaptive Dynamic Programming Using Wavelet Basis Function Neural Networks
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Ning Jin, Derong Liu, Ting Huang, Zhongyu Pang (Department of Electrical and Computer Engineering, University of Illinois, Chicago, IL, USA)
Dynamic programming for discrete-time systems is difficult due to the "curse of dimensionality": one has to find a series of control actions that must be taken in sequence, hoping that this sequence will lea...
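The 'curse of dimensionality' remark refers to the discrete-time Bellman optimality recursion that ADP schemes approximate rather than solve exactly (generic notation assumed):

    J^*(x_k) = \min_{u_k} \{ U(x_k, u_k) + J^*(x_{k+1}) \}

Here the cost-to-go is represented by a neural network whose hidden units are wavelet basis functions, e.g. \hat{J}(x; w) = \sum_j w_j \psi_j(x), so only the weights w are learned instead of a table over all states.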