
Refine search results

Document type

  • 229 conference papers
  • 18 journal articles

Collection

  • 247 electronic documents
  • 0 print holdings

Date distribution

Subject classification

  • 113 papers: Engineering
    • 103 papers: Computer Science and Technology...
    • 42 papers: Software Engineering
    • 38 papers: Electrical Engineering
    • 23 papers: Control Science and Engineering
    • 5 papers: Information and Communication Engineering
    • 3 papers: Mechanical Engineering
    • 2 papers: Mechanics (conferrable in Engineering, Sci...
    • 1 paper: Instrument Science and Technology
    • 1 paper: Architecture
    • 1 paper: Chemical Engineering and Technology
    • 1 paper: Transportation Engineering
  • 27 papers: Science
    • 25 papers: Mathematics
    • 7 papers: Systems Science
    • 6 papers: Statistics (conferrable in Science,...
    • 1 paper: Physics
    • 1 paper: Chemistry
    • 1 paper: Atmospheric Science
  • 10 papers: Management
    • 8 papers: Management Science and Engineering (con...
    • 3 papers: Business Administration
    • 2 papers: Library, Information and Archives Manag...
  • 2 papers: Economics
    • 2 papers: Applied Economics
  • 1 paper: Law
    • 1 paper: Sociology

Topics

  • 95 papers: dynamic programm...
  • 54 papers: optimal control
  • 51 papers: learning
  • 44 papers: reinforcement le...
  • 35 papers: learning (artifi...
  • 27 papers: equations
  • 25 papers: neural networks
  • 22 papers: heuristic algori...
  • 20 papers: convergence
  • 20 papers: control systems
  • 18 papers: function approxi...
  • 18 papers: mathematical mod...
  • 16 papers: approximation al...
  • 15 papers: vectors
  • 15 papers: cost function
  • 14 papers: markov processes
  • 14 papers: nonlinear system...
  • 14 papers: artificial neura...
  • 13 papers: stochastic proce...
  • 12 papers: adaptive dynamic...

Institutions

  • 10 papers: chinese acad sci...
  • 5 papers: school of inform...
  • 4 papers: northeastern uni...
  • 4 papers: department of el...
  • 4 papers: department of in...
  • 3 papers: department of el...
  • 3 papers: automation and r...
  • 3 papers: department of el...
  • 3 papers: robotics institu...
  • 3 papers: key laboratory o...
  • 3 papers: natl univ def te...
  • 3 papers: univ illinois de...
  • 2 papers: department of ar...
  • 2 papers: school of electr...
  • 2 papers: univ groningen i...
  • 2 papers: univ texas autom...
  • 2 papers: colorado state u...
  • 2 papers: guangxi univ sch...
  • 2 papers: national science...
  • 2 papers: informatics inst...

Authors

  • 13 papers: liu derong
  • 7 papers: hado van hasselt
  • 7 papers: marco a. wiering
  • 7 papers: dongbin zhao
  • 6 papers: zhao dongbin
  • 5 papers: xu xin
  • 5 papers: lewis frank l.
  • 5 papers: huaguang zhang
  • 5 papers: wei qinglai
  • 5 papers: derong liu
  • 5 papers: warren b. powell
  • 4 papers: haibo he
  • 4 papers: jagannathan s.
  • 4 papers: frank l. lewis
  • 4 papers: zhang huaguang
  • 4 papers: ni zhen
  • 4 papers: yanhong luo
  • 4 papers: wang ding
  • 4 papers: he haibo
  • 4 papers: damien ernst

Language

  • 246 papers: English
  • 1 paper: Other
Search query: "Any field = 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014"
247 records in total; showing 231-240
Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Baohua Li, Jennie Si (Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA)
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain st...
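The abstract is cut off before the authors' algorithm appears, so the sketch below is only a generic illustration of robust value iteration for a discounted-cost MDP with an uncertain transition matrix: a worst-case expectation is taken over a simple box uncertainty set around each nominal transition row before the Bellman backup. The uncertainty model, the toy numbers, and all function names are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def worst_case_expectation(p_nom, v, delta):
    """Worst-case (largest) expected cost-to-go over a box uncertainty set:
    each transition probability may deviate from its nominal value by at most
    +/- delta while the row still sums to one. Greedily shifting probability
    mass toward the most expensive successor states attains the maximum."""
    lo = np.clip(p_nom - delta, 0.0, 1.0)
    hi = np.clip(p_nom + delta, 0.0, 1.0)
    p = lo.copy()
    budget = 1.0 - lo.sum()            # probability mass still to distribute
    for s_next in np.argsort(-v):      # most expensive successors first
        add = min(hi[s_next] - lo[s_next], budget)
        p[s_next] += add
        budget -= add
        if budget <= 1e-12:
            break
    return float(p @ v)

def robust_value_iteration(P, C, gamma=0.9, delta=0.05, iters=200):
    """Robust VI: V(s) = min_a max_{p in U(s,a)} [ c(s,a) + gamma * p.V ]."""
    n_actions, n_states = C.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_actions, n_states))
        for a in range(n_actions):
            for s in range(n_states):
                q[a, s] = C[a, s] + gamma * worst_case_expectation(P[a, s], v, delta)
        v = q.min(axis=0)              # greedy (deterministic) policy improvement
    return v, q.argmin(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(3), size=(2, 3))   # nominal P[a, s, s'], 2 actions, 3 states
    C = rng.uniform(0.0, 1.0, size=(2, 3))       # stage costs c(s, a)
    v, policy = robust_value_iteration(P, C)
    print("robust values:", np.round(v, 3), "policy:", policy)
```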
Algorithm and stability of ATC receding horizon control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Hongwei Zhang, Jie Huang (Department of Mechanical and Automation Engineering, Chinese University of Hong Kong, New Territories, Hong Kong, China); Frank L. Lewis (Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA)
Receding horizon control (RHC), also known as model predictive control (MPC), is a suboptimal control scheme that solves a finite-horizon open-loop optimal control problem in an infinite-horizon context and yields a m...
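The generic receding-horizon loop the abstract refers to (solve a finite-horizon problem, apply only the first control, shift the window, repeat) is sketched below for a linear-quadratic problem solved by a backward Riccati recursion. This is standard RHC background, not the paper's ATC formulation or its stability analysis; the system matrices, horizon, and function names are illustrative assumptions.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N):
    """Backward Riccati recursion for the N-step finite-horizon LQ problem.
    Returns the first-step feedback gain, which receding-horizon control
    applies before re-solving the whole problem at the next time step."""
    P = Q.copy()
    K0 = None
    for _ in range(N):
        K0 = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K0)
    return K0

def receding_horizon_run(A, B, Q, R, x0, horizon=10, steps=30):
    """RHC/MPC loop: plan over a finite horizon, apply only the first control,
    shift the window, and repeat from the newly measured state."""
    x = x0.astype(float)
    traj = [x.copy()]
    for _ in range(steps):
        K0 = finite_horizon_lqr(A, B, Q, R, horizon)
        u = -K0 @ x                   # first element of the open-loop plan
        x = A @ x + B @ u             # plant update (no disturbance in this toy run)
        traj.append(x.copy())
    return np.array(traj)

if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])   # discrete-time double integrator
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.array([[0.1]])
    traj = receding_horizon_run(A, B, Q, R, x0=np.array([5.0, 0.0]))
    print("final state:", np.round(traj[-1], 4))
```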
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Andras Antos (Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary); Csaba Szepesvari (University of Alberta, Edmonton, Canada); Remi Munos (SequeL team, INRIA Futurs, University of Lille (USTL), Villeneuve d'Ascq, France)
We consider batch reinforcement learning problems in continuous space (expected total discounted-reward Markovian decision problems) where the training data is composed of the trajectory of some fixed behaviour policy. ...
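The abstract describes batch RL from a single behaviour-policy trajectory. As background only (not the paper's sampling-based fitted policy iteration or its analysis), here is a minimal fitted Q-iteration sketch on a batch of transitions collected from one random-policy trajectory, using linear least squares on radial-basis features; the toy environment, the features, and the hyperparameters are invented for illustration.

```python
import numpy as np

def features(s):
    """Simple radial-basis features over the 1-D state space [0, 1] (illustrative choice)."""
    centers = np.linspace(0.0, 1.0, 7)
    return np.exp(-((s - centers) ** 2) / (2 * 0.15 ** 2))

def fitted_q_iteration(batch, n_actions, gamma=0.95, iters=50):
    """Fitted Q-iteration on a fixed batch of transitions (s, a, r, s'):
    regress Q_{k+1} onto the targets r + gamma * max_a' Q_k(s', a'),
    with one linear least-squares fit per action."""
    dim = features(0.0).size
    W = np.zeros((n_actions, dim))                       # one weight vector per action
    S, A, R, S2 = (np.array([t[i] for t in batch]) for i in range(4))
    Phi = np.stack([features(s) for s in S])
    Phi2 = np.stack([features(s) for s in S2])
    for _ in range(iters):
        targets = R + gamma * (Phi2 @ W.T).max(axis=1)   # Bellman backup under current fit
        for a in range(n_actions):
            mask = A == a
            W[a], *_ = np.linalg.lstsq(Phi[mask], targets[mask], rcond=None)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # One behaviour-policy trajectory on a toy "move toward 1.0" task.
    batch, s = [], 0.2
    for _ in range(500):
        a = int(rng.integers(2))                          # behaviour policy: uniform random
        s2 = float(np.clip(s + (0.05 if a == 1 else -0.05) + rng.normal(0, 0.01), 0.0, 1.0))
        r = 1.0 if s2 > 0.9 else 0.0
        batch.append((s, a, r, s2))
        s = s2
    W = fitted_q_iteration(batch, n_actions=2)
    q_at = lambda x: features(x) @ W.T
    print("greedy action at s=0.5 (expect 1 = move right):", int(np.argmax(q_at(0.5))))
```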
Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Asma Al-Tamimi, Frank Lewis (Automation & Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA)
In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman equation (HJB...
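HDP value iteration repeats the backup V_{k+1}(x) = min_u [ x'Qx + u'Ru + V_k(f(x, u)) ]. The sketch below specialises this to a linear system, where the recursion with V_0 = 0 becomes a Riccati-like iteration, and checks the discrete-time algebraic Riccati residual at convergence. It illustrates the recursion only; the paper's neural-network critic/actor implementation and its convergence proof are not reproduced here, and the toy system is an assumption.

```python
import numpy as np

def hdp_value_iteration(A, B, Q, R, iters=200):
    """HDP-style value iteration for a linear system with quadratic cost:
    V_k(x) = x' P_k x, and the backup
        V_{k+1}(x) = min_u [ x'Qx + u'Ru + V_k(Ax + Bu) ]
    reduces to the recursion below, starting from P_0 = 0."""
    P = np.zeros_like(Q)
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # minimising control gain
        P = Q + K.T @ R @ K + (A - B @ K).T @ P @ (A - B @ K)
    return P, K

if __name__ == "__main__":
    A = np.array([[0.9, 0.2], [0.0, 1.05]])   # mildly unstable but controllable toy system
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.array([[1.0]])
    P, K = hdp_value_iteration(A, B, Q, R)
    # At convergence P should satisfy the discrete-time algebraic Riccati equation.
    K_inf = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    residual = Q + A.T @ P @ A - A.T @ P @ B @ K_inf - P
    print("DARE residual norm:", np.linalg.norm(residual))
```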
Neuro-controller of cement rotary kiln temperature with adaptive critic designs
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Xiaofeng Lin, Tangbo Liu, Shaojian Song, Chunning Song (College of Electrical Engineering, Guangxi University, Nanning, China)
The production process of the cement rotary kiln is a typical engineering thermodynamic process with large inertia, lag, and nonlinearity, so it is very difficult to control this process accurately using traditional contr...
A Dynamic Programming Approach to Viability Problems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Pierre-Arnaud Coquelin (Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France); Sophie Martin (Laboratoire d'Ingénierie pour les Systèmes Complexes, Cemagref de Clermont-Ferrand, Aubière, France); Remi Munos (INRIA Futurs, Université de Lille 3, France)
Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of...
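The viability kernel is the set of initial states from which the system can be kept inside the constraint set forever. As a crude illustration of the dynamic-programming viewpoint (not the paper's method), the sketch below approximates the kernel for a double integrator on a grid by repeatedly discarding states from which every admissible control leaves the current viable set; the grid size, dynamics, and constraint bounds are assumptions.

```python
import numpy as np
from itertools import product

def viability_kernel(dt=0.1, n=41, u_set=(-1.0, 0.0, 1.0)):
    """Grid approximation of a viability kernel for the double integrator
    x+ = x + v*dt, v+ = v + u*dt under the constraint set K = [0,1] x [-1,1].
    States from which every admissible control maps outside the current viable
    set are removed until a fixed point (the approximate kernel) is reached."""
    xs = np.linspace(0.0, 1.0, n)
    vs = np.linspace(-1.0, 1.0, n)
    viable = np.ones((n, n), dtype=bool)          # start from the whole constraint set

    def snap(grid, value):
        return int(np.argmin(np.abs(grid - value)))

    changed = True
    while changed:
        changed = False
        for i, j in product(range(n), range(n)):
            if not viable[i, j]:
                continue
            ok = False
            for u in u_set:
                x2, v2 = xs[i] + vs[j] * dt, vs[j] + u * dt
                if 0.0 <= x2 <= 1.0 and -1.0 <= v2 <= 1.0 and viable[snap(xs, x2), snap(vs, v2)]:
                    ok = True
                    break
            if not ok:                             # no control keeps the state viable
                viable[i, j] = False
                changed = True
    return xs, vs, viable

if __name__ == "__main__":
    xs, vs, viable = viability_kernel()
    print("viable fraction of the constraint set:", round(float(viable.mean()), 3))
```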
Approximate Optimal Control-Based Neurocontroller with a State Observation System for Seedlings Growth in Greenhouse
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: H. D. Patino, J. A. Pucheta, C. Schugurensky, R. Fullana, B. Kuchen (Universidad Nacional de San Juan, San Juan, Argentina)
In this paper, an approximate optimal control-based neurocontroller for guiding seedling growth in a greenhouse is presented. The main goal of this approach is to obtain closed-loop operation with a state neurocon...
Leader-Follower Semi-Markov Decision Problems: Theoretical Framework and Approximate Solution
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kurian Tharakunnel, Siddhartha Bhattacharyya (Department of Information and Decision Sciences, University of Illinois at Chicago, Chicago, IL, USA)
Leader-follower problems are hierarchical decision problems in which a leader uses incentives to induce certain desired behavior among a set of self-interested followers. Dynamic leader-follower problems extend this s...
Opposition-Based Q(λ) with Non-Markovian Update
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel (Pattern Analysis and Machine Intelligence Laboratory, Department of Systems Design Engineering, University of Waterloo, ON, Canada; Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada)
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique combines the idea of opposition with eligibility traces to deal with large state space...
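The opposition-trace update itself is not visible in this snippet, so the sketch below covers only the standard ingredient it extends: tabular Watkins-style Q(λ) with replacing eligibility traces on a toy chain. The environment and hyperparameters are illustrative assumptions, and no opposition mechanism is implemented.

```python
import numpy as np

def q_lambda_chain(n_states=10, episodes=300, alpha=0.1, gamma=0.95,
                   lam=0.8, eps=0.1, seed=0):
    """Tabular Watkins-style Q(lambda) on a toy chain: action 1 moves right,
    action 0 moves left, and reaching the right end gives reward 1.
    Eligibility traces spread each TD error over recently visited
    state-action pairs; traces are cut after exploratory (non-greedy) moves."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        E = np.zeros_like(Q)                                   # eligibility traces
        s = 0
        while s < n_states - 1:
            greedy = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            a = int(rng.integers(2)) if rng.random() < eps else greedy
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            delta = r + gamma * Q[s2].max() - Q[s, a]          # off-policy TD error
            E[s, a] = 1.0                                      # replacing trace
            Q += alpha * delta * E
            # Watkins' cut: keep decaying traces only after greedy moves.
            E = E * gamma * lam if a == greedy else np.zeros_like(E)
            s = s2
    return Q

if __name__ == "__main__":
    Q = q_lambda_chain()
    print("greedy policy (1 = move right):", Q.argmax(axis=1))
```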
Coupling perception and action using minimax optimal control
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Tom Erez, William D. Smart (Washington University, Saint Louis, MO, USA)
This paper proposes a novel approach for coupling perception and action through minimax dynamic programming. We tackle domains where the agent has some control over the observation process (e.g. via the manipulation o...
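As a generic illustration of minimax dynamic programming (not the paper's perception-action coupling), the sketch below runs minimax value iteration on a tiny zero-sum setting in which a controller minimises and an adversary maximises the backed-up cost; the random problem data and all names are assumptions for illustration.

```python
import numpy as np

def minimax_value_iteration(P, C, gamma=0.9, iters=300):
    """Minimax value iteration for a small zero-sum setting: the controller
    picks a to minimise and an adversary (e.g. a worst-case disturbance or
    observation) picks d to maximise, so
        V(s) = min_a max_d [ C[s, a, d] + gamma * sum_s' P[s, a, d, s'] V(s') ].
    P has shape (S, A, D, S) and C has shape (S, A, D)."""
    v = np.zeros(C.shape[0])
    q = np.zeros_like(C)
    for _ in range(iters):
        q = C + gamma * P @ v            # backed-up cost, shape (S, A, D)
        v = q.max(axis=2).min(axis=1)    # adversary maximises, controller minimises
    policy = q.max(axis=2).argmin(axis=1)
    return v, policy

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    S, A, D = 4, 2, 2
    P = rng.dirichlet(np.ones(S), size=(S, A, D))   # transition kernel P[s, a, d, s']
    C = rng.uniform(0.0, 1.0, size=(S, A, D))       # stage cost
    v, policy = minimax_value_iteration(P, C)
    print("minimax values:", np.round(v, 3), "controller policy:", policy)
```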