Search Results - Inner Mongolia University Library

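Help

Field codes:

- T = Title (book title, article title)
- A = Author (responsible party)
- K = Subject term
- P = Publication name
- PU = Publisher name
- O = Organization (author affiliation, degree-granting institution, patent applicant)
- L = Chinese Library Classification number
- C = Subject classification number
- U = All fields
- Y = Year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author 范并思 (Fan Bingsi), years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication 计算机应用与软件 ("Computer Applications and Software"), any field containing C++ or Basic, excluding subject term Visual, years 2011-2016)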
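The rules above are simple enough to mechanize. The following Python sketch assembles query strings from the listed field codes and uppercase operators, and reproduces both examples. The helper names `clause`, `join` and `group` are hypothetical and not part of the library's search system, which may apply rules not described on this page.

```python
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def clause(field: str, value: str) -> str:
    """Build a single FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def join(op: str, *parts: str) -> str:
    """Join clauses with an uppercase operator and one space on each side."""
    if op not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}, got {op!r}")
    return f" {op} ".join(parts)


def group(expr: str) -> str:
    """Parenthesize a sub-expression so it is evaluated as a unit."""
    return f"({expr})"


# Example 1
example_1 = join(
    "AND",
    group(join("OR", clause("K", "图书馆学"), clause("K", "情报学"))),
    clause("A", "范并思"),
    clause("Y", "1982-2016"),
)

# Example 2: nested joins mix AND and NOT without adding parentheses
example_2 = join(
    "AND",
    join(
        "NOT",
        join("AND", clause("P", "计算机应用与软件"),
             group(join("OR", clause("U", "C++"), clause("U", "Basic")))),
        clause("K", "Visual"),
    ),
    clause("Y", "2011-2016"),
)

print(example_1)
print(example_2)
```

Nesting `join` calls keeps the left-to-right order of Example 2. The page does not say how AND, OR and NOT are prioritized when mixed without parentheses, so grouping explicitly with `group` is the safer choice.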

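Refine results

Document type

299 records: Conference papers
8 records: Journal articles

Holdings scope

307 records: Electronic resources
0 titles: Print holdings

Subject classification

180 records: Engineering
- 158 records: Computer Science and Technology...
- 56 records: Electrical Engineering
- 48 records: Software Engineering
- 47 records: Control Science and Engineering
- 13 records: Information and Communication Engineering
- 10 records: Mechanical Engineering
- 6 records: Instrument Science and Technology
- 4 records: Mechanics (may be conferred in engineering, sci...
- 4 records: Bioengineering
- 3 records: Power Engineering and Engineering Therm...
- 2 records: Transportation Engineering
- 2 records: Nuclear Science and Technology
- 2 records: Biomedical Engineering (may be conferred...
- 1 record: Architecture
- 1 record: Chemical Engineering and Technology
- 1 record: Aeronautical and Astronautical Science and Tech...
- 1 record: Food Science and Engineering (may...
40 records: Science
- 35 records: Mathematics
- 9 records: Systems Science
- 8 records: Statistics (may be conferred in science, ...
- 4 records: Physics
- 4 records: Biology
- 1 record: Chemistry
- 1 record: Astronomy
- 1 record: Atmospheric Sciences
- 1 record: Geophysics
- 1 record: Geology
18 records: Management
- 17 records: Management Science and Engineering (may...
- 7 records: Business Administration
4 records: Economics
- 4 records: Applied Economics
1 record: Medicine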

Topic

115 records: dynamic programm...
76 records: reinforcement le...
67 records: learning
47 records: optimal control
30 records: neural networks
27 records: control systems
21 records: approximate dyna...
21 records: approximation al...
20 records: function approxi...
20 records: equations
17 records: convergence
16 records: adaptive dynamic...
16 records: state-space meth...
16 records: heuristic algori...
14 records: mathematical mod...
13 records: stochastic proce...
12 records: learning (artifi...
12 records: adaptive control
12 records: cost function
11 records: algorithm design...

Institution

5 records: arizona state un...
4 records: department of el...
4 records: school of inform...
4 records: department of in...
4 records: univ sci & techn...
4 records: chinese acad sci...
4 records: department of el...
3 records: princeton univ d...
3 records: northeastern uni...
3 records: national science...
3 records: robotics institu...
3 records: univ illinois de...
3 records: univ utrecht dep...
2 records: univ groningen i...
2 records: sharif univ tech...
2 records: univ texas autom...
2 records: pengcheng labora...
2 records: guangxi univ sch...
2 records: chinese acad sci...
2 records: cemagref lisc au...

Author

14 records: liu derong
9 records: wei qinglai
8 records: si jennie
7 records: xu xin
5 records: derong liu
4 records: lewis frank l.
4 records: martin riedmille...
4 records: huaguang zhang
4 records: jennie si
4 records: marco a. wiering
4 records: xin xu
4 records: zhang huaguang
4 records: dongbin zhao
4 records: lei yang
4 records: powell warren b.
4 records: riedmiller marti...
3 records: hado van hasselt
3 records: van hasselt hado
3 records: jagannathan s.
3 records: munos remi

Language

305 records: English
1 record: Other
1 record: Chinese

Search criteria: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"

307 records in total; showing records 201-210

A Recurrent Control Neural Network for Data Efficient Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Anton Maximilian Schaefer, Steffen Udluft, Hans-Georg Zimmermann (Department of Optimisation and Operations Research, University of Ulm (EBS), Germany; Department of Learning Systems, Information & Communications, Siemens AG, Munich, Germany)

In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with dynamically consistent overshooting, which we extend by an additional control network. The latter has the particular task of learning the optimal policy. This approach has the advantage that by using a neural network we can easily deal with high dimensions and consequently are able to break Bellman's curse of dimensionality. Further, due to the high system-identification quality of RNNs, our method is highly data-efficient. Because of its properties we refer to our new model as the recurrent control neural network (RCNN). The network is tested on a standard reinforcement learning problem, namely cart-pole balancing, where it shows outstanding results, especially in terms of data efficiency.

Keywords: Neural networks; Recurrent neural networks; Communication system control; Testing; Dynamic programming; Operations research; Telephony; Learning systems; Communications technology; Equations

SVM Viability Controller Active Learning: Application to Bike Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Laetitia Chapel, Guillaume Deffuant (Cemagref, LISC, Aubière, France)

It was shown recently that SVMs are particularly adequate to define action policies that keep a dynamical system inside a given constraint set (in the framework of viability theory). However, the training set of the SVMs faces the curse of dimensionality, because it is based on a regular grid of the state space. In this paper, we propose an active learning approach aiming to decrease the training set size dramatically, keeping it as close as possible to the final number of support vectors. We use a virtual multi-resolution grid, and some particularities of the problem, to choose very efficient examples to add to the training set. To illustrate the performance of the algorithm, we solve a six-dimensional problem, controlling a bike on a track, a problem usually solved using reinforcement learning techniques.

Keywords: Support vector machines; Bicycles; Kernel; State-space methods; Learning; Environmental factors; Costs; Labeling; Grid computing; Support vector machine classification

Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Martin Riedmiller, Jan Peters, Stefan Schaal (NeuroInformatics Group, University of Osnabrück, Germany; Computational Learning and Motor Control, University of Southern California, USA)

In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, 'vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be re-evaluated and reused, and new algorithms can be inserted with ease.

Keywords: Gradient methods; Learning; Finite difference methods; Solids; Legged locomotion; Stochastic processes; Dynamic programming; Motor drives; Optimization methods; Regulators
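As a rough illustration of the first of the three families evaluated, the sketch below estimates a policy gradient by finite differences: it perturbs the parameter vector randomly, measures the change in return, and solves for the gradient by least squares. It assumes central differences and a hypothetical concave `toy_return` in place of the cart-pole plant; it is not the authors' C++ code, and the step sizes are arbitrary choices.

```python
import numpy as np


def fd_policy_gradient(J, theta, delta=0.05, n_perturb=20, rng=None):
    """Finite-difference estimate of dJ/dtheta.

    Draws random parameter perturbations d, measures J(theta + d) - J(theta - d),
    and solves (2 d) @ grad = dJ for grad in the least-squares sense.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d_theta = delta * rng.standard_normal((n_perturb, theta.size))
    d_j = np.array([J(theta + d) - J(theta - d) for d in d_theta])
    grad, *_ = np.linalg.lstsq(2.0 * d_theta, d_j, rcond=None)
    return grad


# Hypothetical stand-in for the expected return of a parameterized policy:
# a concave quadratic whose maximizer is TARGET.
TARGET = np.array([1.0, -2.0, 0.5])


def toy_return(theta):
    return -np.sum((theta - TARGET) ** 2)


theta = np.zeros(3)
rng = np.random.default_rng(42)
for _ in range(100):
    # Plain gradient ascent on the estimated gradient.
    theta = theta + 0.1 * fd_policy_gradient(toy_return, theta, rng=rng)
print(np.round(theta, 3))
```

On a real plant each call to `J` would be a noisy rollout, so the number of perturbations and `delta` would matter far more than in this noise-free toy.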

Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Andras Antos, Csaba Szepesvari, Remi Munos (Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary; University of Alberta, Edmonton, Canada; SequeL team, INRIA Futurs, University of Lille (USTL), Villeneuve d'Ascq, France)

We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian decision problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian decision problem, the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced thereby significantly extending the scope of previous results.

Keywords: Learning; Training data; Algorithm design and analysis; Dynamic programming; Automation; Polynomials; State-space methods; Control systems; Interleaved codes; Extraterrestrial measurements

2021 IEEE/ACM 29th International Symposium on Quality of Service, IWQOS 2021

29th IEEE/ACM International Symposium on Quality of Service, IWQOS 2021

ISBN (print): 9781665414944

The proceedings contain 105 papers. The topics discussed include: designing approximate and deployable SRPT scheduler: a unified framework; automated quality of service monitoring for 5G and beyond using distributed ledgers; HierTopo: towards high-performance and efficient topology optimization for dynamic networks; LCL: light contactless low-delay load monitoring via compressive attentional multi-label learning; high-QoE DASH live streaming using reinforcement learning; can online learning increase the reliability of extreme mobility management?; secure and efficient task matching with multi-keyword in multi-requester and multi-worker crowdsourcing; Gost: enabling efficient spatio-temporal GPU sharing for network function virtualization; and demystifying the relationship between network latency and mobility on high-speed rails: measurement and prediction.

Strategy Generation with Cognitive Distance in Two-Player Games

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kosuke Sekiyama, Ricardo Carnieri, Toshio Fukuda (Department of Micro-Nano Systems Engineering, University of Nagoya, Nagoya, Japan)

In game-theoretical approaches to multi-agent systems, a payoff matrix is often given a priori and used by agents in action selection. By contrast, in this paper we approach the problem of decision making using the concept of cognitive distance, which is a notion of the difficulty of an action as perceived subjectively by the agent. As opposed to ordinary physical distance, cognitive distance depends on the situation and skills of the agent, ultimately representing the perceived difficulty of performing an action given the current state. The concept of cognitive distance is applied to a two-player game scenario, and it is shown how an agent can learn a model of its skills by estimating and observing the outcomes of its actions. This skill model is then used during play in a minimax search for the best actions.

Keywords: Game theory; Uncertainty; Decision making; Dynamic programming; Learning; Systems engineering and theory; Multiagent systems; Minimax techniques; Stochastic processes

A Dynamic Programming Approach to Viability Problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Pierre-Arnaud Coquelin, Sophie Martin, Remi Munos (Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France; Laboratoire d'Ingénierie pour les Systèmes Complexes, Cemagref de Clermont-Ferrand, Aubière, France; INRIA Futurs, Université de Lille 3, France)

Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of initial states from which there exists a trajectory that remains in the set of constraints indefinitely. The theory is very elegant and appears naturally in many applications. Unfortunately, the current numerical approaches suffer from low computational efficiency, which limits the potential range of applications of this domain. In this paper we show that the viability kernel is the zero-level set of a related dynamic programming problem, which opens promising research directions for numerical approximation of the viability kernel using tools from approximate dynamic programming. We illustrate the approach using k-nearest neighbors on a toy problem in two dimensions and on a complex dynamical model of an anaerobic digestion process in four dimensions.

Keywords: Dynamic programming; Kernel; Control systems; Evolution (biology); Constraint theory; Grid computing; Learning; Computational efficiency; Time factors; Uncertain systems
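The zero-level-set characterization can be illustrated on a finite problem. The sketch below assumes a hypothetical discrete double integrator (integer position p and velocity v, acceleration in {-1, 0, 1}) with constraint set |p| <= 5, |v| <= 3. It iterates V(x) = max(c(x), min_a V(f(x, a))) with a 0/1 constraint cost c, so that the viability kernel is {x : V(x) = 0}. This exact grid iteration stands in for the k-nearest-neighbour approximation used in the paper.

```python
import itertools

P_MAX, V_MAX = 5, 3      # constraint set K: |p| <= P_MAX, |v| <= V_MAX
ACTIONS = (-1, 0, 1)     # admissible accelerations


def step(state, a):
    """Discrete double integrator: update velocity, then position."""
    p, v = state
    v_next = v + a
    return (p + v_next, v_next)


def viability_kernel():
    """Return the states in K from which the constraints can be respected forever.

    value[x] stays 0 while x may still be viable and becomes 1 once every action
    leads to a state with value 1; states outside K (absent from the dict) count
    as 1. Starting from 0 on all of K and repeating
    V(x) = max(c(x), min_a V(f(x, a))) until nothing changes leaves the kernel
    as the zero-level set of V.
    """
    states = list(itertools.product(range(-P_MAX, P_MAX + 1),
                                    range(-V_MAX, V_MAX + 1)))
    value = {x: 0 for x in states}   # c(x) = 0 everywhere on K
    changed = True
    while changed:
        changed = False
        for x in states:
            if value[x] == 0 and min(value.get(step(x, a), 1) for a in ACTIONS) == 1:
                value[x] = 1
                changed = True
    return {x for x, val in value.items() if val == 0}


kernel = viability_kernel()
print(len(kernel), "of", (2 * P_MAX + 1) * (2 * V_MAX + 1), "constrained states are viable")
```

States dropped from K are exactly those moving too fast toward a wall to brake in time, e.g. (3, 3), which overshoots p = 5 even under full deceleration.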

Optimal Sliding Mode Control of ROV Fixed Depth Attitude Based on Reinforcement Learning

11th IEEE Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, CYBER 2021

Authors: Fule Wang, Qiuxia Qu, Baolong Yuan, Liangliang Sun, Yupeng Li, Guanyan Guo, Zupeng Xiao, Liang Sun, Zhigang Li (School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China; Shenyang Institute of Automation, Chinese Academy of Science, Shenyang 1101669, China)

ISBN (print): 9781665425278

In this paper, an integral sliding mode control algorithm based on reinforcement learning is proposed for the fixed-depth control system of an underwater vehicle. Since it is difficult for nonlinear continuous systems to track time-varying trajectories, the optimal tracking problem is transformed into a nonlinear time-invariant optimal control problem by introducing a new state variable. The HJB equation of the nonlinear system is solved by an adaptive dynamic programming (ADP) algorithm to find an approximately optimal strategy. Combined with integral sliding mode control, an approximately optimal sliding mode controller is designed. In addition, the Lyapunov equation is used to verify that the proposed control strategy ensures that the tracking error of the system gradually converges to zero, and that the error remains within a small range. Finally, simulation experiments verify the effectiveness of the algorithm, which enhances the disturbance rejection and robustness of the underwater robot in depth control. © 2021 IEEE.

Keywords: Reinforcement learning

Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Edwin D. de Jong (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)

This paper describes a novel algorithm called CON-MODP for computing Pareto optimal policies for deterministic multi-objective sequential decision problems. CON-MODP is a value iteration based multi-objective dynamic programming algorithm that only computes stationary policies. We observe that to guarantee convergence to the unique Pareto optimal set of deterministic stationary policies, the algorithm needs to perform a policy evaluation step on particular policies that are inconsistent in a single state that is being expanded. We prove that the algorithm converges to the Pareto optimal set of value functions and policies for deterministic infinite horizon discounted multi-objective Markov decision processes. Experiments show that CON-MODP is much faster than previous multi-objective value iteration algorithms.

Keywords: Dynamic programming; Learning; Distributed computing; Heuristic algorithms; Convergence; Infinite horizon; Intelligent systems; Deductive databases; Distributed databases; Electronic mail
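To make the target of CON-MODP concrete, the sketch below computes the Pareto optimal set of deterministic stationary policies by brute force on a hypothetical two-state, two-objective deterministic MDP. It evaluates every policy exactly and keeps those whose start-state value vectors are not dominated. This is not CON-MODP, whose purpose is to avoid such enumeration, and the tolerance used to treat nearly equal vectors as ties is an arbitrary choice.

```python
import itertools

import numpy as np

GAMMA = 0.9
# Hypothetical deterministic MDP: NEXT[s][a] is the successor state and
# REWARD[s, a] is a reward vector with one entry per objective.
NEXT = [[0, 1], [1, 0]]
REWARD = np.array([[[1.0, 0.2], [0.0, 0.0]],
                   [[0.0, 1.0], [0.1, 0.1]]])   # shape (states, actions, objectives)


def evaluate(policy):
    """Solve V = R_pi + GAMMA * P_pi V for all objectives at once."""
    n = len(NEXT)
    p = np.zeros((n, n))
    r = np.zeros((n, REWARD.shape[2]))
    for s, a in enumerate(policy):
        p[s, NEXT[s][a]] = 1.0
        r[s] = REWARD[s, a]
    return np.linalg.solve(np.eye(n) - GAMMA * p, r)   # shape (states, objectives)


def dominates(u, w, tol=1e-9):
    """u Pareto-dominates w: no worse in every objective, better in at least one."""
    return bool(np.all(u >= w - tol) and np.any(u > w + tol))


policies = list(itertools.product(range(2), repeat=len(NEXT)))
start_values = {pi: evaluate(pi)[0] for pi in policies}   # value vectors at state 0
pareto = [pi for pi in policies
          if not any(dominates(start_values[o], start_values[pi]) for o in policies)]
print(pareto)
```

The cycling policy (1, 1) is dominated, while (0, 0) and (0, 1) tie at state 0 and are both kept; enumeration grows as |A|^|S|, which is why a dedicated algorithm is needed beyond toy sizes.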

Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Baohua Li, Jennie Si (Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA)

In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain stationary parametric transition matrices are clearly classified into independent and correlated cases. It is pointed out in this paper that the optimality criterion of uniform minimization of the maximum expected total discounted cost functions for all initial states, or robust uniform optimality criterion, is not appropriate for solving MDPs with correlated transition matrices. A new optimality criterion of minimizing the maximum quadratic total value function is proposed, which includes the previous criterion as a special case. Based on the new optimality criterion, robust policy iteration is developed to compute an optimal policy in the deterministic stationary policy space. Under some assumptions, the solution is guaranteed to be optimal or near-optimal in the deterministic policy space.

Keywords: Robustness; Dynamic programming; Space stations; Learning; Telephony; Cost function; Estimation error; Design methodology; Approximation methods; Equations
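The robust uniform criterion that the abstract starts from (for every initial state, minimize the maximum expected total discounted cost) can be sketched as robust value iteration when the uncertainty is independent across state-action pairs, the case the abstract does not object to. The finite candidate sets of transition rows below are hypothetical, and the paper's proposed quadratic criterion for correlated matrices is not implemented here.

```python
import numpy as np


def robust_value_iteration(cost, candidates, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Worst-case (min-max) value iteration for a finite MDP.

    cost[s, a]       -- immediate cost of action a in state s
    candidates[s][a] -- array of shape (k, n_states); each row is one possible
                        next-state distribution, chosen independently per (s, a)
    Returns the robust cost-to-go and a greedy deterministic stationary policy.
    """
    n_states, n_actions = cost.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Nature picks the transition row that maximizes expected cost-to-go.
                q[s, a] = cost[s, a] + gamma * np.max(candidates[s][a] @ v)
        v_new = q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=1)
        v = v_new
    return v, q.argmin(axis=1)


cost = np.array([[1.0, 2.0], [0.0, 0.5]])
candidates = [
    [np.array([[0.9, 0.1], [0.5, 0.5]]), np.array([[0.2, 0.8], [0.1, 0.9]])],
    [np.array([[0.3, 0.7], [0.7, 0.3]]), np.array([[0.0, 1.0], [0.2, 0.8]])],
]
v_robust, policy = robust_value_iteration(cost, candidates)

# Nominal model: keep only the first candidate row for every (s, a).
nominal = [[rows[:1] for rows in per_state] for per_state in candidates]
v_nominal, _ = robust_value_iteration(cost, nominal)
print(v_robust, v_nominal, policy)
```

Because the worst case is taken separately for each (s, a), the adversary may combine rows that could never occur together under correlated uncertainty; that over-pessimism is what the abstract's objection to this criterion concerns.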

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; Postal code: 010021
