Search Results - Inner Mongolia University Library


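Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016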
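
The field codes and operator rules above can be composed mechanically. The following is a minimal sketch that builds a query string in that syntax; the page documents only the syntax itself, not a programmatic interface, so the helper names (`term`, `year_range`, `group`, `combine`) are hypothetical and exist only for illustration.

```python
from typing import Tuple

# Field codes and operators as listed in the search help.
# All function names below are illustrative assumptions, not a library API.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, e.g. K=Visual."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def year_range(start: int, end: int) -> str:
    """Return a year term such as Y=2011-2016."""
    if start > end:
        raise ValueError("start year must not exceed end year")
    return f"Y={start}-{end}"


def group(*terms: str, op: str = "OR") -> str:
    """Join terms with one uppercase operator and wrap them in parentheses."""
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op!r}")
    return "(" + f" {op} ".join(terms) + ")"


def combine(first: str, *rest: Tuple[str, str]) -> str:
    """Append (operator, term) pairs to a first term, one space around each operator."""
    parts = [first]
    for op, t in rest:
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op!r}")
        parts.extend([op, t])
    return " ".join(parts)


# Rebuild Example 2 from the search help.
query = combine(
    term("P", "计算机应用与软件"),
    ("AND", group(term("U", "C++"), term("U", "Basic"))),
    ("NOT", term("K", "Visual")),
    ("AND", year_range(2011, 2016)),
)
print(query)
```

Validating the operator names in code mirrors the rule that operators must be uppercase, and joining with single spaces satisfies the spacing requirement.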


Search query: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"

307 records in total; records 181-190 are shown below.


Call admission control in wireless DS-CDMA systems using actor-critic reinforcement learning


2nd International Symposium on Wireless Pervasive Computing

Authors: Chanloha, Pitipong; Usaha, Wipawee (Suranaree Univ Technol, Sch Telecommun Engn, Nakhon Ratchasima 30000, Thailand)

ISBN: 9781424405220 (print)

This paper addresses the call admission control (CAC) problem for multiple services in the uplink of a cellular system using direct sequential code division multiple access (DS-CDMA) when taking into account the physical layer channel and receiver structure at the base station. The problem is formulated as a semi-Markov decision process (SMDP) with constraints on the blocking probabilities and signal-to-interference ratio (SIR). The objective is to find a CAC policy which maximizes the throughput while still satisfying these quality-of-service (QoS) constraints. To solve for a near optimal CAC policy, an online decision-making algorithm based on an actor-critic with temporal-difference learning from a recent paper is modified by parameterizing the reward signal to deal with the QoS constraints. The proposed algorithm circumvents the computational complexity experienced in conventional dynamic programming techniques.

Keywords: Code division multiple access


Kernelizing LSPE(λ)


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Tobias Jung, Daniel Polani (University of Mainz, Germany; University of Hertfordshire, UK)

We propose the use of kernel-based methods as underlying function approximator in the least-squares based policy evaluation framework of LSPE(λ) and LSTD(λ). In particular we present the 'kernelization' of model-free LSPE(λ). The 'kernelization' is computationally made possible by using the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well-suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this

Keywords: Least squares approximation; Function approximation; Learning; Kernel; Dynamic programming; Electronic mail; Optimal control; Optimization methods; Least squares methods; Control systems


Fitted Q Iteration with CMACs


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Stephan Timmer, Martin Riedmiller (Department of Computer Science, University of Osnabrück, Osnabrück, Germany)

A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in case of continuous, high dimensional state spaces, since it is impossible to explore such spaces exhaustively. A simple but promising approach is to fix the number of state transitions which are sampled from the underlying Markov decision process. For several kernel-based learning algorithms there exist convergence proofs and notable empirical results, if a fixed set of transition instances is used. In this article, we will analyze how function approximators similar to the CMAC-architecture can be combined with this idea. We will show both analytically and empirically the potential power of the CMAC architecture combined with an offline version of Q-learning

Keywords: Inference algorithms; State-space methods; Convergence; Computer science; Algorithm design and analysis; Dynamic programming; Space exploration; Interleaved codes; Supervised learning; Sampling methods


Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Hado van Hasselt (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)

This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional actor-critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms

Keywords: Learning automata; Neural networks; Dynamic programming; Intelligent systems; State estimation; Probability distribution; Stochastic systems; Optimal control


Q-Learning with Continuous State Spaces and Finite Decision Set

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kengy Barty, Pierre Girardeau, Jean-Sebastien Roy, Cyrille Strugarek (EDF Research and Development, Clamart, France)

This paper presents an original technique for computing the optimal policy of a Markov decision problem with continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm introduced in 1989 by Watkins for discrete Markov decision problems. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We state, under mild assumptions, a convergence theorem for this algorithm. Finally, we illustrate our algorithm by solving two classical problems: the mountain car task and the puddle world task

Keywords: State-space methods; Kernel; Costs; Dynamic programming; Stochastic processes; Recursive estimation; Random variables; Learning; Approximation algorithms; Uncertainty


Coordinated Reinforcement Learning for Decentralized Optimal Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Daniel Yagan, Chen-Khong Tham (Department of Electrical and Computer Engineering, National University of Singapore, Singapore)

We consider a multi-agent system where the overall performance is affected by the joint actions or policies of agents. However, each agent only observes a partial view of the global state condition. This model is known as a decentralized partially-observable Markov decision process (DEC-POMDP), which can be considered more applicable in real-world applications such as communication networks. It is known that the exact solution to a DEC-POMDP is NEXP-complete and memory requirements grow exponentially even for finite-horizon problems. In this paper, we propose to address these issues by using an online model-free technique and by exploiting the locality of interaction among agents in order to approximate the joint optimal policy. Simulation results show the effectiveness and convergence of the proposed algorithm in the context of resource allocation for multiagent wireless multi-hop networks.

Keywords: Learning; Optimal control; Control systems; Resource management; Spread spectrum communication; Context modeling; Multiagent systems; Communication networks; Stochastic processes; Dynamic programming


A Theoretical Analysis of Cooperative Behavior in Multi-agent Q-learning


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ludo Waltman, Uzay Kaymak (Erasmus University Rotterdam, Rotterdam, Netherlands)

A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This paper provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that under certain assumptions cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner's dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. It is found experimentally that the theoretical results presented in this paper are quite robust to violations of the underlying assumptions

Keywords: Helium; Oligopoly; Nash equilibrium; Dynamic programming; Learning; Environmental economics; Robustness; Performance analysis; Algorithm design and analysis; Microeconomics


Dynamic optimization of the strength ratio during a terrestrial conflict

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Alexandre Sztykgold, Gilles Coppin, Olivier Hudry (GET/ENST-Bretagne, LUSSI Department, France; GET/ENST, Computer Science Department, France)

The aim of this study is to assist a military decision maker during the decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies that guarantee achieving given goals, defined simultaneously in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed draw on temporal-difference methods but also use a heuristic qualification of system states to cope with algorithmic complexity

Keywords: Game theory; Dynamic programming; Learning; Decision making; Computer science; Military computing; Stochastic processes; Shortest path problem; Qualifications; Graph theory


Safe Adaptive Dynamic Programming Method for Nonlinear Safety-Critical Systems with Disturbance

6th International Conference on Robotics and Automation Engineering, ICRAE 2021

Authors: Wang, Jinguang; Zhang, Dehua; Zhang, Jishi; Zhu, Heyang; Hu, Shaolin; Qin, Chunbin (Henan University, School of Artificial Intelligence, Kaifeng, China; Guangdong University of Petrochemical Technology, School of Automation, Maoming, China)

ISBN: 9781665406970 (print)

In this paper, a safe adaptive dynamic programming (SADP) method based on the barrier function (BF) is proposed for the optimal control problem of nonlinear safety-critical systems with safety constraints and external disturbance. First, the barrier function is used to transform the nonlinear system with safety constraints into a transformed system without them. Second, based on the transformed system, a new barrier-disturbance-related term is proposed to approximate the effect of the external disturbance. On the premise of satisfying the safety constraints and stability, the neural network (NN) approximation method is used to approximate the optimal cost function and optimal control strategy of the system online. Finally, the simulation results show that the proposed method makes the system state converge well without violating the safety constraints. © 2021 IEEE.

Keywords: Reinforcement learning


Identifying trajectory classes in dynamic tasks


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Stuart O. Anderson, Siddhartha S. Srinivasa (Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Intel Research Pittsburgh, Intel Corporation, Pittsburgh, PA, USA)

Using domain knowledge to decompose difficult control problems is a widely used technique in robotics. Previous work has automated the process of identifying some qualitative behaviors of a system, finding a decomposition of the system based on that behavior, and constructing a control policy based on that decomposition. We introduce a novel method for automatically finding decompositions of a task based on observing the behavior of a preexisting controller. Unlike previous work, these decompositions define reparameterizations of the state space that can permit simplified control of the system

Keywords: State-space methods; Automatic control; Control systems; Motion control; Dynamic programming; Robotics and automation; Convergence; Learning; Humans; Robot control

