ISBN (print): 9781479945528
A significant problem facing researchers in reinforcement learning, and particularly in multi-objective learning, is the dearth of good benchmarks. In this paper, we present a method and software tool enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. This tool, called Merlin (for Multi-objective Environments for Reinforcement Learning), provides the ability to control these features in predictable ways, thus allowing researchers to begin to build a more detailed understanding of which features of a problem interact with a given learning algorithm to improve or degrade the algorithm's performance. We present this method and tool, and briefly discuss the controls provided by the generator, its supported options, and their implications for the generated benchmark instances.
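The abstract does not give Merlin's actual interface; the sketch below is a hypothetical illustration of the underlying idea of generating a random multi-objective MDP instance with controllable structural properties (state count, action count, number of objectives, branching factor). All names and parameters are illustrative assumptions, not the tool's API.

```python
# Hypothetical sketch (not the actual Merlin API): generate a random
# multi-objective MDP with a controllable number of states, actions,
# reward objectives, and successor-branching factor.
import numpy as np

def random_mo_mdp(n_states=50, n_actions=4, n_objectives=2, branching=3, seed=0):
    """Return (transitions, rewards) for a random multi-objective MDP.

    transitions[s, a] is a probability vector over successor states with
    at most `branching` nonzero entries; rewards[s, a] is a vector of
    `n_objectives` rewards drawn uniformly from [0, 1].
    """
    rng = np.random.default_rng(seed)
    transitions = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=branching, replace=False)
            transitions[s, a, succ] = rng.dirichlet(np.ones(branching))
    rewards = rng.uniform(size=(n_states, n_actions, n_objectives))
    return transitions, rewards

T, R = random_mo_mdp()
print(T.shape, R.shape)  # (50, 4, 50) (50, 4, 2)
```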
ISBN (print): 9781479945528
This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the dynamic programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be advisable to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.
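As a rough illustration of the dynamic-programming principle the methodology relies on, the sketch below runs backward induction over a discretized storage level against an hourly price series. The efficiencies, power limit, grid resolution, and prices are toy assumptions, not the paper's hydrogen-device model or Belgian market data.

```python
# Illustrative backward-induction DP for storage arbitrage on a
# day-ahead price series (all parameter values are assumptions).
import numpy as np

def max_revenue(prices, levels=51, p_max=1.0, eta_in=0.7, eta_out=0.5):
    """prices: hourly day-ahead prices; levels: number of discretized storage levels."""
    soc = np.linspace(0.0, 1.0, levels)   # normalized state of charge
    V = np.zeros(levels)                  # terminal value: leftover storage worth nothing
    for price in reversed(prices):        # backward induction over hours
        V_new = np.full(levels, -np.inf)
        for i, s in enumerate(soc):
            for j, s_next in enumerate(soc):
                delta = s_next - s        # >0 charge (buy), <0 discharge (sell)
                if delta > 0:             # buy electricity, store via electrolysis
                    energy = delta / eta_in
                    cash = -price * energy
                else:                     # sell electricity produced by the fuel cell
                    energy = -delta * eta_out
                    cash = price * energy
                if energy > p_max:        # per-hour power limit
                    continue
                V_new[i] = max(V_new[i], cash + V[j])
        V = V_new
    return V[0]                           # start from an empty storage tank

prices = [40.0, 35.0, 80.0, 60.0, 20.0, 90.0]
print(round(max_revenue(prices), 2))
```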
ISBN (print): 9781479945528
Briefly, the main purpose of the paper is fourfold: a) Cognitive perception, which consists of two functional blocks: improved sparse coding under the influence of perceptual attention for extracting relevant information from the observables and ignoring irrelevant information, followed by a Bayesian algorithm for state estimation. b) The entropic state of the perceptor, which provides feedback information to the controller. c) Cognitive control, which also consists of two functional blocks: an executive learning algorithm computed by processing the entropic state, followed by predictive planning to set the stage for the policy to act on the environment, thereby establishing the global perception-action cycle. d) Experimental results on exploiting perceptual as well as executive attention in a cooperative manner, aimed at the first demonstration of risk control in the presence of a severe disturbance in the environment.
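As a loose illustration only: one plausible reading of the entropic state is the Shannon entropy of the perceptor's posterior state estimate, which is what the short helper below computes; the exact definition used in the paper may differ, so treat this as an assumption.

```python
# Assumed form of the entropic state: Shannon entropy of the posterior
# produced by the perceptor's Bayesian state estimator, fed back to the
# cognitive controller (lower entropy = better perception).
import numpy as np

def entropic_state(posterior, eps=1e-12):
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()                       # normalize to a probability vector
    return float(-(p * np.log(p + eps)).sum())

print(entropic_state([0.7, 0.2, 0.1]))
```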
ISBN (print): 9781479945528
Decentralized partially observable Markov decision processes (Dec-POMDPs) model cooperative multiagent scenarios, providing a powerful general framework for team-based artificial intelligence. While optimal algorithms exist for Dec-POMDPs, theoretical and empirical results demonstrate that they are impractical for many problems of real interest. We examine the use of reinforcement learning (RL) as a means to generate adequate, if not optimal, joint policies for Dec-POMDPs. It is easily demonstrated (and expected) that single-agent RL produces results of little joint utility. We therefore investigate heuristic methods, based upon the dynamics of the Dec-POMDP formulation, that bias the learning process to produce coordinated action. Empirical tests on a benchmark problem show that these heuristics significantly enhance learning performance, even outperforming a hand-crafted heuristic in cases where the learning process converges quickly.
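For orientation, the baseline the paper improves upon is independent Q-learning: each agent learns over its own local observations while receiving the shared joint reward, with no explicit coordination. The class below is a bare-bones sketch of that baseline, not the authors' heuristics, which would further bias the update toward coordinated action.

```python
# Minimal independent Q-learner for a Dec-POMDP agent (illustrative sketch).
import random
from collections import defaultdict

class IndependentQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)       # Q over (local observation, action) pairs
        self.actions, self.alpha, self.gamma, self.epsilon = actions, alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(obs, a)])

    def update(self, obs, action, joint_reward, next_obs):
        # Each agent updates on the shared joint reward it observes.
        best_next = max(self.Q[(next_obs, a)] for a in self.actions)
        target = joint_reward + self.gamma * best_next
        self.Q[(obs, action)] += self.alpha * (target - self.Q[(obs, action)])
```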
ISBN (print): 9781479945528
A common complaint about reinforcement learning (RL) is that it is too slow to learn a value function that gives good performance. This issue is exacerbated in continuous state spaces. This paper presents a straightforward approach to speeding up, and even improving, RL solutions by reusing features learned during a pre-training phase prior to Q-learning. During pre-training, the agent is taught to predict the state change given a state/action pair. The effect of pre-training is examined using the model-free Q-learning approach but could readily be applied to a number of RL approaches, including model-based RL. The analysis of the results provides ample evidence that the features learned during pre-training are the reason behind the improved RL performance.
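A hedged sketch of the pre-training idea follows: a network is first fit to predict the state change for a (state, action) pair, and its hidden-layer activations are then reused as fixed features for a linear Q-function approximator. The toy dynamics, network size, and use of scikit-learn are assumptions for illustration, not the paper's setup.

```python
# Pre-training sketch: learn to predict state change, reuse hidden features.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic transition data: 2-D continuous state, 1-D action (assumed toy dynamics).
states = rng.uniform(-1, 1, size=(5000, 2))
actions = rng.uniform(-1, 1, size=(5000, 1))
deltas = 0.1 * actions + 0.05 * np.sin(3 * states)

# Pre-training phase: predict the state change from (state, action).
X = np.hstack([states, actions])
model = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                     max_iter=500, random_state=0)
model.fit(X, deltas)

def features(state, action):
    """Hidden-layer activations of the pre-trained network."""
    x = np.concatenate([state, action])[None, :]
    return np.maximum(0.0, x @ model.coefs_[0] + model.intercepts_[0]).ravel()

# Q-learning would then use w @ features(s, a) as Q(s, a); only w is learned.
w = np.zeros(32)
print(features(np.array([0.2, -0.4]), np.array([0.5])).shape)  # (32,)
```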
ISBN (print): 9781479945528
As more renewable, yet volatile, forms of energy like solar and wind are being incorporated into the grid, the problem of finding optimal control policies for energy storage is becoming increasingly important. These sequential decision problems are often modeled as stochastic dynamic programs, but when the state space becomes large, traditional (exact) techniques such as backward induction, policy iteration, or value iteration quickly become computationally intractable. Approximate dynamic programming (ADP) thus becomes a natural solution technique for solving these problems to near-optimality using significantly fewer computational resources. In this paper, we compare the performance of the following: various approximation architectures with approximate policy iteration (API), approximate value iteration (AVI) with a structured lookup table, and direct policy search on a benchmarked energy storage problem (i.e., one for which the optimal solution is computable).
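Of the three solution families compared, direct policy search is the simplest to sketch: the toy example below grid-searches a two-threshold buy/sell storage policy by simulation. The price model, capacity, charge rate, and threshold grids are illustrative assumptions, not the benchmarked problem from the paper.

```python
# Direct policy search over a two-threshold storage policy (toy example).
import numpy as np

def simulate(prices, buy_below, sell_above, capacity=4.0, rate=1.0):
    """Revenue of a policy that buys below one price threshold and sells above another."""
    level, revenue = 0.0, 0.0
    for p in prices:
        if p <= buy_below and level < capacity:
            q = min(rate, capacity - level)
            level += q
            revenue -= p * q
        elif p >= sell_above and level > 0.0:
            q = min(rate, level)
            level -= q
            revenue += p * q
    return revenue

rng = np.random.default_rng(1)
prices = 50 + 20 * np.sin(np.arange(200) / 6.0) + rng.normal(0, 5, 200)

# Exhaustive search over the two policy parameters.
best = max(((simulate(prices, lo, hi), lo, hi)
            for lo in range(30, 55, 5)
            for hi in range(55, 85, 5)), key=lambda t: t[0])
print(f"revenue={best[0]:.1f} buy_below={best[1]} sell_above={best[2]}")
```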
ISBN (print): 9781479945528
This paper presents a novel stochastic event-based near-optimal control strategy to regulate a networked control system (NCS) represented as an uncertain nonlinear continuous-time system. An online stochastic actor-critic neural network (NN) based approach is utilized to achieve near-optimal regulation in the presence of network constraints, such as network-induced time-varying delays and random packet losses, under event-based transmission of the feedback signals. The transformed discrete-time nonlinear NCS obtained after incorporating the delays and packet losses is utilized for the actor-critic NN based controller design. To relax the need for knowledge of the control coefficient matrix, an NN based identifier is used. The event-sampled state vector is utilized as the NN input, and the respective weights are updated aperiodically at the occurrence of events. Further, an event-trigger condition is designed using the Lyapunov technique to ensure ultimate boundedness of all closed-loop signals and to save network resources and computation. Moreover, policy and value iterations are not utilized for the stochastic optimal regulator design. Finally, the analytical design is verified on a numerical example by carrying out Monte Carlo simulations.
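The sketch below illustrates the event-based transmission idea in its simplest form: the state is transmitted, and the controller would be updated, only when the gap to the last transmitted state exceeds a threshold, with the control held constant between events. The relative-threshold condition and the linear toy plant are assumptions; the paper derives its trigger condition via a Lyapunov analysis for an uncertain nonlinear NCS.

```python
# Minimal event-triggered control loop (assumed threshold condition).
import numpy as np

def run_event_triggered(dynamics, controller, x0, steps=100, sigma=0.1):
    x, x_event = x0.copy(), x0.copy()
    events = 0
    for _ in range(steps):
        if np.linalg.norm(x - x_event) > sigma * np.linalg.norm(x):
            x_event = x.copy()      # event: transmit the state over the network
            events += 1             # controller/NN weight updates occur only here
        u = controller(x_event)     # control input held between events
        x = dynamics(x, u)
    return events

A = np.array([[1.0, 0.1], [0.0, 0.98]])
B = np.array([[0.0], [0.1]])
K = np.array([[0.5, 1.0]])
events = run_event_triggered(lambda x, u: A @ x + B @ u,
                             lambda xe: -K @ xe,
                             np.array([[1.0], [0.0]]))
print("events:", events)
```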
ISBN (print): 9781479945528
This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probability distributions over the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtering, identification, and inventory control are provided.
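To make the POMDP-to-COMDP reduction concrete, the sketch below runs value iteration on a discretized belief space for a toy two-state, two-action, two-observation model with a discounted total-cost criterion. All model numbers are illustrative, and the finite belief grid sidesteps the measure-theoretic conditions that are the paper's actual subject.

```python
# Value iteration on the belief space (COMDP) of a toy POMDP.
import numpy as np

# P[a, s, s'] transition kernels, Z[a, s', o] observation kernels,
# C[s, a] one-step costs (all toy numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])
C = np.array([[0.0, 2.0], [5.0, 2.0]])
gamma = 0.9

beliefs = np.linspace(0.0, 1.0, 101)      # belief = Pr(state 0)

def belief_update(b, a, o):
    """Bayes update of the belief after action a and observation o."""
    prior = np.array([b, 1.0 - b]) @ P[a]
    post = prior * Z[a, :, o]
    return post[0] / post.sum() if post.sum() > 0 else b

V = np.zeros_like(beliefs)
for _ in range(200):                      # value iteration over belief grid points
    V_new = np.empty_like(V)
    for i, b in enumerate(beliefs):
        q = []
        for a in (0, 1):
            cost = b * C[0, a] + (1 - b) * C[1, a]
            prior = np.array([b, 1.0 - b]) @ P[a]
            for o in (0, 1):
                p_o = prior @ Z[a, :, o]
                j = int(round(belief_update(b, a, o) * 100))   # nearest grid point
                cost += gamma * p_o * V[j]
            q.append(cost)
        V_new[i] = min(q)                 # cost criterion: take the minimizing action
    V = V_new
print(V[50])                              # value at the uniform belief
```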
ISBN (print): 9781479945528
In embedded control systems, the control input is computed from sensing data of the plant in a processor, and there is a delay, called the computation time delay, due to the computation and the data transmission. When we design an optimal controller, we need to take this delay into account to achieve optimality. Moreover, in the case where it is difficult to identify a mathematical model of the plant, a model-free approach is useful. In particular, the reinforcement learning-based approach has attracted much attention in the design of adaptive optimal controllers. In this paper, we assume that the plant is a linear system but that its parameters are unknown. We then apply reinforcement learning to the design of an adaptive optimal digital controller that takes the computation time delay into consideration. First, we consider the case where all states of the plant are observed and it takes L time steps to update the control input. An optimal feedback gain is learned from sequences of pairs of the state and the control input. Next, we consider the case where the control input is determined from outputs of the plant. We cannot use an observer to estimate the state of the plant, since its parameters are unknown, so we use a data-based control approach for the estimation. Finally, we apply the proposed adaptive optimal controller to attitude control of a quadrotor in the hovering state and show its effectiveness by simulation.
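One standard way to account for an input delay of L steps, shown below as an assumption for illustration rather than the paper's derivation, is to augment the state with the last L control inputs so that the delayed plant becomes an ordinary delay-free linear system z[k+1] = Az z[k] + Bz u[k], to which LQR-style Q-learning can then be applied.

```python
# Delay augmentation: z = [x; u[k-L]; ...; u[k-1]] turns an L-step input
# delay into a delay-free linear system (illustrative construction).
import numpy as np

def augment_for_delay(A, B, L):
    n, m = B.shape
    Az = np.zeros((n + L * m, n + L * m))
    Bz = np.zeros((n + L * m, m))
    Az[:n, :n] = A
    Az[:n, n:n + m] = B                      # the input applied now is u[k-L]
    for i in range(L - 1):                   # shift the stored inputs forward
        Az[n + i * m: n + (i + 1) * m,
           n + (i + 1) * m: n + (i + 2) * m] = np.eye(m)
    Bz[n + (L - 1) * m:, :] = np.eye(m)      # the newly computed input enters last
    return Az, Bz

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Az, Bz = augment_for_delay(A, B, L=2)
print(Az.shape, Bz.shape)                    # (4, 4) (4, 1)
```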
ISBN (print): 9781479945528
Utility theory has served as a bedrock for modeling risk in economics. Where risk is involved in decision-making, the exponential utility (EU) function has been used in the literature as an objective function for capturing risk-averse behavior when solving Markov decision processes (MDPs) via utility theory. The EU framework uses a so-called risk-averseness coefficient (RAC) that seeks to quantify the risk appetite of the decision-maker. Unfortunately, as we show in this paper, the EU framework suffers from computational deficiencies that prevent it from being useful in practice for solution methods based on reinforcement learning (RL). In particular, the value function becomes very large and typically overflows the computer's numerical representation. We provide a simple example to demonstrate this. Further, we show empirically how a variance-adjusted (VA) approach, which approximates the EU objective for reasonable values of the RAC, can be used in the RL algorithm. The VA framework in a sense has two objectives: maximize expected returns and minimize variance. We conduct empirical studies of a VA-based RL algorithm on the semi-MDP (SMDP), which is a more general version of the MDP. We conclude with a mathematical proof of the boundedness of the iterates in our algorithm.
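A small numerical illustration of the overflow issue, and of the variance-adjusted alternative, is given below; the reward distribution, RAC value, and variance weight are toy assumptions, not figures from the paper.

```python
# Toy demonstration: exponential utility of a long-run cumulative reward
# overflows, while a variance-adjusted objective stays bounded.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=3.0, size=1000)   # per-step rewards
total = returns.cumsum()                               # cumulative reward over time

theta = 0.1                                  # risk-averseness coefficient (RAC)
with np.errstate(over="ignore"):
    eu = np.exp(theta * total)               # exponential utility of cumulative reward
print("EU after 1000 steps:", eu[-1])        # inf: the value overflows

k = 0.5                                      # weight on the variance penalty
va = returns.mean() - k * returns.var()      # variance-adjusted objective
print("VA objective:", round(va, 3))
```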