Search Results - Inner Mongolia University Library

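Help

Field codes:

- T = Title (book title, article title)
- A = Author (responsible party)
- K = Subject term
- P = Publication title
- PU = Publisher
- O = Organization (author affiliation, degree-granting institution, patent applicant)
- L = Chinese Library Classification number
- C = Subject classification code
- U = All fields
- Y = Year (year of publication, year of degree, year a standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be written in uppercase, with exactly one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", C++ or Basic in any field, excluding subject term Visual, years 2011-2016)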
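
Because the search rules are mechanical, they can be encoded to avoid spacing or case mistakes. The following Python sketch assumes only the field codes and operator rules listed in this help text; the helper names `term`, `combine` and `group` are hypothetical and are not an interface offered by the library. It rebuilds both examples verbatim.

```python
# Hypothetical helpers (illustrative names, not part of the library's system)
# for composing expert-search strings under the stated rules: uppercase
# AND/OR/NOT with exactly one space on each side of every operator.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one field=value clause, e.g. term("A", "Smith") -> "A=Smith"."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(*parts: str) -> str:
    """Join clause, operator, clause, ... with single spaces around operators."""
    if len(parts) % 2 == 0:
        raise ValueError("expected clause, operator, clause, ...")
    for op in parts[1::2]:
        if op not in OPERATORS:
            raise ValueError(f"operator must be uppercase AND, OR or NOT, got {op!r}")
    return " ".join(parts)


def group(expression: str) -> str:
    """Parenthesize a sub-expression so it is evaluated as a unit."""
    return f"({expression})"


# Rebuild the two examples from the help text.
example_1 = combine(
    group(combine(term("K", "图书馆学"), "OR", term("K", "情报学"))),
    "AND", term("A", "范并思"),
    "AND", term("Y", "1982-2016"),
)
example_2 = combine(
    term("P", "计算机应用与软件"),
    "AND", group(combine(term("U", "C++"), "OR", term("U", "Basic"))),
    "NOT", term("K", "Visual"),
    "AND", term("Y", "2011-2016"),
)
print(example_1)
print(example_2)
```

Rejecting lowercase operators up front mirrors the rule that operators must be uppercase.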

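Refine search results (number of records per facet value)

Document type

229 conference papers
18 journal articles

Collection scope

247 electronic documents
0 print holdings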

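Subject classification

113 Engineering
- 103 Computer Science and Technology...
- 42 Software Engineering
- 38 Electrical Engineering
- 23 Control Science and Engineering
- 5 Information and Communication Engineering
- 3 Mechanical Engineering
- 2 Mechanics (degrees in Engineering, Sci...
- 1 Instrument Science and Technology
- 1 Architecture
- 1 Chemical Engineering and Technology
- 1 Transportation Engineering
27 Science
- 25 Mathematics
- 7 Systems Science
- 6 Statistics (degrees in Science, ...
- 1 Physics
- 1 Chemistry
- 1 Atmospheric Sciences
10 Management
- 8 Management Science and Engineering (degrees...
- 3 Business Administration
- 2 Library, Information and Archives Manag...
2 Economics
- 2 Applied Economics
1 Law
- 1 Sociology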

Topics

95 dynamic programm...
54 optimal control
51 learning
44 reinforcement le...
35 learning (artifi...
27 equations
25 neural networks
22 heuristic algori...
20 convergence
20 control systems
18 function approxi...
18 mathematical mod...
16 approximation al...
15 vectors
15 cost function
14 markov processes
14 nonlinear system...
14 artificial neura...
13 stochastic proce...
12 adaptive dynamic...

Institutions

10 chinese acad sci...
5 school of inform...
4 northeastern uni...
4 department of el...
4 department of in...
3 department of el...
3 automation and r...
3 department of el...
3 robotics institu...
3 key laboratory o...
3 natl univ def te...
3 univ illinois de...
2 department of ar...
2 school of electr...
2 univ groningen i...
2 univ texas autom...
2 colorado state u...
2 guangxi univ sch...
2 national science...
2 informatics inst...

Authors

13 liu derong
7 hado van hasselt
7 marco a. wiering
7 dongbin zhao
6 zhao dongbin
5 xu xin
5 lewis frank l.
5 huaguang zhang
5 wei qinglai
5 derong liu
5 warren b. powell
4 haibo he
4 jagannathan s.
4 frank l. lewis
4 zhang huaguang
4 ni zhen
4 yanhong luo
4 wang ding
4 he haibo
4 damien ernst

Language

246 English
1 Other

Search criteria: Any field = "2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2014"

247 records in total; showing records 221-230.

Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hailin Li, Cihan H. Dagli, David Enke (Department of Engineering Management and Systems Engineering, University of Missouri-Rolla, Rolla, MO, USA)

There are fundamental difficulties in predicting short-term movements of financial stocks using a supervised learning philosophy alone. We present a reinforcement-oriented forecasting framework in which the solution is converted from a typical error-based learning approach to a goal-directed, match-based learning method. The framework addresses real market-timing ability in forecasting as well as traditional goodness-of-fit criteria. We develop two applicable hybrid prediction systems, adopting actor-only and actor-critic reinforcement learning respectively, and compare them with both a supervised-only model and a classical random-walk benchmark in forecasting three daily stock index series over a 21-year learning and testing period. The actor-critic-based systems outperformed the other alternatives, while the proposed actor-only systems also showed efficacy.

Keywords: Stock markets, Timing, Economic forecasting, Dynamic programming, Stochastic processes, Predictive models, Testing, Supervised learning, Artificial intelligence, Research and development management
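
The abstract distinguishes market-timing ability from goodness-of-fit criteria. A minimal Python sketch of why the two can disagree, assuming directional hit rate as the timing measure and RMSE as the fit measure (the paper's exact criteria are not given here, and the series below are hypothetical): one forecast has the lower RMSE yet never calls the direction of a move correctly.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Goodness-of-fit criterion: root-mean-square forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def hit_rate(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Market-timing criterion (assumed definition): share of periods in which
    the forecast gets the sign of the change right."""
    hits = sum(1 for a, f in zip(actual, forecast) if (a > 0) == (f > 0))
    return hits / len(actual)


# Hypothetical daily index changes and two hypothetical forecasts.
actual = [1.0, -1.0, 1.0, -1.0]
timid = [-0.1, 0.1, -0.1, 0.1]   # small errors, but always the wrong direction
bold = [3.0, -3.0, 3.0, -3.0]    # large errors, but always the right direction

print(rmse(actual, timid), hit_rate(actual, timid))
print(rmse(actual, bold), hit_rate(actual, bold))
```

A purely error-driven learner would prefer the first forecast, even though a trader acting on it would always be on the wrong side of the market.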

A Recurrent Control Neural Network for Data Efficient Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Anton Maximilian Schaefer, Steffen Udluft, Hans-Georg Zimmermann (Department of Optimisation and Operations Research, University of Ulm (EBS), Germany; Department of Learning Systems, Information & Communications, Siemens AG, Munich, Germany)

In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with dynamically consistent overshooting, which we extend with an additional control network whose specific task is to learn the optimal policy. Because the approach uses a neural network, it can easily handle high-dimensional problems and is consequently able to break Bellman's curse of dimensionality. Further, owing to the high system-identification quality of RNNs, our method is highly data-efficient. Because of these properties we refer to our new model as the recurrent control neural network (RCNN). The network is tested on a standard reinforcement learning problem, cart-pole balancing, where it shows outstanding results, especially in terms of data efficiency.

Keywords: Neural networks, Recurrent neural networks, Communication system control, Testing, Dynamic programming, Operations research, Telephony, Learning systems, Communications technology, Equations

A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhenzhen Liu, Itamar Elhanany (Department of Electrical & Computer Engineering, University of Tennessee, Knoxville, TN, USA)

This paper presents a framework for obtaining an optimal policy in model-free partially observable Markov decision problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken, using a novel RNN architecture whose computation and storage requirements are dramatically reduced compared with existing schemes. A scalable online training algorithm, derived from the real-time recurrent learning (RTRL) algorithm, is employed. Moreover, stochastic meta-descent (SMD), an adaptive step-size scheme for stochastic gradient-descent problems, is used as a means of incorporating curvature information to accelerate learning. We consider case studies of POMDPs in which state information is not directly available to the agent. In particular, we investigate scenarios in which the agent receives identical observations for multiple states and must therefore rely on temporal dependencies captured by the RNN to obtain the optimal policy. Simulation results illustrate the effectiveness of the approach, along with a substantial improvement in convergence rate compared with existing schemes.

Keywords: Recurrent neural networks, Neurons, Stochastic processes, Nonlinear dynamical systems, Computational complexity, Dynamic programming, Learning, Computer networks, Computer architecture, Acceleration

Opposition-Based Reinforcement Learning in the Management of Water Resources

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: M. Mahootchi, H. R. Tizhoosh, K. Ponnambalam (Systems Design Engineering, University of Waterloo, Waterloo, ONT, Canada)

Opposition-based learning (OBL) is a new scheme in machine intelligence. In this paper, an OBL version of Q-learning, which exploits opposite quantities to accelerate learning, is used to manage the operation of a single reservoir. In this method, an agent takes an action, receives a reward, and updates its knowledge in terms of action-value functions. Furthermore, the transition function, which is the balance equation in the optimization model, determines the next state and is used to update the action-value function of the opposite action. Two types of opposite actions are defined. It is demonstrated that OBL can significantly improve the efficiency of the operating policy within a limited number of iterations, and that the technique is more robust than standard Q-learning.

Keywords: Resource management, Water resources, Reservoirs, Dynamic programming, Stochastic processes, Machine learning, Neural networks, Design engineering, Systems engineering and theory, Machine intelligence
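
A minimal Python sketch of the update pattern described in this abstract, under stated assumptions: the opposite of a discrete release option is taken to be its mirror image in the action index (the paper defines two types of opposite actions, which are not reproduced here), and `toy_reservoir` is a hypothetical stand-in for the reservoir balance equation. After each real step, the balance equation supplies the reward and next state of the opposite action so that its Q-value is updated as well, giving up to two updates per interaction.

```python
from typing import Callable, List, Tuple

# A transition model maps (state, action) -> (reward, next_state).
Model = Callable[[int, int], Tuple[float, int]]


def opposite(action: int, n_actions: int) -> int:
    """Assumed opposite action: mirror the index within 0..n_actions-1."""
    return n_actions - 1 - action


def q_update(Q: List[List[float]], s: int, a: int, r: float, s_next: int,
             alpha: float, gamma: float) -> None:
    """Standard one-step Q-learning update, applied in place."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def obl_q_step(Q: List[List[float]], s: int, a: int, env_step: Model,
               balance: Model, alpha: float = 0.1, gamma: float = 0.9) -> int:
    """Update Q for the action taken and for its opposite.

    env_step yields the observed outcome of the action actually taken;
    balance is the known balance equation, used to simulate the opposite
    action without executing it. Returns the next state actually reached.
    """
    r, s_next = env_step(s, a)
    q_update(Q, s, a, r, s_next, alpha, gamma)
    a_opp = opposite(a, len(Q[s]))
    if a_opp != a:
        r_opp, s_next_opp = balance(s, a_opp)
        q_update(Q, s, a_opp, r_opp, s_next_opp, alpha, gamma)
    return s_next


def toy_reservoir(s: int, release: int) -> Tuple[float, int]:
    """Hypothetical reservoir: storage 0..4, inflow of 1 unit per step,
    reward = water actually released minus a penalty for spilled water."""
    inflow, capacity = 1, 4
    released = min(release, s + inflow)
    s_next = min(s + inflow - released, capacity)
    spill = s + inflow - released - s_next
    return float(released) - 2.0 * spill, s_next


Q = [[0.0] * 3 for _ in range(5)]   # 5 storage levels x 3 release options
s = obl_q_step(Q, 2, 0, toy_reservoir, toy_reservoir)
print(s, Q[2])
```

Here the real action (release nothing) earns no reward, but the simulated opposite action (release two units) still teaches the agent something in the same step.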

Coordinated Reinforcement Learning for Decentralized Optimal Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Daniel Yagan, Chen-Khong Tham (Department of Electrical and Computer Engineering, National University of Singapore, Singapore)

We consider a multi-agent system in which overall performance is affected by the joint actions or policies of the agents, yet each agent observes only a partial view of the global state. This model is known as a decentralized partially observable Markov decision process (DEC-POMDP), which is arguably more applicable to real-world settings such as communication networks. Solving a DEC-POMDP exactly is known to be NEXP-complete, and memory requirements grow exponentially even for finite-horizon problems. In this paper, we propose to address these issues by using an online model-free technique and by exploiting the locality of interaction among agents to approximate the joint optimal policy. Simulation results show the effectiveness and convergence of the proposed algorithm for resource allocation in multi-agent wireless multi-hop networks.

Keywords: Learning, Optimal control, Control systems, Resource management, Spread spectrum communication, Context modeling, Multiagent systems, Communication networks, Stochastic processes, Dynamic programming

DHP Adaptive Critic Motion Control of Autonomous Wheeled Mobile Robot

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wei-Song Lin, Ping-Chieh Yang (Department and Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan)

Autonomous driving of a wheeled mobile robot (WMR) requires velocity and path-tracking control subject to complex dynamic constraints. Conventionally, this control design is obtained by analysis and synthesis of the WMR system. This paper presents a dual heuristic programming (DHP) adaptive critic design of the motion control system that enables the WMR to achieve the control objective simply by learning through trials. The design consists of an adaptive critic velocity neuro-control loop and a posture neuro-control loop. The neural weights in the velocity neuro-controller (VNC) are corrected with the DHP adaptive critic method. The designer simply expresses the control objective with a utility function, and the VNC learns by sequential optimization to satisfy it. The posture neuro-controller (PNC) approximates the inverse velocity model of the WMR so as to map planned positions to desired velocities. Supervised driving of the WMR at varying velocities supplies training samples for the PNC and VNC to set up their neural weights. During autonomous driving, the learning mechanism keeps improving the PNC and VNC. The design is evaluated on an experimental WMR, and the results show that the DHP adaptive critic motion control design enables the WMR to develop its control ability autonomously.

Keywords: Programmable control, Adaptive control, Motion control, Mobile robots, Velocity control, Control design, Control system synthesis, Robot programming, Control systems, Design methodology

Dual Representations for Dynamic Programming and Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Tao Wang, Michael Bowling, Dale Schuurmans (Department of Computing Science, University of Alberta, Edmonton, Canada)

We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average-reward and discounted-reward cases in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees that a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.

Keywords: Dynamic programming, Learning, Approximation algorithms, Probability distribution, Linear approximation, Decision making, Distributed computing, Heuristic algorithms, Linear programming, Yield estimation
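
To make the primal/dual relationship in this abstract concrete, here is a small Python sketch of policy evaluation for a fixed policy. It illustrates the general identity only, not the paper's modified linear program: the primal computes the value function V, the dual computes the normalized discounted state-visit distribution d, and both yield the same expected discounted return because mu^T V = d^T r / (1 - gamma). The two-state chain and its numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def evaluate_primal(P: Matrix, r: List[float], gamma: float,
                    iters: int = 2000) -> List[float]:
    """Primal policy evaluation: iterate V = r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V


def evaluate_dual(P: Matrix, mu: List[float], gamma: float,
                  iters: int = 2000) -> List[float]:
    """Dual policy evaluation: iterate d = (1 - gamma) mu + gamma P^T d.

    d is the normalized discounted state-visit distribution from the start
    distribution mu; each iterate sums to 1 when mu does and P is stochastic.
    """
    n = len(mu)
    d = list(mu)
    for _ in range(iters):
        d = [(1 - gamma) * mu[j] + gamma * sum(P[i][j] * d[i] for i in range(n))
             for j in range(n)]
    return d


# Hypothetical 2-state chain under a fixed policy.
P = [[0.5, 0.5], [0.2, 0.8]]   # P[i][j] = Pr(next state = j | current state = i)
r = [1.0, 0.0]                 # reward collected in each state
mu = [1.0, 0.0]                # start in state 0
gamma = 0.9

V = evaluate_primal(P, r, gamma)
d = evaluate_dual(P, mu, gamma)
primal_return = sum(m * v for m, v in zip(mu, V))
dual_return = sum(di * ri for di, ri in zip(d, r)) / (1 - gamma)
print(primal_return, dual_return, sum(d))
```

The dual iterate is a probability distribution at every step, which is one way to see why estimates built on it cannot diverge the way value estimates sometimes do.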

Continuous-Time ADP for Linear Systems with Partially Unknown Dynamics

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Draguna Vrabie, Murad Abu-Khalaf, Frank L. Lewis, Youyi Wang (Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore)

Approximate dynamic programming (ADP) has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and to the system model knowledge required. This paper presents a novel online adaptive critic (AC) scheme, based on ADP, to solve the infinite-horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from computational intelligence and control theory. Only partial knowledge of the system model is used, since knowledge of the plant's internal dynamics is not needed; the method is therefore useful for determining the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a quasi-Newton method for solving the underlying algebraic Riccati equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control-theory terms, this paper develops a direct adaptive control algorithm for obtaining the optimal control solution without knowing the system matrix A.

Keywords: Linear systems, Optimal control, Dynamic programming, Adaptive control, Control theory, Iterative algorithms, Riccati equations, Sampling methods, Programmable control, Infinite horizon
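
For intuition about the algebraic Riccati equation mentioned in this abstract, the scalar case can be solved in closed form. For dx/dt = a x + b u with cost equal to the integral of (q x^2 + r u^2), the ARE is 2ap - b^2 p^2 / r + q = 0, and its stabilizing root gives the optimal gain k = bp/r. The Python sketch below is model-based (it uses the drift coefficient a, the scalar counterpart of the matrix A), so it computes the reference answer that an A-free method such as the one described aims to reach; it is not the authors' online algorithm, and the plant and weights are hypothetical.

```python
import math


def scalar_lqr(a: float, b: float, q: float, r: float):
    """Stabilizing solution p of the scalar continuous-time ARE
    2*a*p - (b**2 / r) * p**2 + q = 0, plus the optimal gain k = b*p/r.

    Assumes b != 0, q > 0 and r > 0.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


def are_residual(a: float, b: float, q: float, r: float, p: float) -> float:
    """Left-hand side of the scalar ARE; zero at a solution."""
    return 2 * a * p - (b * b / r) * p * p + q


# Hypothetical open-loop-unstable plant: dx/dt = 1.0*x + 1.0*u.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, k = scalar_lqr(a, b, q, r)
closed_loop_pole = a - b * k   # u = -k*x gives dx/dt = (a - b*k) x
print(p, k, are_residual(a, b, q, r, p), closed_loop_pole)
```

Taking the positive square root makes the closed-loop pole equal to -sqrt(a^2 + b^2 q / r), so the resulting controller is stabilizing even though the open-loop plant is not.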

An Approximate Dynamic Programming Approach for Job Releasing and Sequencing in a Reentrant Manufacturing Line

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jose A. Ramirez-Hernandez, Emmanuel Fernandez (Department of Electrical & Computer Engineering, University of Cincinnati, OH, USA)

This paper presents the application of an approximate dynamic programming (ADP) algorithm to the problem of job releasing and sequencing in a benchmark reentrant manufacturing line (RML). The ADP approach is based on the SARSA(lambda) algorithm with linear approximation structures that are tuned by gradient descent. The optimization is performed according to a discounted cost criterion that seeks both to minimize inventory costs and to maximize throughput. Simulation experiments using different approximation architectures compare the performance of optimal strategies against policies obtained with ADP. The results showed a statistical match in performance between the optimal and the approximated policies obtained through ADP, and suggest that the ADP algorithm presented in this paper may be a promising approach for larger RML systems.

Keywords: Dynamic programming, Control systems, Workstations, Pulp manufacturing, Cost function, Optimal control, Manufacturing industries, Fabrication, Semiconductor devices, Manufacturing processes
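
The core of a SARSA(lambda) learner with a linear approximation structure tuned by gradient descent fits in a few lines. The Python sketch below is a generic textbook form (accumulating eligibility traces, with costs minimized rather than rewards maximized, to match the discounted cost criterion); it is not the authors' code, and the feature vectors, step size `alpha`, discount `gamma` and trace decay `lam` are hypothetical.

```python
from typing import List, Sequence


def q_value(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear approximation: Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def sarsa_lambda_step(w: List[float], e: List[float], phi: Sequence[float],
                      cost: float, phi_next: Sequence[float],
                      alpha: float, gamma: float, lam: float,
                      terminal: bool = False) -> float:
    """One SARSA(lambda) update with accumulating traces, applied in place.

    Q estimates the discounted cost-to-go. phi_next must be the features of
    the action actually chosen next (on-policy). Returns the TD error.
    """
    q_next = 0.0 if terminal else q_value(w, phi_next)
    delta = cost + gamma * q_next - q_value(w, phi)
    for i in range(len(w)):
        e[i] = gamma * lam * e[i] + phi[i]   # decay old traces, add current features
        w[i] += alpha * delta * e[i]         # gradient step along the traces
    return delta


# Hypothetical two-feature example with two consecutive transitions.
w, e = [0.0, 0.0], [0.0, 0.0]
d1 = sarsa_lambda_step(w, e, [1.0, 0.0], 1.0, [0.0, 1.0], alpha=0.5, gamma=0.9, lam=0.8)
d2 = sarsa_lambda_step(w, e, [0.0, 1.0], 2.0, [1.0, 0.0], alpha=0.5, gamma=0.9, lam=0.8)
print(d1, d2, w)
```

With `lam = 0` this reduces to one-step SARSA; larger values propagate each TD error further back along the recently visited state-action pairs, which is why the second update also changes the first weight.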

Computing Optimal Stationary Policies for Multi-Objective Markov Decision Processes

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Marco A. Wiering, Edwin D. de Jong (Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands)

This paper describes a novel algorithm called CON-MODP for computing Pareto optimal policies for deterministic multi-objective sequential decision problems. CON-MODP is a value iteration based multi-objective dynamic programming algorithm that only computes stationary policies. We observe that for guaranteeing convergence to the unique Pareto optimal set of deterministic stationary policies, the algorithm needs to perform a policy evaluation step on particular policies that are inconsistent in a single state that is being expanded. We prove that the algorithm converges to the Pareto optimal set of value functions and policies for deterministic infinite horizon discounted multi-objective Markov decision processes. Experiments show that CON-MODP is much faster than previous multi-objective value iteration algorithms.

Keywords: Dynamic programming, Learning, Distributed computing, Heuristic algorithms, Convergence, Infinite horizon, Intelligent systems, Deductive databases, Distributed databases, Electronic mail
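
CON-MODP operates on sets of value vectors and keeps only the Pareto-optimal ones. The Python sketch below shows just that filtering building block, assuming every objective is maximized; it is not CON-MODP itself, which additionally performs the policy evaluation step on policies that are inconsistent in the expanded state, as the abstract describes. The candidate vectors are hypothetical.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v (all objectives maximized): u is at least
    as good in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_front(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep the non-dominated vectors, dropping duplicates, in input order."""
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical two-objective value vectors of candidate policies at one state.
candidates = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0), (2.0, 1.0), (0.5, 2.5)]
print(pareto_front(candidates))
```

The pairwise check is quadratic in the number of vectors, which is acceptable for a sketch but is one reason multi-objective value iteration can become slow as the Pareto sets grow.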

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021