
Refine Results

Document Type

  • 140 Conference papers
  • 7 Journal articles

Collection

  • 147 Electronic documents
  • 0 Print holdings

Date Distribution

Subject Classification

  • 71 Engineering
    • 66 Computer science and technology...
    • 15 Software engineering
    • 11 Electrical engineering
    • 9 Control science and engineering
    • 2 Instrument science and technology
    • 2 Information and communication engineering
    • 1 Mechanics (may confer Engineering or Sci...
    • 1 Mechanical engineering
    • 1 Architecture
  • 11 Science
    • 10 Mathematics
    • 2 Systems science
    • 2 Statistics (may confer Science or...
  • 5 Management
    • 4 Management science and engineering (ma...
    • 3 Business administration
    • 1 Library, information and archives manag...
  • 3 Economics
    • 3 Applied economics

Topics

  • 76 dynamic programm...
  • 39 learning
  • 26 optimal control
  • 25 reinforcement le...
  • 15 function approxi...
  • 15 control systems
  • 14 approximation al...
  • 14 equations
  • 13 neural networks
  • 13 stochastic proce...
  • 12 convergence
  • 10 state-space meth...
  • 10 cost function
  • 9 mathematical mod...
  • 8 trajectory
  • 8 approximation me...
  • 7 approximate dyna...
  • 7 algorithm design...
  • 7 adaptive control
  • 7 heuristic algori...

Institutions

  • 4 school of inform...
  • 4 department of in...
  • 3 department of el...
  • 3 northeastern uni...
  • 3 univ texas autom...
  • 3 arizona state un...
  • 3 robotics institu...
  • 3 univ illinois de...
  • 2 princeton univ d...
  • 2 national science...
  • 2 college of mecha...
  • 2 key laboratory o...
  • 2 univ utrecht dep...
  • 2 department of op...
  • 1 inria
  • 1 computational le...
  • 1 school of automa...
  • 1 univ cincinnati ...
  • 1 toyota technol c...
  • 1 neuroinformatics...

Authors

  • 5 liu derong
  • 4 xu xin
  • 4 martin riedmille...
  • 4 huaguang zhang
  • 4 marco a. wiering
  • 4 zhang huaguang
  • 4 si jennie
  • 4 derong liu
  • 3 hado van hasselt
  • 3 lewis frank l.
  • 3 dongbin zhao
  • 3 powell warren b.
  • 3 warren b. powell
  • 3 riedmiller marti...
  • 2 manuel loth
  • 2 van hasselt hado
  • 2 preux philippe
  • 2 hu dewen
  • 2 jennie si
  • 2 philippe preux

Language

  • 142 English
  • 5 Other

Search query: "Any field = 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007"
147 records; showing 131-140
Short-term Stock Market Timing Prediction under Reinforcement Learning Schemes
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Hailin Li, Cihan H. Dagli, David Enke (Department of Engineering Management and Systems Engineering, University of Missouri-Rolla, Rolla, MO, USA)
There are fundamental difficulties when only using a supervised learning philosophy to predict financial stock short-term movements. We present a reinforcement-oriented forecasting framework in which the solution is c...
Dual Representations for Dynamic Programming and Reinforcement Learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Tao Wang, Michael Bowling, Dale Schuurmans (Department of Computing Science, University of Alberta, Edmonton, Canada)
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of...
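The dual view described in this abstract can be illustrated on a tiny policy-evaluation problem: one maintains a discounted state-visit distribution instead of a value function, and both representations yield the same expected return. A minimal sketch with made-up numbers (not code from the paper):

```python
import numpy as np

# A 2-state Markov chain under a fixed policy: primal vs dual evaluation.
gamma = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])         # transition matrix under the policy
r = np.array([1.0, 0.0])          # per-state reward
mu = np.array([0.5, 0.5])         # start-state distribution

# Primal view: value function V = (I - gamma P)^(-1) r
V = np.linalg.solve(np.eye(2) - gamma * P, r)

# Dual view: discounted state-visit distribution
# d^T = (1 - gamma) mu^T (I - gamma P)^(-1), which sums to 1.
d = (1 - gamma) * np.linalg.solve((np.eye(2) - gamma * P).T, mu)

# Both views agree on the policy's expected return from mu.
primal_return = mu @ V
dual_return = d @ r / (1 - gamma)
print(primal_return, dual_return)
```

The dual object `d` is a proper probability distribution, which is what makes distribution-based (rather than value-based) updates possible.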
A Dynamic Programming Approach to Viability Problems
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Pierre-Arnaud Coquelin, Sophie Martin, Remi Munos (Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France; Laboratoire d'Ingénierie pour les Systèmes Complexes, Cemagref de Clermont-Ferrand, Aubière, France; INRIA Futurs, Université de Lille 3, France)
Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of...
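For a discrete toy system, the viability kernel mentioned in the abstract can be computed by a backward fixed-point iteration that repeatedly discards states from which every control leads outside the current set. A hedged sketch with invented dynamics (the paper itself addresses continuous dynamics; everything below is illustrative):

```python
# Discrete viability kernel by fixed-point iteration:
#   V_{k+1} = { x in V_k : some control keeps the successor in V_k }.
# Toy system: position p in 0..5, velocity v in {-1,0,1}; a control
# a in {-1,0,1} sets v' = clip(v + a), and p' = p + v.
# Constraint set K: all (p, v) with 0 <= p <= 5.

POS = range(6)
VEL = (-1, 0, 1)
K = {(p, v) for p in POS for v in VEL}

def successors(p, v):
    for a in (-1, 0, 1):
        v2 = max(-1, min(1, v + a))
        yield p + v, v2            # position advances by current velocity

kernel = set(K)
while True:
    keep = {x for x in kernel
            if any(s in kernel for s in successors(*x))}
    if keep == kernel:
        break
    kernel = keep

print(sorted(K - kernel))   # → [(0, -1), (5, 1)]
```

Only the two states moving out of bounds with no chance to brake are non-viable; every other state admits a control keeping the trajectory in K forever.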
A Recurrent Control Neural Network for Data Efficient Reinforcement Learning
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Anton Maximilian Schaefer, Steffen Udluft, Hans-Georg Zimmermann (Department of Optimisation and Operations Research, University of Ulm (EBS), Germany; Department of Learning Systems, Information & Communications, Siemens AG, Munich, Germany)
In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with ...
Leader-Follower Semi-Markov Decision Problems: Theoretical Framework and Approximate Solution
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Kurian Tharakunnel, Siddhartha Bhattacharyya (Department of Information and Decision Sciences, University of Illinois Chicago, Chicago, IL, USA)
Leader-follower problems are hierarchical decision problems in which a leader uses incentives to induce certain desired behavior among a set of self-interested followers. Dynamic leader-follower problems extend this s...
Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrices
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Baohua Li, Jennie Si (Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA)
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain st...
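The worst-case Bellman backup underlying robust dynamic programming can be sketched when each action's transition row is only known to lie in a small finite uncertainty set. All numbers below are illustrative; the paper treats more general uncertain stationary transition matrices:

```python
import numpy as np

# Robust value iteration: the backup takes the worst case (min) over
# an uncertainty set of candidate transition rows, then the best action.
gamma = 0.9
r = np.array([[1.0, 0.0],   # r[s, a] for 2 states x 2 actions
              [0.0, 2.0]])
# U[s, a] = candidate transition rows P(. | s, a)
U = {
    (0, 0): [np.array([0.9, 0.1]), np.array([0.7, 0.3])],
    (0, 1): [np.array([0.5, 0.5]), np.array([0.4, 0.6])],
    (1, 0): [np.array([0.2, 0.8]), np.array([0.3, 0.7])],
    (1, 1): [np.array([0.6, 0.4]), np.array([0.5, 0.5])],
}

V = np.zeros(2)
for _ in range(500):
    Q = np.array([[r[s, a] + gamma * min(p @ V for p in U[s, a])
                   for a in range(2)] for s in range(2)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V)   # worst-case optimal values
```

Since the robust backup is still a gamma-contraction, the iteration converges to the unique robust fixed point.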
Continuous-time adaptive critics
IEEE Transactions on Neural Networks, 2007, Vol. 18, No. 3, pp. 631-647
Authors: Thomas Hanselmann, Lyle Noakes, Anthony Zaknich (Dept. of Electrical & Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia; School of Mathematics & Statistics, University of Western Australia, Crawley, WA 6009, Australia; School of Engineering Science, Murdoch University, Perth, WA 6150, Australia)
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are preval...
Online Reinforcement Learning Neural Network Controller Design for Nanomanipulation
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Qinmin Yang, S. Jagannathan (Department of Electrical & Computer Engineering, University of Missouri-Rolla, MO, USA)
In this paper, a novel reinforcement learning neural network (NN)-based controller, referred to as an adaptive critic controller, is proposed for affine nonlinear discrete-time systems with applications to nanomanipulation....
A Scalable Model-Free Recurrent Neural Network Framework for Solving POMDPs
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Zhenzhen Liu, Itamar Elhanany (Department of Electrical & Computer Engineering, University of Tennessee, Knoxville, TN, USA)
This paper presents a framework for obtaining an optimal policy in model-free partially observable Markov decision problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken...
Opposition-Based Q(λ) with Non-Markovian Update
IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Authors: Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel (Pattern Analysis and Machine Intelligence Laboratory, Department of Systems Design Engineering, University of Waterloo, ON, Canada; Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada)
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique is a combination of the idea of opposition and eligibility traces to deal with large state space...