ISBN (print): 9781424408290
The goal of the work described in this paper is to develop a particular optimal control technique based on a Cell Mapping technique in combination with the Q-learning reinforcement learning method to control wheeled mobile vehicles. This approach manages four state variables because a dynamic model is used instead of a kinematic model, which could be handled with fewer variables. This new solution can be applied to non-linear continuous systems, where reinforcement learning methods face multiple constraints. Emphasis is given to the new combination of techniques, which, applied to optimal control problems, produces satisfactory results. The proposed algorithm is very robust to any change in the vehicle parameters because the vehicle model is estimated in real time from received experience.
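The following is a minimal sketch, not the authors' implementation, of the basic idea of combining cell mapping with Q-learning: the continuous 4-dimensional state is discretized into cells, and tabular Q-learning is run over cell indices. The state bounds, cell resolution, action count, and learning parameters are illustrative assumptions.

```python
# Sketch: tabular Q-learning over a cell-mapped 4-D state space (assumed setup).
import numpy as np

N_CELLS = (10, 10, 10, 10)                          # cells per state dimension (assumed)
STATE_LOW = np.array([-1.0, -1.0, -np.pi, -2.0])    # assumed state bounds
STATE_HIGH = np.array([1.0, 1.0, np.pi, 2.0])
N_ACTIONS = 5                                       # discretized control set (assumed)

def state_to_cell(x):
    """Map a continuous 4-D state to a single cell index (cell mapping)."""
    ratios = (x - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    idx = np.clip((ratios * np.array(N_CELLS)).astype(int), 0, np.array(N_CELLS) - 1)
    return np.ravel_multi_index(tuple(idx), N_CELLS)

Q = np.zeros((np.prod(N_CELLS), N_ACTIONS))

def q_learning_step(x, a, reward, x_next, alpha=0.1, gamma=0.95):
    """One Q-learning backup on the cell-level value table."""
    s, s_next = state_to_cell(x), state_to_cell(x_next)
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```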
There are fundamental difficulties when only using a supervised learning philosophy to predict financial stock short-term movements. We present a reinforcement-oriented forecasting framework in which the solution is c...
ISBN (print): 9781424407064
In this work, we design a policy-iteration-based Q-learning approach for on-line optimal control of ionized hypersonic flow at the inlet of a scramjet engine. Magneto-hydrodynamics (MHD) has been recently proposed as a means for flow control in various aerospace problems. This mechanism corresponds to applying external magnetic fields to ionized flows to achieve desired flow behavior. The applications range from external flow control for producing forces and moments on the air-vehicle to internal flow control designs, which compress and extract electrical energy from the flow. The current work addresses the latter problem of internal flow control. The baseline controller and Q-function parameterizations are derived from an off-line mixed predictive-control and dynamic-programming-based design. The nominal optimal neural network Q-function and controller are updated on-line to handle modeling errors in the off-line design. The on-line implementation investigates key concerns regarding the conservativeness of the update methods. Value-iteration-based update methods have been shown to converge in a probabilistic sense. However, simulation results illustrate that realistic implementations of these methods face significant training difficulties, often failing to learn the optimal controller on-line. The present approach, therefore, uses a policy-iteration-based update, which has time-based convergence guarantees. Given the special finite-horizon nature of the problem, three novel on-line update algorithms are proposed. These algorithms incorporate different mixes of concepts, including bootstrapping and forward and backward dynamic-programming update rules. Simulation results illustrate the success of the proposed update algorithms in re-optimizing the performance of the MHD generator during system operation.
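As a conceptual sketch only, the snippet below shows finite-horizon policy iteration with a backward dynamic-programming policy-evaluation sweep on a generic discrete model; the MHD flow model itself is not reproduced, and the transition array P, reward array R, and horizon length are placeholders rather than the paper's parameterization.

```python
# Sketch: finite-horizon policy iteration with backward-DP policy evaluation.
import numpy as np

def policy_iteration_finite_horizon(P, R, horizon):
    """P[a][s, s'] transition probabilities, R[s, a] rewards (assumed shapes)."""
    n_states, n_actions = R.shape
    policy = np.zeros((horizon, n_states), dtype=int)
    while True:
        # Policy evaluation: backward DP under the current time-varying policy.
        V = np.zeros((horizon + 1, n_states))
        for t in reversed(range(horizon)):
            for s in range(n_states):
                a = policy[t, s]
                V[t, s] = R[s, a] + P[a][s] @ V[t + 1]
        # Policy improvement: greedy one-step lookahead at every stage.
        new_policy = np.zeros_like(policy)
        for t in range(horizon):
            Q = np.stack([R[:, a] + P[a] @ V[t + 1] for a in range(n_actions)], axis=1)
            new_policy[t] = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```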
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge on dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing approaches. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantees for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information on uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating a
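As an illustrative sketch of the ALD-based kernel sparsification step described above (not the paper's code): a new sample joins the kernel dictionary only if its feature vector cannot be approximated, within a tolerance, by the existing dictionary elements. The RBF kernel choice, the tolerance value, and the function names are assumptions.

```python
# Sketch: approximate-linear-dependency (ALD) sparsification of a kernel dictionary.
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def ald_sparsify(samples, nu=1e-3):
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = np.array([[rbf_kernel(d1, d2) for d2 in dictionary] for d1 in dictionary])
        k = np.array([rbf_kernel(d, x) for d in dictionary])
        # Least-squares coefficients of x's feature vector on the dictionary.
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        delta = rbf_kernel(x, x) - k @ c     # ALD residual
        if delta > nu:
            dictionary.append(x)             # not approximately dependent: keep it
    return dictionary
```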
Welcome to ADPRL 2007 - the very first IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. The area of approximate dynamic programming and reinforcement learning is a fusion of a number of research areas in engineering, mathematics, artificial intelligence, operations research, and systems and control theory. You will enjoy an extraordinary technical program thanks to the ADPRL 2007 International Program Committee members, who worked very hard to have all papers reviewed before the review deadline. We received a total of 65 submissions from various parts of the world. The final technical program consists of 49 papers, among which 40 are oral session papers and 9 are poster session papers. There will be a keynote lecture delivered by Frank L. Lewis entitled “Adaptive Dynamic Programming for Robust Optimal Control Using Nonlinear Network Learning Structures.”
Particle swarm optimization is used for the training of the action network and critic network of the adaptive dynamic programming approach. The typical structures of adaptive dynamic programming and particle swarm optimization are adopted for comparison with other learning algorithms such as the gradient descent method. Besides simulation on the balancing of a cart-pole plant, a more complex plant, the pendulum robot (Pendubot), is tested for learning performance. Compared to traditional adaptive dynamic programming approaches, the proposed evolutionary learning strategy is shown to converge faster and with higher efficiency. Furthermore, the structure becomes simpler because the plant model does not need to be identified beforehand.
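The snippet below is a minimal sketch of the general idea of evolving network weights with particle swarm optimization instead of gradient descent: each particle is one candidate weight vector, scored by a user-supplied fitness (for a critic network, typically a temporal-difference error). The hyper-parameters and function names are illustrative assumptions, not the paper's settings.

```python
# Sketch: particle swarm optimization over a network weight vector.
import numpy as np

def pso_train(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (n_particles, dim))     # particle positions = weight vectors
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest                                    # best weight vector found
```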
Since the 1960s, the author has proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (ADP) - like "reinforcement learning" but based on approximating the Bellman equation and allowing the controller to know its utility function. Growing empirical evidence on the brain supports this approach. Adaptive critic systems now meet tough engineering challenges and provide a kind of first-generation model of the brain. Lewis, Prokhorov, and the author have early second-generation work. Mammal brains possess three core capabilities - creativity/imagination and ways to manage spatial and temporal complexity - beyond even the second generation. This paper reviews previous progress and describes new tools and approaches to overcome the spatial complexity gap.
We are interested in finding the most effective combination of off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method (Feldkamp et al., 1998), whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.
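A rough sketch of the on-line phase only (not the authors' multi-stream Kalman-filter procedure): after off-line training, the controller weights are nudged in the direction that lowers the critic's predicted cost, estimated here crudely by finite differences. The function names, learning rate, and perturbation size are hypothetical.

```python
# Sketch: one critic-guided on-line correction step for controller weights.
import numpy as np

def online_adapt(controller_weights, critic_cost, lr=1e-3, eps=1e-4):
    """critic_cost(weights) -> scalar predicted cost under the current conditions."""
    w = controller_weights.copy()
    grad = np.zeros_like(w)
    base = critic_cost(w)
    for i in range(len(w)):                 # finite-difference gradient estimate
        w_pert = w.copy()
        w_pert[i] += eps
        grad[i] = (critic_cost(w_pert) - base) / eps
    return w - lr * grad                    # small on-line adaptation step
```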
A theoretical analysis of model-based temporal difference learning for control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of temporal difference learning by proving convergence to the optimal value function. This means that the algorithm does not merely find the values of the current policy; instead, the policy is updated in such a manner that the optimal policy is ultimately guaranteed to be reached.
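The following is a hedged sketch of the flavor of algorithm analyzed: a model-based TD-style update whose target uses the model's expectation together with a greedy maximization, so the iterates track the optimal value function rather than the value of a fixed policy. The model arrays P and R, the step size, and the discount factor are placeholders.

```python
# Sketch: one model-based TD sweep toward the optimal value function.
import numpy as np

def model_based_td_sweep(V, P, R, gamma=0.95, alpha=0.5):
    """P[a][s, s'] transition model, R[s, a] rewards; one sweep over all states."""
    n_states, n_actions = R.shape
    for s in range(n_states):
        # Greedy (Bellman-optimality) target computed from the model.
        target = max(R[s, a] + gamma * P[a][s] @ V for a in range(n_actions))
        V[s] += alpha * (target - V[s])     # TD-style partial update
    return V
```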