ISBN (print): 9781889335384
Motion controllers capable of incremental learning and optimization can automatically tune their parameters to pursue optimal control. By implementing reinforcement learning and approximate dynamic programming, an adaptive critic motion controller is shown to be able to achieve this objective. The control policy and the adaptive critic are implemented by sparse radial basis function networks. The policy and critic updating rules are derived. The ability and performance of the adaptive critic motion controller are demonstrated by the control of a rotary inverted pendulum system.
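A minimal sketch of the kind of RBF-based actor-critic update such a controller could use is given below; the feature construction, state dimension, learning rates, and update rules are illustrative assumptions, not the rules derived in the paper.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Sparse radial basis function feature vector for state x."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    phi[phi < 1e-3] = 0.0            # keep the representation sparse
    return phi

centers = np.random.uniform(-1, 1, size=(50, 2))   # RBF centers over an assumed 2-D state space
w_critic = np.zeros(50)                             # critic weights
w_actor = np.zeros(50)                              # policy weights
alpha_c, alpha_a, gamma = 0.1, 0.01, 0.98           # assumed step sizes and discount factor

def actor_critic_step(x, x_next, cost):
    """One temporal-difference-driven update of critic and policy weights."""
    global w_critic, w_actor
    phi, phi_next = rbf_features(x, centers, 0.3), rbf_features(x_next, centers, 0.3)
    delta = cost + gamma * (w_critic @ phi_next) - (w_critic @ phi)   # TD error
    w_critic += alpha_c * delta * phi                                  # critic update
    w_actor -= alpha_a * delta * phi                                   # policy update driven by the critic
    return w_actor @ phi_next                                          # next (scalar) control action
```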
This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and Neuro-dynamic programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity in problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
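For reference, the exact DP recursion that these approximate methods try to scale is tabular value iteration; the toy random MDP below is purely illustrative.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probabilities
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # stage rewards R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)            # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)            # greedy policy w.r.t. the converged value function
```

RL and NDP replace the explicit model (P, R) and the exact table V with sampled transitions and function approximators, which is exactly where the reformulation issues discussed above arise.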
A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as a saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and with novel weight updates, the uniform ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate only one nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized at zero or randomly. Simulation results justify the theoretical analysis.
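A rough sketch of this two-network structure, with the actuator constraint handled as a saturation nonlinearity, might look as follows; the layer sizes, utility signal, learning rate, and update signs are assumptions rather than the paper's derived weight-update laws.

```python
import numpy as np

rng = np.random.default_rng(1)
W_a, W_c = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))  # fixed hidden layers
v_a, v_c = np.zeros(10), np.zeros(10)     # output weights may start at zero (no offline training phase)
u_max, alpha = 2.0, 0.05                  # actuator limit and assumed learning rate

def controller_step(x, tracking_error):
    """One update of the saturated action NN and the strategic-utility critic NN."""
    global v_a, v_c
    z_a, z_c = np.tanh(W_a @ x), np.tanh(W_c @ x)
    u = u_max * np.tanh((v_a @ z_a) / u_max)     # control passed through the saturation nonlinearity
    utility = tracking_error ** 2                # assumed strategic utility signal
    e_c = v_c @ z_c - utility                    # critic error from a quadratic performance index
    v_c -= alpha * e_c * z_c                     # critic weight update
    v_a -= alpha * (v_c @ z_c) * z_a             # action weight update driven by the critic
    return u
```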
In many resource-constrained project scheduling problems (RCPSP), the set of candidate projects is not fixed a priori but evolves with time. For example, while an initial set of projects is being performed according to a certain decision policy, a new promising project can emerge. To make an appropriate resource allocation decision for such a problem, project cancellation and resource idling decisions should complement the conventional scheduling decisions. In this study, the problem of stochastic RCPSP (sRCPSP) with dynamic project arrivals is addressed with the added flexibility of project cancellation and resource idling. To solve the problem, a Q-learning-based approach is adopted. To use the approach, the problem is formulated as a Markov Decision Process with appropriate definitions of states, including an information state, and action variables. The Q-learning approach enables us to derive empirical state transition rules from simulation data, so that analytical calculation of potentially exorbitantly complicated state transition rules can be circumvented. To maximize the advantage of using the empirically learned state transition rules, special types of actions, including project cancellation and resource idling, which are difficult to incorporate into heuristics, were added randomly in the simulation. The random actions are filtered during the Q-value iteration and properly utilized in online decision making to maximize the total expected reward. Copyright (C) 2007 John Wiley & Sons, Ltd.
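A toy sketch of such a simulation-driven Q-learning scheme is shown below; the action set, state encoding, reward signal, and exploration scheme are illustrative assumptions.

```python
import random
from collections import defaultdict

# Transitions come from a simulator rather than an analytical model, and the "special"
# actions (cancel a project, idle a resource) are injected at random during data collection.
ACTIONS = ["schedule_next", "cancel_project", "idle_resource"]
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.2

def choose_action(state):
    """Epsilon-greedy exploration that also injects cancellation/idling actions at random."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One Q-learning backup from a simulated (state, action, reward, next_state) sample."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

The filtering step described above then corresponds to keeping the randomly injected cancellation/idling actions only where they end up with the highest Q-value during online decision making.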
We devise an algorithm for solving the infinite-dimensional linear programs that arise from general deterministic semi-Markov decision processes on Borel spaces. The algorithm constructs a sequence of approximate primal-dual solutions that converge to an optimal one. The innovative idea is to approximate the dual solution with continuous piecewise linear ridge functions that naturally represent functions defined on a high-dimensional domain as linear combinations of functions defined on only a single dimension. This approximation gives rise to a primal/dual pair of semi-infinite programs, for which we show strong duality. In addition, we prove various properties of the underlying ridge functions.
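In assumed notation, the ridge-function approximation of the dual solution can be written as follows, with u denoting the dual variable.

```latex
% Sketch of the ridge-function representation: a function on a high-dimensional
% domain is expressed through one-dimensional pieces.
\[
  u(x) \;\approx\; \sum_{j=1}^{J} g_j\!\left(a_j^{\top} x\right),
  \qquad x \in \mathbb{R}^n,\; a_j \in \mathbb{R}^n,
\]
% where each g_j is a continuous piecewise linear function of the scalar a_j^T x.
% Restricting the dual to this class turns the infinite-dimensional LP into a
% primal/dual pair of semi-infinite programs over the directions a_j and the
% breakpoints and slopes of the g_j.
```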
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits in well with plant descriptions given by differential equations and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. Also, a fast critic update for concurrent actor-critic training is introduced that immediately applies the adjustments of critic parameters induced by actor updates, keeping the Bellman optimality correct to a first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until substantial error builds up in the Bellman optimality or temporal difference equation, at which point a traditional critic training phase is performed, after which another interval of concurrent actor-critic training may resume.
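In assumed notation, the second-order actor update could be sketched as follows, with plant dynamics f, cost rate r, critic V(x; w), and actor u = pi(x; theta).

```latex
% Sketch of the Newton-type actor adaptation; the notation is assumed, not the paper's.
% Along a trajectory of \dot{x} = f(x, u), the Hamiltonian seen by the actor is
\[
  H(\theta) \;=\; r\bigl(x, \pi(x;\theta)\bigr)
            \;+\; \nabla_x V(x; w)^{\top}\, f\bigl(x, \pi(x;\theta)\bigr),
\]
% and the actor parameters are driven toward a stationary point of H by a Newton step
\[
  \theta \;\leftarrow\; \theta \;-\;
  \bigl(\nabla_{\theta}^{2} H\bigr)^{-1} \nabla_{\theta} H .
\]
% The "fast critic update" then adjusts w to first order so that the Bellman (HJB)
% residual remains approximately zero after the actor change.
```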
In this paper, the optimal strategies for discrete-time linear system quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action-dependent value function Q(x, u, w) of the zero-sum game instead of solving for the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used that are adaptively tuned in forward time using adaptive critic methods. The result is a Q-learning approximate dynamic programming (ADP) model-free approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are given. It is proven that the algorithm is, in effect, a model-free iterative algorithm for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of this method is shown by performing an H-infinity control autopilot design for an F-16 aircraft. (C) 2007 Elsevier Ltd. All rights reserved.
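Under the linear quadratic structure of the game, the action-dependent value function and the resulting model-free policies can be sketched as follows; the notation is assumed rather than taken verbatim from the paper.

```latex
% With z_k = [x_k^T, u_k^T, w_k^T]^T, the game Q-function is quadratic in z_k:
\[
  Q(x_k, u_k, w_k) \;=\; z_k^{\top} H z_k,
  \qquad
  H = \begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\
                      H_{ux} & H_{uu} & H_{uw} \\
                      H_{wx} & H_{wu} & H_{ww} \end{bmatrix}.
\]
% Once H is identified from measured data, the Nash feedback policies follow from the
% stationarity conditions \partial Q/\partial u = 0 and \partial Q/\partial w = 0, e.g.
\[
  u_k = -\bigl(H_{uu} - H_{uw} H_{ww}^{-1} H_{wu}\bigr)^{-1}
         \bigl(H_{ux} - H_{uw} H_{ww}^{-1} H_{wx}\bigr)\, x_k,
\]
% so the system matrices are never needed explicitly.
```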
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that eliminate the main difficulties of existing results. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for policy evaluation with high precision. The other is the automatic feature selection provided by the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantees for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating a
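A minimal sketch of an ALD-style sparsification test of the kind used in KLSPI is shown below; the Gaussian kernel, regularization term, and threshold nu are illustrative assumptions.

```python
import numpy as np

def kernel(x, y, sigma=1.0):
    """Gaussian (Mercer) kernel, used here only for illustration."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def ald_test(dictionary, x_new, nu=1e-3):
    """Return True if x_new is approximately linearly independent of the dictionary
    in feature space and should therefore be added to it."""
    if not dictionary:
        return True
    K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    k = np.array([kernel(a, x_new) for a in dictionary])
    c = np.linalg.solve(K + 1e-9 * np.eye(len(dictionary)), k)   # best linear reconstruction
    delta = kernel(x_new, x_new) - k @ c                          # ALD residual
    return delta > nu
```

Keeping only the samples that pass this test is what keeps the KLSTD-Q solution sparse while acting as automatic feature selection.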
This paper presents a simulation-based approach for designing a non-linear override control scheme to improve the performance of a local linear controller. The higher-level non-linear controller monitors the dynamic state of the system and calculates an override control action whenever the system is predicted to move outside an acceptable operating regime under the local controller. The design of the non-linear override controller is based on a cost-to-go function, which is constructed by using simulation or operation data. The cost-to-go function delineates the admissible region of state space within which the local controller is effective, thereby yielding a switching rule.
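The switching rule implied by this construction can be sketched as follows, assuming a one-step state predictor, a learned cost-to-go approximator, and a threshold marking the boundary of the admissible region; all of these names are illustrative.

```python
import numpy as np

def control(x, cost_to_go, predict_next, local_gain_K, override_action, threshold):
    """Apply the local linear controller unless the predicted state leaves the
    admissible region delineated by the cost-to-go function."""
    u_local = -local_gain_K @ x              # local linear controller
    x_pred = predict_next(x, u_local)        # simulate one step ahead under the local controller
    if cost_to_go(x_pred) > threshold:       # predicted to leave the acceptable operating regime
        return override_action(x)            # non-linear override control takes over
    return u_local
```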
ISBN (print): 9781424407064
The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game on which we seek strategies that guarantee the achievement of goals defined simultaneously in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The employed techniques draw on temporal-difference methods but also use a heuristic qualification of system states to cope with algorithmic complexity issues.
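A toy sketch of a temporal-difference update for such a stochastic shortest path formulation, with a heuristic relevance filter standing in for the state qualification, might look as follows; all names and the cost structure are assumptions.

```python
from collections import defaultdict

J = defaultdict(float)   # learned cost-to-go over graph nodes
alpha = 0.1

def td_update(state, next_state, stage_cost, is_relevant):
    """One TD(0) backup of the shortest-path cost-to-go along a simulated trajectory."""
    if not is_relevant(state):               # heuristic qualification: skip irrelevant states
        return
    target = stage_cost + J[next_state]      # undiscounted stochastic shortest path target
    J[state] += alpha * (target - J[state])
```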