Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem. (C) 2010 Elsevier Ltd. All rights reserved.
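For concreteness, here is a minimal sketch of the synchronous fuzzy Q-iteration update described above. The 1-D deterministic system f, reward rho, triangular membership functions, and three-action discretization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical 1-D deterministic system and reward, only for illustration.
def f(x, u):
    return np.clip(x + 0.1 * u, -1.0, 1.0)

def rho(x, u):
    return -(x ** 2 + 0.01 * u ** 2)

centers = np.linspace(-1.0, 1.0, 11)   # cores of the fuzzy partition
actions = np.array([-1.0, 0.0, 1.0])   # discretized action space
gamma = 0.95

def mu(x):
    """Triangular membership degrees over the partition; they sum to 1."""
    w = np.maximum(0.0, 1.0 - np.abs(x - centers) / (centers[1] - centers[0]))
    return w / w.sum()

theta = np.zeros((len(centers), len(actions)))
for _ in range(200):                    # synchronous fuzzy Q-iteration
    new = np.empty_like(theta)
    for i, xi in enumerate(centers):
        for j, uj in enumerate(actions):
            # Interpolate Q at the next state with the memberships,
            # then maximize over the discrete actions.
            q_next = mu(f(xi, uj)) @ theta
            new[i, j] = rho(xi, uj) + gamma * q_next.max()
    theta = new

policy = actions[np.argmax(theta, axis=1)]   # greedy action at each core
```

The asynchronous variant the abstract mentions would overwrite theta[i, j] in place instead of building `new` each sweep, reusing fresh parameters immediately.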
ISBN:
(Print) 9781509042340
Spacecraft on-board autonomy is an important topic in currently developed and future space missions. In this study, we present a robust approach to obtaining the optimal policy of autonomous space systems modeled via a Markov Decision Process (MDP), with respect to the values assigned to its transition probability matrix. After addressing the curse of dimensionality in solving the formulated MDP problem via approximate dynamic programming, we use an Apriori-based Association Classifier to infer a specific optimal policy. Finally, we assess the effectiveness of this optimal policy in fulfilling the spacecraft autonomy requirements.
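A toy sketch of the pipeline this abstract describes: solve a small MDP from its transition probability matrix, then mine "state-feature itemset => action" rules from the resulting policy. The state space, bit features, and the naive support/confidence rule miner below are stand-ins for the paper's ADP solver and Apriori-based Association Classifier:

```python
import numpy as np
from itertools import combinations

# Toy MDP (hypothetical sizes): P[a] is the transition matrix for action a.
n_states, n_actions, gamma = 8, 3, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(-1, 1, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):                       # plain value iteration
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)

# Naive stand-in for the Apriori-based classifier: keep rules
# "feature itemset => action" whose confidence clears a threshold.
features = [{f"bit{k}" for k in range(3) if (s >> k) & 1} for s in range(n_states)]
rules = {}
for size in (1, 2):
    for items in combinations(["bit0", "bit1", "bit2"], size):
        covered = [s for s in range(n_states) if set(items) <= features[s]]
        if covered:
            acts, counts = np.unique(policy[covered], return_counts=True)
            if counts.max() / counts.sum() >= 0.8:
                rules[items] = int(acts[counts.argmax()])
```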
In this paper, relations between model predictive control and reinforcement learning are studied for discrete-time linear time-invariant systems with state and input constraints and a quadratic value function. The principles of model predictive control and reinforcement learning are reviewed in a tutorial manner. From model predictive control theory it is inferred that the optimal value function is piecewise quadratic on polyhedra and that the optimal policy is piecewise affine on polyhedra. Various ideas for exploiting this knowledge of the structure and properties of the optimal value function and the optimal policy in reinforcement learning theory and practice are presented. These ideas can be used to derive stability and feasibility criteria and to accelerate the learning process, which can facilitate reinforcement learning for systems with high order, fast dynamics, and strict safety requirements. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
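The structural result is easy to make concrete: an explicit-MPC-style piecewise affine policy is a list of polyhedral regions, each carrying an affine gain. A minimal sketch, with a hypothetical single region standing in for a real explicit solution:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Region:
    H: np.ndarray   # polyhedron {x : Hx <= h}
    h: np.ndarray
    K: np.ndarray   # affine law u = Kx + k valid on this region
    k: np.ndarray

def pwa_policy(x, regions):
    """Evaluate a piecewise affine policy of the kind explicit MPC produces."""
    for r in regions:
        if np.all(r.H @ x <= r.h + 1e-9):
            return r.K @ x + r.k
    raise ValueError("x outside the feasible set")

# Example: one unconstrained region with an LQR-like gain (made up).
regions = [Region(H=np.zeros((1, 2)), h=np.zeros(1),
                  K=np.array([[-0.5, -1.2]]), k=np.zeros(1))]
u = pwa_policy(np.array([0.3, -0.1]), regions)
```

Restricting a learned policy or value approximator to this family is one way the abstract's structural knowledge can constrain reinforcement learning.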
ISBN:
(Print) 9781538611074
In this paper, four action-dependent heuristic dynamic programming (ADHDP) control methods are presented for nonlinear multi-input multi-output (MIMO) systems with different characteristics, based on the topology principle. These four methods are the action-network extension method, the sub-network method, the cascaded action-network method, and the combined method. The derivation procedures and computing formulas of these methods are also given. The action-network extension method is mainly used when the multiple output variables have the same order of magnitude and a naturally coupled relationship. The sub-network method can be applied in nearly all cases and can handle multiple output variables with different orders of magnitude. The cascaded action-network method is utilized when the multiple input variables have explicit cascaded relationships. The combined method can be used to control certain more complex systems. Together, these four methods cover almost all design requirements of nonlinear MIMO control systems, and designers can select among the methods and formulas according to these results to achieve a better control effect.
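As an illustration of the sub-network method's topology (the paper's formulas are not reproduced; the sizes and weights below are placeholders), each control output gets its own small action network, so outputs with very different orders of magnitude do not share parameters:

```python
import numpy as np

class SubNetworkActor:
    """One independent action sub-network per control output: a sketch of
    the sub-network topology; the learning rules are not reproduced here."""
    def __init__(self, n_state, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.nets = [(rng.standard_normal((n_hidden, n_state)) * 0.1,
                      rng.standard_normal(n_hidden) * 0.1)
                     for _ in range(n_outputs)]

    def __call__(self, x):
        # Each output u_m has its own hidden layer, so differently scaled
        # outputs do not compete for the same weights.
        return np.array([w2 @ np.tanh(w1 @ x) for (w1, w2) in self.nets])

actor = SubNetworkActor(n_state=4, n_hidden=8, n_outputs=2)
u = actor(np.zeros(4))   # two control outputs from two separate sub-networks
```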
In this paper, a problem of active fault diagnosis for jump Markov nonlinear systems with non-Gaussian noises is considered. The imperfect state information formulation is transformed using sufficient statistics to a ...
ISBN:
(Print) 9781509028733
In this paper, a novel adaptive learning technique is proposed to solve a stochastic zero-sum Nash game with partially unknown nonlinear systems, for which the lengths of the time intervals that the system spends in each mode are independent random variables with exponential distributions; that is, the environment and the cost matrices depend on the outcome of a Markov chain. We first formulate the problem using an optimal stopping process and then provide a verification theorem for stopping zero-sum games. A structure of two actor approximators and one critic approximator is used to approximate the saddle-point policies and the optimal cost, respectively. Effective tuning laws are proposed to solve the stochastic Nash game problem while also guaranteeing closed-loop stability, with rigorous Lyapunov-based stability proofs. Finally, a numerical example is used to illustrate the effectiveness of the proposed approach.
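A schematic of the 2-actor/1-critic parameterization, with placeholder basis functions and a plain Bellman-residual gradient step standing in for the paper's Lyapunov-based tuning laws, which are not reproduced here:

```python
import numpy as np

gamma = 0.99
phi = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])  # critic basis
sig = lambda x: np.tanh(x)                                     # actor basis

wc = np.zeros(3)        # critic weights:    V(x) ~ wc @ phi(x)
Wu = np.zeros((1, 2))   # minimizing player: u = Wu @ sig(x)
Wd = np.zeros((1, 2))   # maximizing player: d = Wd @ sig(x)

def tune(x, x_next, cost, lr=1e-2):
    """One schematic tuning step driven by the Bellman residual."""
    global wc, Wu, Wd
    e = cost + gamma * (wc @ phi(x_next)) - wc @ phi(x)
    wc -= lr * e * (gamma * phi(x_next) - phi(x))   # critic: descend e**2
    # The two actors adapt in opposite directions of the same residual,
    # reflecting the saddle-point (min-max) structure of the game.
    Wu -= lr * e * sig(x)[np.newaxis, :]
    Wd += lr * e * sig(x)[np.newaxis, :]
```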
ISBN:
(Print) 9781509054626
In this paper, a novel discrete-time iterative zero-sum adaptive dynamic programming (ADP) algorithm is developed for solving the optimal control problems of nonlinear systems. Two iteration processes, the lower and upper iterations, are employed to solve the lower and upper value functions, respectively. Arbitrary positive semidefinite functions can be used to initialize the upper and lower iterations of the iterative zero-sum ADP algorithm. It is proven that the upper and lower value functions converge to the optimal performance index function whenever that function exists; no separate existence criterion is required. Simulation examples are given to illustrate the effective performance of the present method.
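A sketch of the two iteration processes on a coarse 1-D grid. The dynamics, payoff, and action/disturbance sets are illustrative assumptions; the min-max vs. max-min structure of the upper and lower iterations is as described in the abstract:

```python
import numpy as np

xs = np.linspace(-1, 1, 21)       # state grid
us = np.linspace(-1, 1, 5)        # control set
ws = np.linspace(-0.2, 0.2, 5)    # disturbance set
gamma = 0.9
f = lambda x, u, w: np.clip(0.9 * x + 0.2 * u + w, -1, 1)
U = lambda x, u, w: x ** 2 + u ** 2 - 5 * w ** 2

V_up = np.zeros_like(xs)          # any positive semidefinite start works
V_lo = np.zeros_like(xs)
for _ in range(100):
    # Upper iteration: controller commits first (min over u of max over w).
    V_up = np.array([min(max(U(x, u, w) + gamma * np.interp(f(x, u, w), xs, V_up)
                             for w in ws) for u in us) for x in xs])
    # Lower iteration: disturbance commits first (max over w of min over u).
    V_lo = np.array([max(min(U(x, u, w) + gamma * np.interp(f(x, u, w), xs, V_lo)
                             for u in us) for w in ws) for x in xs])
# If the game has a value, the two sequences approach it from above and below.
```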
ISBN:
(Print) 9781509046584
A neural-network-based adaptive critic control method is established for continuous-time input-affine uncertain nonlinear systems to achieve disturbance attenuation. The present problem can be formulated as a two-player zero-sum differential game, and the adaptive critic mechanism is employed to solve the minimax optimization problem. A neural network identifier is developed to reconstruct the unknown dynamical system. The optimal control law and the worst-case disturbance law are designed by introducing and training a critic neural network. The effectiveness of the present self-learning control method is also illustrated by a simulation experiment.
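A sketch of how the control and worst-case disturbance laws typically follow from a trained critic in this input-affine zero-sum setting; the quadratic critic basis and weighting matrices below are assumptions, not the paper's design:

```python
import numpy as np

# Dynamics assumed input-affine: xdot = f(x) + g(x)u + k(x)d,
# with cost x'Qx + u'Ru - gam2 * d'd (gam2 a chosen attenuation level).
R, gam2 = np.eye(1), 4.0
phi_grad = lambda x: np.array([[2 * x[0], 0.0],
                               [x[1],     x[0]],
                               [0.0,      2 * x[1]]])  # Jacobian of [x0^2, x0*x1, x1^2]

def laws(x, wc, g, k):
    """Control and worst-case disturbance derived from the critic gradient."""
    dV = phi_grad(x).T @ wc                       # gradient of V(x) ~ wc @ phi(x)
    u = -0.5 * np.linalg.solve(R, g(x).T @ dV)    # minimizing control law
    d = k(x).T @ dV / (2.0 * gam2)                # maximizing disturbance law
    return u, d
```

In the paper's scheme, g and k would come from the neural network identifier rather than being known in closed form.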
ISBN:
(Print) 9783319590813; 9783319590806
Adaptive dynamic programming (ADP) is an active research topic. This paper concerns a new local policy iteration ADP algorithm, designed for discrete-time nonlinear systems and used to solve infinite-horizon optimal control problems. The characteristic of the new local policy iteration ADP algorithm is that it updates the iterative control law and value function within one subset of the state space. The detailed iteration process of the local policy iteration is then presented. A simulation example is given to show the good performance of the newly developed algorithm.
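A minimal sketch of such a "local" sweep: the value function and control law are updated only on a chosen subset of a state grid. The dynamics, cost, and subset rule are hypothetical:

```python
import numpy as np

xs = np.linspace(-1, 1, 41)
us = np.linspace(-1, 1, 9)
gamma = 0.95
f = lambda x, u: np.clip(0.9 * x + 0.1 * u, -1, 1)
U = lambda x, u: x ** 2 + u ** 2

V = xs ** 2                  # an admissible initial value function
pi = np.zeros_like(xs)       # current control law on the grid

def sweep(subset):
    """Update the control law and value only at the given grid indices."""
    global V, pi
    for i in subset:
        x = xs[i]
        qs = [U(x, u) + gamma * np.interp(f(x, u), xs, V) for u in us]
        j = int(np.argmin(qs))
        pi[i], V[i] = us[j], qs[j]   # local improvement and update

sweep(range(0, len(xs), 2))  # e.g. update half the grid in this pass
```

States outside the subset keep their previous law and value, which is what distinguishes the local scheme from a full synchronous sweep.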
ISBN:
(Print) 9781538611074
With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine, and management science, many complex systems face problems of optimization and control. Approximate dynamic programming addresses the curse of dimensionality of dynamic programming and is a new kind of approximate optimization method that has emerged in recent years. Based on an analysis of the optimization system, this paper proposes a nonlinear multi-input multi-output, online-learning, data-driven approximate dynamic programming structure and its learning algorithm. The method is realized in three aspects: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is computed with exponential convergence; 3) the critic and actor modules are learned synchronously to achieve online optimization and control. The optimal control of the longitudinal motion of a thermal underwater glider is used to show the effect of the proposed method. This work can lay a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output approximate dynamic programming, and it addresses common needs in the optimization, control, and artificial intelligence of many scientific and engineering fields, such as energy conservation, emission reduction, decision support, and operational management.
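A sketch of aspect 1) and the greedy actor step it enables; the sample set and k are assumptions, and the glider model itself is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)
samples_x = rng.uniform(-1, 1, size=(500, 2))   # stored state samples
samples_v = np.zeros(len(samples_x))            # their critic values (refined online)
k, gamma = 5, 0.95

def critic(x):
    """V-hat(x): average the stored values of the k nearest samples."""
    d = np.linalg.norm(samples_x - x, axis=1)
    return samples_v[np.argpartition(d, k)[:k]].mean()

def actor(x, f, U, actions):
    """Greedy one-step policy improvement against the k-NN critic."""
    return min(actions, key=lambda u: U(x, u) + gamma * critic(f(x, u)))
```

The k-NN critic needs no parametric model of the value function, which is the data-driven property the abstract emphasizes; updating `samples_v` and calling `actor` in the same loop corresponds to aspect 3)'s synchronous learning.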