Search Results - Inner Mongolia University Library

Learning Intrusion Prevention Policies through Optimal Stopping

17th International Conference on Network and Service Management (CNSM) - Smart Management for Future Networks and Services

Authors: Hammar, Kim; Stadler, Rolf. Affiliations: KTH Royal Inst Technol, Div Network & Syst Engn, Stockholm, Sweden; KTH Ctr Cyber Def & Informat Secur, Stockholm, Sweden

ISBN: (print) 9783903176362

We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation allows us insight into the structure of the optimal policies, which turn out to be threshold based. Since the computation of the optimal defender policy using dynamic programming is not feasible for practical cases, we approximate the optimal policy through reinforcement learning in a simulation environment. To define the dynamics of the simulation, we emulate the target infrastructure and collect measurements. Our evaluations show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
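One way to picture a threshold-based stopping policy, as a minimal sketch rather than the authors' method: the defender tracks a belief that an intrusion is under way and takes the stop (prevention) action once that belief crosses a threshold. The function names, the belief recursion, the assumption that an intrusion persists once started, and all numeric values below are illustrative assumptions that do not come from the paper.

```python
def update_belief(belief, p_intrusion_start, lik_intrusion, lik_normal):
    """One Bayes-filter step for belief = P(intrusion ongoing | observations so far)."""
    # Prediction: assume an intrusion, once started, persists (illustrative assumption).
    prior = belief + (1.0 - belief) * p_intrusion_start
    # Correction: weight each hypothesis by the likelihood of the new observation.
    numerator = prior * lik_intrusion
    denominator = numerator + (1.0 - prior) * lik_normal
    return numerator / denominator if denominator > 0 else prior


def threshold_policy(belief, alpha):
    """Stop (take the prevention action) iff the belief reaches the threshold alpha."""
    return belief >= alpha


def first_stop_time(likelihood_pairs, alpha, p_intrusion_start=0.1):
    """Return the first time step at which the policy stops, or None if it never does.

    likelihood_pairs: iterable of (P(obs | intrusion), P(obs | no intrusion)).
    """
    belief = 0.0
    for t, (lik_intrusion, lik_normal) in enumerate(likelihood_pairs, start=1):
        belief = update_belief(belief, p_intrusion_start, lik_intrusion, lik_normal)
        if threshold_policy(belief, alpha):
            return t
    return None


# Uninformative observations: the belief rises only through the start probability.
stop_time = first_stop_time([(0.5, 0.5)] * 10, alpha=0.5)
print(stop_time)
```

A lower `alpha` makes the defender stop earlier at the cost of more false alarms; learning a policy of this form reduces to choosing the threshold.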

Keywords: Network security; automation; optimal stopping; reinforcement learning; Markov decision processes

Relations between Model Predictive Control and Reinforcement Learning

20th World Congress of the International Federation of Automatic Control (IFAC)

Author: Goerges, Daniel. Affiliation: Univ Kaiserslautern, Electromobil, Erwin Schrodinger Str 12, D-67663 Kaiserslautern, Germany

In this paper relations between model predictive control and reinforcement learning are studied for discrete-time linear time-invariant systems with state and input constraints and a quadratic value function. The principles of model predictive control and reinforcement learning are reviewed in a tutorial manner. From model predictive control theory it is inferred that the optimal value function is piecewise quadratic on polyhedra and that the optimal policy is piecewise affine on polyhedra. Various ideas for exploiting the knowledge on the structure and the properties of the optimal value function and the optimal policy in reinforcement learning theory and practice are presented. The ideas can be used for deriving stability and feasibility criteria and for accelerating the learning process which can facilitate reinforcement learning for systems with high order, fast dynamics, and strict safety requirements. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
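The structural claim in this abstract can be written out as follows. This is a sketch in assumed notation, not taken from the paper: $A$, $B$ are the system matrices, $Q$, $R$, $P$ the cost weights, $N$ a finite prediction horizon, $\mathcal{X}$ and $\mathcal{U}$ polyhedral constraint sets, and $\mathcal{R}_i$ the polyhedral regions of the state space.

```latex
\begin{align}
x_{k+1} &= A x_k + B u_k, \qquad x_k \in \mathcal{X}, \; u_k \in \mathcal{U}, \\
V^*(x) &= \min_{u_0, \dots, u_{N-1}} \; x_N^\top P x_N
          + \sum_{k=0}^{N-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right),
          \qquad x_0 = x, \\
V^*(x) &= x^\top P_i x + q_i^\top x + r_i, \qquad
\pi^*(x) = K_i x + g_i
\qquad \text{for } x \in \mathcal{R}_i = \{ x : H_i x \le h_i \}.
\end{align}
```

Under these assumptions, a function approximator for the value function or the policy only needs to represent one quadratic or affine piece per region $\mathcal{R}_i$.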

Keywords: Model predictive control; multi-parametric programming; reinforcement learning; approximate dynamic programming; actor-critic structure

Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming

Neurocomputing, 2016, vol. 198, pp. 80-90

Authors: Yang, Xiong; Liu, Derong; Wei, Qinglai; Wang, Ding. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

This paper presents an adaptive dynamic programming-based guaranteed cost neural tracking control algorithm for a class of continuous-time matched uncertain nonlinear systems. By introducing an augmented system and employing a modified cost function with a discount factor, the guaranteed cost tracking control problem is transformed into an optimal tracking control problem. Unlike existing optimal tracking control algorithms often requiring the control matrix to be invertible, the developed control algorithm relaxes this restrictive condition under the assumption that the system is controllable. A single critic neural network (NN) is constructed to approximate the solution of the modified Hamilton-Jacobi-Bellman equation corresponding to the nominal augmented error dynamics. Utilizing the newly developed critic NN, the optimal tracking control can be derived without policy iteration. All signals in the closed-loop system are proved to be uniformly ultimately bounded via Lyapunov's direct method. In addition, the developed control scheme is verified to guarantee that the tracking errors converge to an adjustable neighborhood of the origin. Two numerical examples are provided to illustrate the effectiveness and applicability of the developed approach. (C) 2016 Elsevier B.V. All rights reserved.

Keywords: Adaptive dynamic programming; guaranteed cost control; Hamilton-Jacobi-Bellman equation; neural network; nonlinear system; reinforcement learning

Decision theory on dynamic domains: Nabla derivatives and the Hamilton-Jacobi-Bellman equation

2008 IEEE International Conference on Systems, Man and Cybernetics, SMC 2008

Authors: Seiffertt, John; Wunsch II, Donald C.; Sanyal, Suman. Affiliations: Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States; Department of Mathematics and Computer Science, Clarkson University, Potsdam, NY, United States

The time scales calculus, which includes the study of the nabla derivatives, is an emerging key topic due to many multidisciplinary applications. We extend this calculus to approximate dynamic programming. In particular, we investigate application of the nabla derivative, one of the fundamental dynamic derivatives of time scales. We present a nabla-derivative based derivation and proof of the Hamilton-Jacobi-Bellman equation, the solution of which is the fundamental problem in the field of dynamic programming. By drawing together the calculus of time scales and the applied area of stochastic control via approximate dynamic programming, we connect two major fields of research. © 2008 IEEE.

Keywords: reinforcement learning

A non-parametric approach to approximate dynamic programming

10th International Conference on Machine Learning and Applications, ICMLA 2011

Authors: Glaude, Hadrien; Akrimi, Fadi; Geist, Matthieu; Pietquin, Olivier. Affiliations: 57070 Metz, France; 2 rue Edouard Belin, 57070 Metz, France

ISBN: (print) 9780769546070

Approximate dynamic programming (ADP) is a machine learning method aiming at learning an optimal control policy for a dynamic and stochastic system from a logged set of observed interactions between the system and one or several non-optimal controllers. It defines a particular class of reinforcement learning (RL) algorithms, RL being a general paradigm for learning such a control policy from interactions. ADP addresses the problem of systems exhibiting a state space which is too large to be enumerated in the memory of a computer. Because of this, approximation schemes are used to generalize estimates over continuous state spaces. Nevertheless, RL still suffers from a lack of scalability to multidimensional continuous state spaces. In this paper, we propose the use of the Locally Weighted Projection Regression (LWPR) method to handle this scalability problem. We prove the efficacy of our approach on two standard benchmarks modified to exhibit larger state spaces. © 2011 IEEE.

Keywords: Stochastic systems

Advances in reinforcement learning and their implications for intelligent control

Proceedings of the 5th IEEE International Symposium on Intelligent Control, 1990

Authors: Whitehead, Steven D.; Sutton, Richard S.; Ballard, Dana H. Affiliation: Dept of Comput Sci, Univ of Rochester, NY, USA

ISBN: (print) 0818621087

The focus of this work is on control architectures that are based on reinforcement learning. A number of recent advances that have contributed to the viability of reinforcement learning approaches to intelligent control are surveyed. These advances include the formalization of the relationship between reinforcement learning and dynamic programming, the use of internal predictive models to improve learning rate, and the integration of reinforcement learning with active perception. On the basis of these advances and other results, it is concluded that control architectures based on reinforcement learning are now in a position to satisfy many of the criteria associated with intelligent control.

Keywords: Learning systems

Optimal Control for a Class of Unknown Nonlinear Systems via the Iterative GDHP Algorithm

8th International Symposium on Neural Networks

Authors: Wang, Ding; Liu, Derong. Affiliation: Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

ISBN: (print) 9783642210891

Using the neural-network-based iterative adaptive dynamic programming (ADP) algorithm, an optimal control scheme for a class of unknown discrete-time nonlinear systems with discount factor in the cost function is proposed in this paper. The optimal controller is designed with convergence analysis in terms of cost function and control law. In order to implement the algorithm via globalized dual heuristic programming (GDHP) technique, a neural network is constructed first to identify the unknown nonlinear system, and then two other neural networks are used to approximate the cost function and the control law, respectively. An example is provided to verify the effectiveness of the present approach.

Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; intelligent control; neural networks; optimal control; reinforcement learning

Policy Gradient Approaches for Multi-Objective Sequential Decision Making: A Comparison

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Parisi, Simone; Pirotta, Matteo; Smacchia, Nicola; Bascetta, Luca; Restelli, Marcello. Affiliation: Politecn Milan, Dept Elect Informat & Bioengn, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy

ISBN: (print) 9781479945528

This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms and the fact that gradient-ascent algorithms have already been proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, so far little attention has been paid to the use of gradient information to face multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are here presented. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. Differently, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier. The parameters of a function that defines a manifold in the policy parameter space are updated following the gradient of some performance criterion so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three different approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.

Keywords: Pareto optimisation; approximation theory; decision making; evolutionary computation; gradient methods; learning (artificial intelligence); MOMDPs; MORL approaches; Pareto following; Pareto frontier approximation; gradient-ascent algorithms; gradient-based policy-search procedures; multiobjective Markov decision processes; multiobjective evolutionary algorithms; multiobjective optimization problems; multiobjective reinforcement-learning approaches; multiobjective sequential decision making; nondominated policies; performance criterion; policy gradient approaches; policy-gradient algorithms; radial following; algorithm design and analysis; approximation algorithms; approximation methods; manifolds; measurement; optimization; water resources; evolutionary algorithm; performance metrics; policies

Learning in Constrained Stochastic Dynamic Potential Games

41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Macua, Sergio Valcarcel; Zazo, Santiago; Zazo, Javier. Affiliation: Univ Politecn Madrid, E-28040 Madrid, Spain

ISBN: (print) 9781479999880

We extend earlier works on continuous potential games to the most general case: stochastic time varying environment, stochastic rewards, non-reduced form and constrained state-action sets. We provide conditions for a Markov Nash equilibrium (MNE) of the game to be equivalent to the solution of a single control problem. Then, we address the problem of learning this MNE when the reward and state transition models are unknown. We follow a reinforcement learning approach and extend previous algorithms for working with constrained state-action subsets of real vector spaces. As an application example, we simulate a network flow optimization model, in which the relays have batteries that deplete with a random factor. The results obtained with the proposed framework are close to optimal.

Keywords: Approximate dynamic programming; game theory; multi-agent; network flow; reinforcement learning

Discrete-Time Optimal Control Scheme Based on Q-Learning Algorithm

7th International Conference on Intelligent Control and Information Processing (ICICIP)

Authors: Wei, Qinglai; Liu, Derong; Song, Ruizhuo. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

ISBN: (print) 9781509021550

This paper is concerned with optimal control problems of discrete-time nonlinear systems via a novel Q-learning algorithm. In the newly developed Q-learning algorithm, the iterative Q function in each iteration is required to update on the whole state and control spaces, instead of being updated by a single state and control pair. A new convergence criterion of the corresponding Q-learning algorithm is presented, where the traditional constraints for the learning rates of Q-learning algorithms are relaxed. Finally, simulation results are provided to exemplify the good performance of the developed algorithm.
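The difference between updating a single state-control pair and updating the whole state and control spaces per iteration can be illustrated on a tiny tabular problem. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the four-state chain, the cost function, the helper names, and the choice `alpha=1.0` are all hypothetical.

```python
def sweep_q_learning(n_states, actions, step, cost, sweeps, alpha=1.0, gamma=1.0):
    """Tabular Q iteration in which every sweep updates ALL (state, control) pairs.

    step(x, u) -> next state index (deterministic dynamics, assumed known here)
    cost(x, u) -> stage cost to be minimized
    """
    q = [[0.0 for _ in actions] for _ in range(n_states)]
    for _ in range(sweeps):
        new_q = [row[:] for row in q]  # synchronous update: read old table, write new one
        for x in range(n_states):
            for j, u in enumerate(actions):
                target = cost(x, u) + gamma * min(q[step(x, u)])
                new_q[x][j] = (1.0 - alpha) * q[x][j] + alpha * target
        q = new_q
    return q


# Hypothetical toy system: states 0..3 on a line, controls move left or right.
def chain_step(x, u):
    return max(0, min(3, x + u))


def chain_cost(x, u):
    return float(x)  # staying far from state 0 is expensive


ACTIONS = [-1, 1]
Q = sweep_q_learning(4, ACTIONS, chain_step, chain_cost, sweeps=20)
# Greedy control index per state (0 means move left, 1 means move right).
greedy = [min(range(len(ACTIONS)), key=lambda j: Q[x][j]) for x in range(4)]
print(Q[3], greedy)
```

A conventional Q-learning step would instead change only the entry for the one visited pair; the full sweep trades more computation per iteration for an update that does not depend on which pairs happen to be visited.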

Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neuro-dynamic programming; Q-learning; optimal control; neural networks; reinforcement learning
