Search Results - Inner Mongolia University Library

Online Nash-optimization tracking control of multi-motor driven load system with simplified RL scheme

ISA TRANSACTIONS, 2020, vol. 98, pp. 251-262

Authors: Lv, Yongfeng; Ren, Xuemei; Na, Jing. Affiliations: Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China; Kunming Univ Sci & Technol, Fac Mech & Elect Engn, Kunming 650500, Yunnan, Peoples R China

Although the optimal tracking control problem (OTCP) has been addressed recently, only the single-input system is considered in the recent literature. In this paper, the OTCP of unknown multi-motor driven load systems (MMDLS) is addressed based on a simplified reinforcement learning (RL) structure, where all the motor inputs with different dynamics will be obtained as a Nash equilibrium. Thus, the performance indexes associated with each input can be optimized as an outcome of a Nash equilibrium. Firstly, we use an identifier to reconstruct MMDLS dynamics, such that the accurate model required in the general control design is avoided. We use the identified dynamics to drive Nash-optimization inputs, which include the steady-state controls and the RL-based controls. The steady-state controls are designed with the identified system model. The RL-based controls are designed using the optimization method with the simplified RL-based critic NN schemes. We use the simplified RL structures to approximate the cost function of each motor input in the optimal control design. The NN weights of both the identified algorithm and simplified RL-based structure are approximated by using a novel adaptation algorithm, where the learning gains can be optimized adaptively. The weight convergences and the Nash-optimization MMDLS stability are all proved. Finally, numerical MMDLS simulations are implemented to show the correctness and the improved performance of the proposed methods. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.

Keywords: approximate dynamic programming; Nonzero-sum game; Neural network; Servo system; Optimal tracking control

Dynamic Optimization for Airline Maintenance Operations

TRANSPORTATION SCIENCE, 2020, vol. 54, no. 4, pp. 998-1015

Authors: Lagos, Carlos; Delgado, Felipe; Klapp, Mathias A. Affiliation: Pontificia Univ Catolica Chile, Sch Engn, Santiago 9999, Chile

The occurrence of unexpected aircraft maintenance tasks can produce expensive changes in an airline's operation. When it comes to critical tasks, it might even cancel programmed flights. Despite this, the challenge of scheduling aircraft maintenance operations under uncertainty has received limited attention in the scientific literature. We study a dynamic airline maintenance scheduling problem, which daily decides the set of aircraft to maintain and the set of pending tasks to execute in each aircraft. The objective is to minimize the expected costs of expired maintenance tasks over the operating horizon. To increase flexibility and reduce costs, we integrate maintenance scheduling with tail assignment decisions. We formulate our problem as a Markov decision process and design dynamic policies based on approximate dynamic programming, including value function approximation, rolling horizon techniques, and a hybrid policy between the latter two that delivers the best results. In a case study based on LATAM airline, we show the value of dynamic optimization by testing our best policies against a simple airline decision rule and a deterministic relaxation with perfect future information. We suggest to schedule tasks requiring less resources first to increase utilization of residual maintenance capacity. Finally, we observe strong economies of scale when sharing maintenance resources between multiple airlines.

Keywords: aircraft maintenance; approximate dynamic programming; task scheduling; tail assignment

Actor-critic learning for optimal building energy management with phase change materials

ELECTRIC POWER SYSTEMS RESEARCH, 2020, vol. 188

Authors: Rahimpour, Zahra; Verbic, Gregor; Chapman, Archie C. Affiliations: Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW, Australia; Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia

Energy management in buildings using phase change materials (PCM) to improve thermal performance is challenging due to the nonlinear thermal capacity of the PCM. To address this problem, this paper adopts a model-free actor-critic on-policy reinforcement learning method based on deep deterministic policy gradient (DDPG). The proposed approach overcomes the major weakness of model-based approaches, such as approximate dynamic programming (ADP), which require an explicit thermal model of the building under control. This requirement makes a plug-and-play implementation of the energy management algorithm in an existing smart meter difficult due to the wide variety of building design and construction types. To overcome this difficulty, we use a DDPG algorithm that can learn policies in continuous action spaces without access to the full dynamics of the building. We demonstrate the competitive performance of DDPG by benchmarking it against an ADP-based approach with access to the full thermal dynamics of the building.

Keywords: Actor-critic; approximate dynamic programming; Deep deterministic policy gradient; Home energy management; Phase change materials

Differential-game for resource aware approximate optimal control of large-scale nonlinear systems with multiple players

NEURAL NETWORKS, 2020, vol. 124, pp. 95-108

Authors: Sahoo, Avimanyu; Narayanan, Vignesh. Affiliations: Oklahoma State Univ, Div Engn Technol, 555 Engn North, Stillwater, OK 74078, USA; Washington Univ, St Louis, MO 63110, USA

In this paper, we propose a novel differential-game based neural network (NN) control architecture to solve an optimal control problem for a class of large-scale nonlinear systems involving N-players. We focus on optimizing the usage of the computational resources along with the system performance simultaneously. In particular, the N-players' control policies are desired to be designed such that they cooperatively optimize the large-scale system performance, and the sampling intervals for each player are desired to reduce the frequency of feedback execution. To develop a unified design framework that achieves both these objectives, we propose an optimal control problem by integrating both the design requirements, which leads to a multi-player differential-game. A solution to this problem is numerically obtained by solving the associated Hamilton-Jacobi (HJ) equation using event-driven approximate dynamic programming (E-ADP) and artificial NNs online and forward-in-time. We employ the critic neural networks to approximate the solution to the HJ equation, i.e., the optimal value function, with aperiodically available feedback information. Using the NN approximated value function, we design the control policies and the sampling schemes. Finally, the event-driven N-player system is remodeled as a hybrid dynamical system with impulsive weight update rules for analyzing its stability and convergence properties. The closed-loop practical stability of the system and Zeno free behavior of the sampling scheme are demonstrated using the Lyapunov method. Simulation results using a numerical example are also included to substantiate the analytical results. (C) 2020 Elsevier Ltd. All rights reserved.

Keywords: approximate dynamic programming; Event-driven control; Neural network control; Nonzero sum game; Optimal control

Approximate Dynamic Programming via a Smoothed Linear Program

OPERATIONS RESEARCH, 2012, vol. 60, no. 3, pp. 655-674

Authors: Desai, Vijay V.; Farias, Vivek F.; Moallemi, Ciamac C. Affiliations: Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027, USA; MIT, Sloan Sch Management, Cambridge, MA 02139, USA; Columbia Univ, Grad Sch Business, New York, NY 10027, USA

We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural "projection" of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program, the "smoothed approximate linear program," is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. These bounds are, in general, no worse than those available for extant LP approaches and for specific problem instances can be shown to be arbitrarily stronger. Second, experiments with our approach on a pair of challenging problems (the game of Tetris and a queueing network control problem) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by a substantial margin.

Keywords: optimization; linear programming; stochastic control; Markov decision processes; approximate dynamic programming

Revisiting Approximate Linear Programming: Constraint-Violation Learning with Applications to Inventory Control and Energy Storage

MANAGEMENT SCIENCE, 2020, vol. 66, no. 4, pp. 1544-1562

Authors: Lin, Qihang; Nadarajah, Selvaprabu; Soheili, Negar. Affiliations: Univ Iowa, Tippie Coll Business, Iowa City, IA 52242, USA; Univ Illinois, Coll Business Adm, Chicago, IL 60607, USA

Approximate linear programs (ALPs) are well-known models for computing value function approximations (VFAs) of intractable Markov decision processes (MDPs). VFAs from ALPs have desirable theoretical properties, define an operating policy, and provide a lower bound on the optimal policy cost. However, solving ALPs near-optimally remains challenging, for example, when approximating MDPs with nonlinear cost functions and transition dynamics or when rich basis functions are required to obtain a good VFA. We address this tension between theory and solvability by proposing a convex saddle-point reformulation of an ALP that includes as primal and dual variables, respectively, a vector of basis function weights and a constraint violation density function over the state-action space. To solve this reformulation, we develop a proximal stochastic mirror descent (PSMD) method that learns regions of high ALP constraint violation via its dual update. We establish that PSMD returns a near-optimal ALP solution and a lower bound on the optimal policy cost in a finite number of iterations with high probability. We numerically compare PSMD with several benchmarks on inventory control and energy storage applications. We find that the PSMD lower bound is tighter than a perfect information bound. In contrast, the constraint-sampling approach to solve ALPs may not provide a lower bound, and applying row generation to tackle ALPs is not computationally viable. PSMD policies outperform problem-specific heuristics and are comparable or better than the policies obtained using constraint sampling. Overall, our ALP reformulation and solution approach broadens the applicability of approximate linear programming.

Keywords: approximate linear programming; approximate dynamic programming; stochastic gradient descent; Inventory control; energy storage

A Distributed Iterative Learning Framework for DC Microgrids: Current Sharing and Voltage Regulation

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2020, vol. 4, no. 2, pp. 119-129

Authors: Liu, Xiao-Kang; Jiang, He; Wang, Yan-Wu; He, Haibo. Affiliations: Huazhong Univ Sci & Technol, Sch Automat, Wuhan 430074, Peoples R China; Huazhong Univ Sci & Technol, Minist Educ, Key Lab Image Proc & Intelligent Control, Wuhan 430074, Peoples R China; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA

With the penetration of computation intelligence, an increasing number of learning methods are developed into power engineering, such as dc microgrid applications. This paper establishes a distributed iterative learning framework to solve the current sharing and voltage regulation problem in an islanded dc microgrid from the perspective of game theory. The control objectives of dc microgrid include not only achieving the desired output current dispatch, but also regulating the voltage of dc bus to its rated value. Based on the two objectives, local performance indexes are established and an N-player game is formulated. Each source aims to minimize its own performance index and to achieve the current sharing objective simultaneously. Under the framework of game theory, a distributed iterative learning algorithm is designed based on the Bellman optimality principle and subsequently carried out using the approximate dynamic programming technique. The proposed algorithm is data based where it does not require to have the accurate model parameters of the dc microgrid and it ensures that the dc microgrid falls into a Nash equilibrium. Furthermore, a rigorous convergence analysis of the proposed algorithm is given. To demonstrate the effectiveness of the proposed method, simulation examples are presented on a tested dc microgrid.

Keywords: Iterative learning; approximate dynamic programming; Nash equilibrium; DC microgrid; distributed control

Context-Aware Dynamic Asset Allocation for Maritime Interdiction Operations

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, vol. 50, no. 3, pp. 1055-1073

Authors: Sidoti, David; Han, Xu; Zhang, Lingyi; Avvari, Gopi Vinod; Ayala, Diego Fernando Martinez; Mishra, Manisha; Sankavaram, Muni Sravanth; Kellmeyer, David L.; Hansen, James A.; Pattipati, Krishna R. Affiliations: Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269, USA; Two Sigma, Modeling Dept, New York, NY 10013, USA; Argus Informat & Advisory Serv LLC, Data & Applicat Solut, White Plains, NY 10601, USA; SPAWAR Syst Ctr Pacific, Command & Control Dept, San Diego, CA 92152, USA; US Naval Res Lab, Marine Meteorol Div, Monterey, CA 93943, USA

This paper validates two approximate dynamic programming approaches on a maritime interdiction problem involving the allocation of multiple heterogeneous assets over a large area of responsibility to interdict multiple drug smugglers using heterogeneous types of transportation on the sea with varying contraband weights. The asset allocation is based on a probability of activity surface, which represents spatio-temporal target activity obtained by integrating intelligence data on drug smuggler whereabouts/waypoints for contraband transportation, behavior models, and meteorological and oceanographic information. We validate the proposed architectural and algorithmic concepts via several realistic mission scenarios. We conduct sensitivity analyses to quantify the robustness and proactivity of our approach, as well as to measure the value of information used in the allocation process. The contributions of this paper have been transitioned to and are currently being tested by Joint Interagency Task Force-South, an organization tasked with providing the initial line of defense against drug trafficking in the East Pacific and Caribbean Oceans.

Keywords: Resource management; dynamic programming; Drugs; Stochastic processes; Transportation; Algorithm design and analysis; Planning; approximate dynamic programming; Gauss-Seidel iteration; mission planning; resource management problem; rollout; sensitivity analysis; value of information (VOI)

A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system

JOURNAL OF PROCESS CONTROL, 2020, vol. 87, pp. 166-178

Authors: Kim, Jong Woo; Park, Byung Jun; Yoo, Haeun; Oh, Tae Hoon; Lee, Jay H.; Lee, Jong Min. Affiliations: Seoul Natl Univ, Sch Chem & Biol Engn, Inst Chem Proc, 1 Gwanak Ro, Seoul 08826, South Korea; Korea Adv Inst Sci & Technol, Dept Chem & Biomol Engn, 291 Daehak Ro, Daejeon 34141, South Korea

The Hamilton-Jacobi-Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods to build approximate solutions through simulation based learning have been studied in various names like neurodynamic programming (NDP) and approximate dynamic programming (ADP). The aspect of learning connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error based learning. This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations. We focus particularly on the control-affine system with a quadratic objective function and the finite horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method. (C) 2020 Elsevier Ltd. All rights reserved.

Keywords: Reinforcement learning; approximate dynamic programming; Deep neural networks; Globalized dual heuristic programming; Finite horizon optimal control problem; Hamilton-Jacobi-Bellman equation

Value Iteration-Based H∞ Controller Design for Continuous-Time Nonlinear Systems Subject to Input Constraints

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, vol. 50, no. 11, pp. 3986-3995

Authors: Zhang, Huaguang; Xiao, Geyang; Liu, Yang; Liu, Lei. Affiliations: Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110004, Peoples R China; Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China; Liaoning Univ Technol, Coll Sci, Jinzhou 121001, Peoples R China

In this paper, a novel integral reinforcement learning method is proposed based on value iteration (VI) to design the H-infinity controller for continuous-time nonlinear systems subject to input constraints. To confront the control constraints, a nonquadratic function is introduced to reconstruct the L-2-gain condition for the H-infinity control problem. Then, the VI method is proposed to solve the corresponding Hamilton-Jacobi-Isaacs equation initialized with an arbitrary positive semi-definite value function. Compared with most existing works developed based on policy iteration, the initial admissible control policy is no longer required which results in a more free initial condition. The iterative process of the proposed VI method is analyzed and the convergence to the saddle point solution is proved in a general way. For the implementation of the proposed method, only one neural network is introduced to approximate the iterative value function, which results in a simpler architecture with less computational load compared with utilizing three neural networks. To verify the effectiveness of the VI-based method, two nonlinear cases are presented, respectively.

Keywords: approximate dynamic programming; H-infinity control; reinforcement learning (RL); value iteration (VI)
