This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement learning (RL) and neuro-dynamic programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity in problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
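For reference, the object all of these approximate methods target is the Bellman optimality equation of DP. In a standard discounted-cost formulation (generic notation, not taken from the paper above):

\[
V^*(x) \;=\; \min_{u \in U(x)} \Big[ c(x,u) \;+\; \gamma\, \mathbb{E}\big[ V^*(x') \mid x, u \big] \Big],
\]

where \(V^*\) is the optimal cost-to-go, \(c\) the stage cost, \(\gamma \in (0,1)\) the discount factor, and \(x'\) the successor state. RL and NDP differ mainly in how they approximate \(V^*\) and the expectation.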
The global containerised trade heavily relies on liner shipping services, which facilitate the worldwide movement of large cargo volumes along fixed routes and schedules. The profitability of shipping companies hinges on how efficiently they design their shipping network, a complex optimisation problem known as the liner shipping network design problem (LSNDP). In recent years, approximate dynamic programming (ADP), also known as reinforcement learning, has emerged as a promising approach for large-scale optimisation. This paper introduces a novel Markov decision process for the LSNDP and investigates the potential of ADP. We show that ADP methods based on value iteration produce optimal solutions to small instances, but their scalability is hindered by high memory demands. An ADP method based on a deep neural network requires less memory and successfully obtains feasible solutions. The quality of the solutions, however, declines for larger instances, possibly due to the discrete nature of the high-dimensional state and action spaces.
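As a rough illustration of the value-iteration flavour of ADP referred to above, here is a minimal tabular sketch in Python; the MDP it operates on is a generic placeholder, not the paper's LSNDP formulation:

```python
import numpy as np

def value_iteration(P, c, gamma=0.95, tol=1e-8):
    """Tabular value iteration for a finite cost-minimising MDP.

    P : array (A, S, S), P[a, s, s'] = transition probability
    c : array (A, S),    c[a, s]     = expected stage cost
    Returns the optimal cost-to-go V and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = c + gamma * P @ V          # Bellman backup, shape (A, S)
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmin(axis=0)
```

The memory bottleneck the authors report is visible even in this sketch: the tabular V and the transition tensor P grow with the state space, which is what the deep-network variant avoids storing explicitly.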
An approximate dynamic programming (ADP) formulation implemented with an adaptive critic (AC)-based neural network (NN) structure has evolved as a powerful technique for solving the Hamilton-Jacobi-Bellman (HJB) equations. As interest in ADP and the AC solutions escalates with time, there is a dire need to consider possible enabling factors for their implementation. A typical AC structure consists of two interacting NNs, which is computationally expensive. In this paper, a new architecture, called the 'cost-function-based single network adaptive critic (J-SNAC)', is presented, which eliminates one of the networks in a typical AC structure. This approach is applicable to a wide class of nonlinear systems in engineering. In order to demonstrate the benefits and the control synthesis with the J-SNAC, two problems have been solved with the AC and the J-SNAC approaches. Results are presented which show savings of about 50% of the computational costs by J-SNAC while having the same accuracy as the dual-network structure in solving for optimal control. Furthermore, convergence of the J-SNAC iterations, which reduces to a least-squares problem, is discussed; for linear systems, the iterative process is shown to reduce to solving the familiar algebraic Riccati equation.
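To make the linear-systems remark concrete: for a linear-quadratic problem the optimal cost-to-go is quadratic, J*(x) = x'Px, with P solving the algebraic Riccati equation that the J-SNAC iteration is said to recover. A small numerical check (toy system of my choosing, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy linear system dx/dt = A x + B u with cost integrand x'Qx + u'Ru
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Algebraic Riccati equation: A'P + PA - P B R^{-1} B' P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# Optimal cost-to-go J*(x) = x'Px and feedback u = -R^{-1} B' P x
K = np.linalg.solve(R, B.T @ P)
x0 = np.array([1.0, 0.0])
print("J*(x0) =", x0 @ P @ x0, "\ngain K =", K)
```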
We assess the potential of the approximate dynamic programming (ADP) approach for process control, especially as a method to complement the model predictive control (MPC) approach. In the artificial intelligence (AI) and operations research (OR) research communities, ADP has recently seen significant activity as an effective method for solving Markov decision processes (MDPs), which represent a type of multi-stage decision problem under uncertainty. Process control problems are similar to MDPs, with the key difference being the continuous state and action spaces as opposed to discrete ones. In addition, unlike in other popular ADP application areas like robotics or games, in process control applications the first and foremost concern should be the safety and economics of the ongoing operation rather than efficient learning. We explore different options within ADP design, such as the pre-decision state vs. post-decision state value function, parametric vs. nonparametric value function approximators, batch-mode vs. continuous-mode learning, and exploration vs. robustness. We argue that ADP possesses great potential, especially for obtaining effective control policies for stochastic constrained nonlinear or linear systems and continually improving them towards optimality. (C) 2010 Elsevier Ltd. All rights reserved.
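The pre-decision vs. post-decision distinction mentioned above can be stated compactly (standard ADP notation in the style of the OR literature, not reproduced from the paper). Writing \(x_t^u\) for the post-decision state, i.e. the state after the action is applied but before the disturbance arrives:

\[
V(x_t) = \min_{u}\Big[c(x_t,u) + \gamma\,\mathbb{E}\big[V(x_{t+1}) \mid x_t,u\big]\Big],
\qquad
V^{x}(x_t^{u}) = \mathbb{E}\Big[\min_{u'}\big[c(x_{t+1},u') + \gamma\,V^{x}(x_{t+1}^{u'})\big] \,\Big|\, x_t^{u}\Big].
\]

The attraction of the post-decision form is that the inner minimization is deterministic, so no expectation has to be evaluated inside the optimization at decision time.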
The objective of this work is to extend the approximate dynamic programming (ADP) framework to online control of distributed parameter systems. The ADP framework involves using suboptimal control policies to identify the relevant regions of the state space and to generate a cost-to-go function approximation applicable in those regions. We present model-based value iteration and model-free Q-learning approaches for feedback control of an adiabatic plug flow reactor. The state dimension is reduced using appropriate model reduction and sensor placement techniques. We show that both approaches provide better performance than the initial model predictive control and proportional-integral-derivative (PID) controllers. Finally, an extension of ADP to the stochastic case with full state feedback is presented. (C) 2011 Curtin University of Technology and John Wiley & Sons, Ltd.
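The model-free side of that comparison rests on the standard Q-learning update; a minimal tabular sketch in its cost-minimising form (a generic placeholder, not the reduced-order reactor controller itself):

```python
import numpy as np

def q_learning_step(Q, s, a, cost, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning update for a cost-minimisation problem.

    Q      : array (S, A) of state-action cost-to-go estimates
    (s, a) : visited state-action pair; cost = observed stage cost
    """
    target = cost + gamma * Q[s_next].min()   # greedy (min-cost) bootstrap
    Q[s, a] += alpha * (target - Q[s, a])     # move estimate toward target
    return Q
```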
ISBN: (Print) 9781479929849
The paper addresses the energy management of a building cooling system comprising a chiller plant with two chillers, a thermal storage unit, and a cooling load representing a building. Uncertainty affects the system since the cooling load depends on the building occupancy. The goal is to minimize the energy consumption of the cooling system while preserving comfort in the building. This is achieved by optimally distributing the cooling load demand among the chillers and the thermal storage unit, and by modulating the building temperature set-point to some (limited) extent. The problem can be decomposed into a static optimization problem and a dynamic programming problem, the latter solved on a Markov chain abstraction of the stochastic hybrid system modeling the cooling system.
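The dynamic programming half of that decomposition amounts to a finite-horizon backward recursion over the abstracted Markov chain. A generic sketch (dimensions, costs, and transition matrices are placeholders, not the paper's model):

```python
import numpy as np

def backward_dp(P, c, c_T):
    """Finite-horizon DP over a controlled Markov chain.

    P   : array (T, A, S, S), time-varying transition matrices
    c   : array (T, A, S),    stage costs (e.g. energy consumption)
    c_T : array (S,),         terminal cost
    Returns value functions V and greedy decisions mu per stage.
    """
    T, A, S, _ = P.shape
    V = np.empty((T + 1, S))
    mu = np.empty((T, S), dtype=int)
    V[T] = c_T
    for t in range(T - 1, -1, -1):
        Q = c[t] + P[t] @ V[t + 1]   # expected cost-to-go, shape (A, S)
        V[t] = Q.min(axis=0)
        mu[t] = Q.argmin(axis=0)
    return V, mu
```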
In this paper, we propose a novel formulation for encoding state constraints into the linear programming approach to approximate dynamic programming via the use of penalty functions. To maintain tractability of the resulting optimization problem, we suggest a penalty function constructed as a point-wise maximum taken over a family of low-order polynomials. Once the penalty functions are designed, no additional approximations are introduced by the proposed formulation. The effectiveness and numerical stability of the formulation are demonstrated through examples. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
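In the standard LP approach to ADP, one maximizes a weighted value function subject to the Bellman inequality over a span of basis functions; as I read the abstract, the state constraints are folded into this program through the polynomial penalty. Schematically (generic notation, not copied from the paper):

\[
\max_{V \in \operatorname{span}\{\phi_1,\dots,\phi_K\}} \int V(x)\,\nu(dx)
\quad \text{s.t.} \quad
V(x) \le c(x,u) + p(x) + \gamma\,\mathbb{E}\big[V(x^{+}) \mid x,u\big] \quad \forall (x,u),
\]

with the penalty \(p(x) = \max_{i=1,\dots,m} p_i(x)\) and each \(p_i\) a low-order polynomial, chosen so that the resulting optimization remains tractable.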
ISBN: (Print) 9783642384608; 9783642384592
The strategy of approximate/adaptive dynamic programming (ADP) has been widely used in recent years to design learning controllers for complex, high-dimensional systems. This paper addresses an important problem in the design of ADP learning controllers: improving the convergence performance of the learning algorithm. We analyze the ADP controller implementation framework according to the requirements of the tracking control task, with emphasis on an improved weight-updating gradient descent approach for optimizing the connection weights of the network structures. A comparison of the proposed method and a classic ADP design for tracking and controlling the pitch angle of an aircraft is presented, verifying the feasibility of the proposed ADP-based controller design.
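A weight update of the gradient-descent type discussed above commonly minimizes the squared temporal-difference error; a minimal sketch for a linear-in-weights critic (the paper's improved variant is not reproduced here):

```python
import numpy as np

def critic_update(w, phi, phi_next, cost, lr=0.01, gamma=0.95):
    """One semi-gradient descent step on the squared TD error for a
    linear-in-weights critic J(x) = w . phi(x).

    phi, phi_next : feature vectors of the current and next state
    cost          : observed stage cost
    """
    td_error = cost + gamma * (w @ phi_next) - (w @ phi)
    w = w + lr * td_error * phi   # bootstrapped target treated as constant
    return w
```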
In this paper, a new algorithm for realizing approximate dynamic programming (ADP) with Gaussian processes (GPs) is proposed for infinite-horizon optimal control of continuous-time (CT) nonlinear input-affine systems. Convergence of the ADP algorithm is proven under the assumption of exact approximation, whereby both the cost function and the control input converge to their optimal values, that is, to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. Approximation errors, however, are unavoidable in almost every application. To tackle this problem, the proposed algorithm is derived with a proof of convergence, whereby the cost function and the control input, both approximated, converge to those of the ADP as the number of data points for the GPs approaches infinity. A numerical simulation demonstrates the effectiveness of the proposed algorithm. Copyright (C) 2020 The Authors.
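To illustrate the role the GPs play: the cost function is fit by GP regression from sampled data, and the fit tightens as the number of data points grows, which is the regime the convergence result above speaks to. A sketch using scikit-learn (kernel, data, and targets are placeholders; the paper's own construction is not reproduced):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Placeholder 1-D state samples with noisy cost-to-go targets
X = np.linspace(-2.0, 2.0, 25).reshape(-1, 1)
y = X.ravel() ** 2 + 0.05 * np.random.default_rng(0).normal(size=25)

# GP regression approximates the cost function from data
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
gp.fit(X, y)

mean, std = gp.predict(np.array([[0.5]]), return_std=True)
print(f"V_hat(0.5) = {mean[0]:.3f} +/- {std[0]:.3f}")
```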