ISBN:
(Print) 9781424447947
The approximate dynamic programming method combines neural networks, reinforcement learning, and the idea of dynamic programming. It is an online control method based on actual data rather than a precise mathematical model of the system. The method is suitable for the optimal control of nonlinear systems and can avoid the curse of dimensionality. It can effectively handle plant nonlinearity and the uncertainty introduced by system modeling, making it well suited to complex, time-varying systems and tasks. The heating section of the continuous annealing furnace consumes a large amount of energy, and classical dynamic programming has limitations in solving this problem. We therefore design an optimization controller for the heating section of the annealing furnace based on the approximate dynamic programming method. This paper presents the basic structure and algorithm of action-dependent heuristic dynamic programming (ADHDP) and designs a temperature optimization controller for the heating section of the continuous annealing furnace based on the ADHDP method. Simulation shows that the ADHDP-based temperature controller has theoretical and practical significance for future application.
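The actor-critic structure behind ADHDP can be illustrated with a short sketch. The scalar plant, cost weights, and learning rates below are invented stand-ins for the heating-section dynamics (none of them come from the paper): the critic learns the action-value Q(x, u) from observed transitions only, and the actor gain is nudged along the critic's gradient ∂Q/∂u, which is the model-free loop the abstract describes.

```python
import numpy as np

# Hypothetical scalar plant: x is the temperature error, u the heating-power
# adjustment. All numbers here are illustrative assumptions.
def plant(x, u):
    return 0.9 * x + 0.5 * u          # assumed temperature-error dynamics

def stage_cost(x, u):
    return x * x + 0.1 * u * u        # penalize error and control effort

gamma = 0.95
Wc = np.zeros(3)                      # critic weights: Q(x,u) ~ Wc . [x^2, xu, u^2]
Ka = 0.0                              # actor: u = -Ka * x

def features(x, u):
    return np.array([x * x, x * u, u * u])

def q_value(x, u):
    return Wc @ features(x, u)

rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    for _ in range(30):
        u = -Ka * x + 0.05 * rng.standard_normal()   # exploration noise
        x_next = plant(x, u)
        u_next = -Ka * x_next
        # Critic: temporal-difference update of the action-value function
        td = stage_cost(x, u) + gamma * q_value(x_next, u_next) - q_value(x, u)
        Wc += 0.02 * td * features(x, u)
        # Actor: since u = -Ka*x, dQ/dKa = (dQ/du)(-x); step downhill on Q
        dq_du = Wc[1] * x + 2.0 * Wc[2] * u
        Ka = np.clip(Ka + 0.02 * dq_du * x, 0.0, 3.0)  # keep the gain in a stabilizing range
        x = x_next
```

The clip on the actor gain is only a safeguard for this sketch; a real controller design would have to address stability explicitly.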
We assess the potential of the approximate dynamic programming (ADP) approach for process control, especially as a method to complement the model predictive control (MPC) approach. In the artificial intelligence (AI) and operations research (OR) research communities, ADP has recently seen significant activity as an effective method for solving Markov decision processes (MDPs), which represent a type of multi-stage decision problem under uncertainty. Process control problems are similar to MDPs, with the key difference being the continuous state and action spaces as opposed to discrete ones. In addition, unlike in other popular ADP application areas such as robotics or games, in process control applications the first and foremost concern should be the safety and economics of the on-going operation rather than efficient learning. We explore different options within ADP design, such as the pre-decision-state vs. post-decision-state value function, parametric vs. nonparametric value function approximators, batch-mode vs. continuous-mode learning, and exploration vs. robustness. We argue that ADP possesses great potential, especially for obtaining effective control policies for stochastic constrained nonlinear or linear systems and for continually improving them towards optimality. (C) 2010 Elsevier Ltd. All rights reserved.
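The pre- vs. post-decision-state distinction mentioned above can be made concrete with a toy example. Everything below (an inventory problem, its costs, the demand distribution) is an illustrative assumption, not from the paper: the point is that storing the value on the post-decision state (inventory after ordering, before random demand) makes the inner minimization at decision time deterministic, with the expectation over the disturbance absorbed into the value update.

```python
import numpy as np

rng = np.random.default_rng(1)
MAX_INV = 10
gamma = 0.9
alpha = 0.05                                   # smoothing step size
order_cost, hold_cost, short_cost = 1.0, 0.2, 4.0   # invented cost parameters

V = np.zeros(MAX_INV + 1)   # value indexed by the post-decision inventory level

for _ in range(20000):
    post = rng.integers(0, MAX_INV + 1)        # post-decision state to update
    demand = rng.integers(0, 6)                # exogenous information, revealed later
    # Cost realized on the transition out of the post-decision state:
    realized = hold_cost * max(post - demand, 0) + short_cost * max(demand - post, 0)
    inv = max(post - demand, 0)                # next pre-decision state
    # Deterministic inner minimization over the next order quantity
    # (no expectation here -- that is the benefit of the post-decision state):
    future = min(order_cost * q + V[inv + q] for q in range(MAX_INV + 1 - inv))
    target = realized + gamma * future
    V[post] += alpha * (target - V[post])      # smoothed value update
```

After training, an empty post-decision inventory is valued worse (higher expected cost) than a moderate stock, as the shortage penalty would suggest.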
Three novel approximate dynamic programming algorithms based on temporal, spatial, and spatiotemporal decomposition are proposed for the economic dispatch problem (EDP) in a distribution energy system with a complex topology and many non-dispatchable renewable energy sources and energy storage systems (ESS). The computational efficiency of the proposed algorithms is compared, and convergence to the optimal solution is demonstrated in numerical experiments on a two-day hourly EDP for the IEEE 33bw test network with 200+ consumers, 150+ energy storage units, and 1000+ consuming devices. Copyright (C) 2022 The Authors.
ISBN:
(Print) 9781479964154
Intermittent electricity generation from renewable sources is characterized by fluctuations across a wide frequency spectrum. The medium-frequency component (0.01 Hz to 1 Hz) cannot be filtered out by system inertia or automatic generation control (AGC) and therefore degrades frequency quality. In this paper, an approximate dynamic programming (ADP) based supplementary frequency controller for thermal generators is developed to attenuate renewable generation fluctuation in the medium-frequency range. A policy-iteration-based training algorithm is employed for online, model-free learning. Our simulation results demonstrate that the proposed supplementary frequency controller can effectively adapt to changes in the system and provide improved frequency control. A further sensitivity analysis validates that the supplementary frequency controller significantly attenuates the dependence of frequency deviation on the medium-frequency component of renewable generation fluctuation.
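The policy-iteration loop referenced above alternates policy evaluation with greedy improvement until the policy reproduces itself. As a hedged illustration (the paper's controller is continuous, online, and model-free; the small random MDP below is invented), tabular policy iteration looks like:

```python
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] = next-state distribution
R = rng.uniform(0.0, 1.0, size=(n_s, n_a))         # expected reward r(s, a)

policy = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
    P_pi = P[np.arange(n_s), policy]
    R_pi = R[np.arange(n_s), policy]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: greedy one-step lookahead against V
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break                                      # policy is self-consistent, hence optimal
    policy = new_policy
```

For a finite MDP this terminates in finitely many sweeps; online model-free variants replace the exact solve with sampled temporal-difference evaluation.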
The United States Air Force (USAF) makes officer accession and promotion decisions annually. Optimal manpower planning of the commissioned officer corps is vital to ensuring a well-balanced manpower system; a system that is neither over-manned nor under-manned is desirable because it is the most cost-effective. The Air Force Officer Manpower Planning Problem (AFO-MPP) is introduced, which models officer accessions, promotions, and the uncertainty in retention rates. The objective of the AFO-MPP is to identify the accession and promotion policy that minimizes the expected total discounted cost of maintaining the required number of officers in the system over an infinite time horizon. The AFO-MPP is formulated as an infinite-horizon Markov decision problem, and a policy is found using approximate dynamic programming (ADP). A least-squares temporal differencing (LSTD) algorithm is employed to determine the best approximate policies. Six computational experiments are conducted with varying retention rates and officer-manning starting conditions. The policies determined by the LSTD algorithm are compared to the benchmark policy, which is the policy currently practiced by the USAF. Results indicate that when the manpower system starts with on-target numbers of officers per rank, the ADP policy outperforms the benchmark policy. When the starting state is unbalanced, with more officers in junior ranks, the benchmark policy outperforms the ADP policy. When the starting state is unbalanced, with more officers in senior ranks, there is no statistically significant difference between the ADP and benchmark policies, but the ADP policy has smaller variance, indicating that it is more dependable than the benchmark policy.
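LSTD itself can be sketched in a few lines. On a toy two-state chain with invented costs and one-hot features (the AFO-MPP state space is far richer), LSTD accumulates the statistics A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)c(s) along a trajectory under a fixed policy, then solves A w = b for the value-function weights:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = 0.9
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # transition matrix under a fixed policy
c = np.array([1.0, 0.0])                   # per-state cost

def phi(s):                                # one-hot features: V is exactly representable
    f = np.zeros(2)
    f[s] = 1.0
    return f

A = np.zeros((2, 2))
b = np.zeros(2)
s = 0
for _ in range(100000):
    s_next = rng.choice(2, p=P[s])
    f = phi(s)
    A += np.outer(f, f - gamma * phi(s_next))   # LSTD statistics
    b += f * c[s]
    s = s_next

w = np.linalg.solve(A, b)                  # LSTD estimate of V(s) per state
V_true = np.linalg.solve(np.eye(2) - gamma * P, c)   # exact values, for comparison
```

Unlike one-sample TD updates, the least-squares solve uses all collected transitions at once, which is the data efficiency that motivates LSTD here.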
Multistage stochastic programs pose some of the most challenging optimization problems. Because such models can become rather intractable in general, it is important to design algorithms that provide approximations which, in the long run, yield solutions arbitrarily close to an optimum. In this paper, we propose such a sequential sampling method, applicable to multistage stochastic linear programs, which we refer to as the multistage stochastic decomposition (MSD) algorithm. This algorithm is a dynamic extension of a regularized version of stochastic decomposition (SD). While the method allows general correlation structures, specialized streamlined versions are possible for the special cases of stagewise-independent and autoregressive processes commonly incorporated in stochastic programming. As with its two-stage counterpart, the MSD algorithm is shown to provide an asymptotically optimal solution with probability one. As a by-product of this study, we also show that SD algorithms draw upon features of both approximate dynamic programming and stochastic programming.
Internet Service Providers (ISPs) have the ability to route their traffic over different network providers. This study investigates the optimal routing strategy under multihoming in the case where network providers charge ISPs according to top-percentile pricing (i.e., based on the θ-th highest volume of traffic shipped). We call this problem the Top-percentile Traffic Routing Problem (TpTRP). The TpTRP is a multistage stochastic optimization problem: the routing decision for each time period must be made before the amount of traffic to be sent is known. This stochastic nature is the critical difficulty of the study. Solution approaches based on stochastic integer programming or stochastic dynamic programming (SDP) suffer from the curse of dimensionality, which restricts their applicability. To overcome this, we suggest using approximate dynamic programming, which exploits the structure of the problem to construct continuous approximations of the value functions in SDP. The curse of dimensionality is thus largely avoided.
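The key step, replacing the SDP lookup table with a continuous approximation of the value function, can be sketched as follows. The "traffic" model below is invented purely for illustration: the cost-to-go is estimated by Monte Carlo at a few points of a continuous state and then fit with a low-degree polynomial, so the approximate value function is available at any state without discretizing it.

```python
import numpy as np

rng = np.random.default_rng(4)

def sampled_cost_to_go(x, n_samples=2000):
    # Toy stand-in: x is traffic already routed this period, future demand is
    # random, and the cost-to-go is a Monte Carlo average over that demand.
    demand = rng.exponential(1.0, size=n_samples)
    return np.mean((x + demand - 1.0) ** 2)

xs = np.linspace(0.0, 3.0, 15)                  # sample the continuous state
vs = np.array([sampled_cost_to_go(x) for x in xs])
coeffs = np.polyfit(xs, vs, deg=2)              # continuous approximation of V
V_hat = np.poly1d(coeffs)                       # evaluable at ANY state value
```

Within SDP, `V_hat` would then stand in for the tabulated stage value in the Bellman recursion, which is what sidesteps the state-space discretization.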
Growing penetration of renewable distributed generation, a major concern nowadays, plays a critical role in distribution system operation. This paper develops a state-based sequential network reconfiguration strategy using a Markov decision process (MDP) model, with the objective of minimizing renewable distributed generation curtailment and load shedding under operational constraints. The available power outputs of distributed generators and the system topology at each decision time are represented as Markov states, which are driven to other Markov states at the next decision time in consideration of the uncertainties of renewable distributed generation. For each Markov state at each decision time, a recursive optimization model with a current cost and a future cost is developed to make state-based actions, including system reconfiguration, load shedding, and distributed generation curtailment. To address the curse of dimensionality caused by the enormous numbers of states and actions in the proposed model, an approximate dynamic programming (ADP) approach, including post-decision states and a forward dynamic algorithm, is used to solve the proposed MDP-based model. The IEEE 33-bus and IEEE 123-bus systems are used to validate the proposed model.
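The "current cost plus future cost" recursion at the heart of such an MDP model can be written down exactly for a small invented finite-horizon problem (the paper instead runs a forward ADP pass with post-decision states precisely because its real state and action spaces make this exact backward sweep intractable):

```python
import numpy as np

T, n_s, n_a = 3, 3, 2                              # horizon, states, actions (toy sizes)
rng = np.random.default_rng(5)
cost = rng.uniform(0.0, 1.0, size=(T, n_s, n_a))   # current cost c_t(s, a)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] = next-state distribution

V = np.zeros((T + 1, n_s))                         # terminal future cost = 0
policy = np.zeros((T, n_s), dtype=int)
for t in range(T - 1, -1, -1):                     # backward in time
    Q = cost[t] + P @ V[t + 1]                     # current cost + expected future cost
    V[t] = Q.min(axis=1)                           # recursive optimization per state
    policy[t] = Q.argmin(axis=1)                   # state-based action
```

Each entry of `policy[t]` is the state-based action the recursion prescribes at decision time t; ADP approximates exactly this object when enumerating `V` is infeasible.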
This paper focuses on the economical real-time operation of a microgrid (MG). A novel dynamic energy management system is developed to incorporate efficient management of the energy storage system into MG real-time dispatch while considering power flow constraints and uncertainties in load, renewable generation, and real-time electricity price. The developed dynamic energy management mechanism requires neither long-term forecasting and optimization nor knowledge of the uncertainty distributions, yet it can still optimize the long-term operational costs of MGs. First, the real-time scheduling problem is modeled as a finite-horizon Markov decision process over a day. Then, approximate dynamic programming and deep recurrent neural network learning are employed to derive a near-optimal real-time scheduling policy. Finally, using real power grid data from the California Independent System Operator, a detailed simulation study is carried out to validate the effectiveness of the proposed method.