This paper proposes a multi-dimensional approximate dynamic programming (ADP) algorithm for the real-time scheduling of an integrated heat and power system (IHPS) with a battery and a heat storage tank (HST). The multi-time-period optimization problem is reformulated as a Markov decision process. The high-dimensional state variables are aggregated into the state of charge (SOC) of the battery and the overall available heat (OAH) of the HST, which reduces the computational burden of value function approximation (VFA) while preserving approximation accuracy. After sufficient training on uncertainty scenarios of wind power, electricity price, and electrical and heat load, the approximate value function (AVF) encodes empirical knowledge and helps the IHPS make decisions that cope with uncertainty. By recursively solving Bellman's equation, the proposed ADP algorithm efficiently exploits the advantages of multi-energy integration and provides a near-optimal operation strategy that ensures the economic operation of the IHPS. Simulation results compared with existing methods validate the superiority of the proposed algorithm.
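To make the recursion concrete, the sketch below runs a backward Bellman recursion over a state aggregated to (battery SOC, HST overall available heat), in the spirit of the abstract. All grids, the horizon, the toy price signal, and the stage-cost model are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

T = 24                                   # hourly stages
soc_grid = np.linspace(0.2, 1.0, 9)      # battery state of charge
oah_grid = np.linspace(0.0, 1.0, 11)     # HST overall available heat (normalized)
actions = [(ds, dh) for ds in (-0.1, 0.0, 0.1) for dh in (-0.1, 0.0, 0.1)]
price = 30 + 20 * np.sin(np.arange(T) / T * 2 * np.pi)  # toy electricity price

V = np.zeros((T + 1, len(soc_grid), len(oah_grid)))     # terminal value = 0

def snap(grid, x):
    """Index of the grid point closest to x."""
    return int(np.argmin(np.abs(grid - x)))

for t in range(T - 1, -1, -1):           # Bellman recursion, backward in time
    for i, soc in enumerate(soc_grid):
        for j, oah in enumerate(oah_grid):
            best = np.inf
            for ds, dh in actions:
                s2, h2 = soc + ds, oah + dh
                if not (soc_grid[0] <= s2 <= soc_grid[-1]
                        and oah_grid[0] <= h2 <= oah_grid[-1]):
                    continue                      # infeasible storage action
                cost = price[t] * (ds + dh)       # toy cost: charging buys energy
                best = min(best, cost + V[t + 1, snap(soc_grid, s2),
                                          snap(oah_grid, h2)])
            V[t, i, j] = best

print("cost-to-go at t=0, SOC=0.6, OAH=0.5:",
      V[0, snap(soc_grid, 0.6), snap(oah_grid, 0.5)])
```

In the paper, an expectation over wind, price, and load scenarios replaces this deterministic step and a trained approximation replaces the lookup table; the recursion itself is the part that the two-dimensional aggregation keeps tractable.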
One of the most important operations in the production of growing/finishing pigs is the marketing of pigs for slaughter. While pork production can be managed at different levels (animal, pen, section, or herd), it is beneficial to consider the herd level when determining the optimal marketing policy because of inter-dependencies, such as those created by fixed transportation costs and cross-level constraints. In this paper, we consider sequential marketing decisions at the herd level. A high-dimensional infinite-horizon Markov decision process (MDP) is formulated which, due to the curse of dimensionality, cannot be solved using standard MDP optimization techniques. Instead, approximate dynamic programming (ADP) is applied to solve the model and find the best marketing policy at the herd level. Under the total expected discounted reward criterion, the proposed ADP approach is first compared with a standard solution algorithm for an MDP at the pen level to show the accuracy of the solution procedure. Next, numerical experiments at the herd level confirm how the marketing policy adapts itself to varying costs (e.g., transportation cost) and cross-level constraints. Finally, a sensitivity analysis of some model parameters is conducted, and the marketing policy found by ADP is compared with other well-known marketing policies often applied at the herd level. (C) 2019 Elsevier B.V. All rights reserved.
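As a rough, hypothetical illustration of the generic ADP loop the abstract relies on (simulate forward, act greedily against the current approximation, smooth the observed value into it), the toy below works at a single-pen level with an invented weight-class state and reward; the herd-level model in the paper is far richer.

```python
import random

WEIGHT_CLASSES = list(range(10))    # 0 = light ... 9 = heavy pigs in a pen
GAMMA = 0.95                        # discount factor
V = {w: 0.0 for w in WEIGHT_CLASSES}

def reward(w, market):
    return 100 + 15 * w if market else -5        # sale revenue vs. feed cost

def next_state(w, market):
    if market:
        return 0                                  # pen restocked with light pigs
    return min(w + random.choice((0, 1)), 9)      # stochastic growth

for n in range(1, 5001):                          # ADP iterations
    alpha = 1.0 / n ** 0.6                        # declining stepsize
    w = random.choice(WEIGHT_CLASSES)
    for _ in range(50):                           # truncated forward trajectory
        # sample a one-step value for each action against the current table
        q = {a: reward(w, a) + GAMMA * V[next_state(w, a)] for a in (0, 1)}
        a = max(q, key=q.get)                     # greedy decision
        V[w] = (1 - alpha) * V[w] + alpha * q[a]  # smoothing update
        w = next_state(w, a)                      # step forward (fresh sample)

print({w: round(v, 1) for w, v in V.items()})
```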
Sequential resource allocation decision-making for the military medical evacuation of wartime casualties consists of identifying which available aeromedical evacuation (MEDEVAC) assets to dispatch in response to each casualty event. These sequential decisions are complicated by uncertainty in casualty demand (i.e., severity, number, and location) and in service times. In this research, we present a Markov decision process model solved using a hierarchical aggregation value function approximation scheme within an approximate policy iteration algorithmic framework. The model seeks to optimize, under uncertainty, the sequential decision of how best to dispatch MEDEVAC assets to calls for service. The policies determined via our approximate dynamic programming (ADP) approach are compared with optimal military MEDEVAC dispatching policies for two small-scale problem instances, and with the closest-available MEDEVAC dispatching policy typically implemented in practice for a large-scale problem instance. Results indicate that our proposed approximation scheme provides high-quality, scalable dispatching policies that are more easily employed by military medical planners in the field. The identified ADP policies attain 99.8% and 99.5% of optimal performance for the 6- and 12-zone problem instances investigated, as well as 9.6%, 9.2%, and 12.4% improvements over the closest-MEDEVAC policy for the 6-, 12-, and 34-zone problem instances investigated. Published by Elsevier Ltd.
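The hierarchical aggregation idea can be sketched as follows: maintain value estimates at several levels of state aggregation and blend them with weights that shift toward finer levels as observations accumulate. The aggregation levels, update rule, and weighting below are simplified assumptions, not the authors' exact scheme.

```python
from collections import defaultdict

# state = (zone, precedence); level 0 keeps both, level 1 drops the
# precedence category, level 2 lumps every state together
def aggregate(state, level):
    zone, precedence = state
    return [(zone, precedence), (zone,), ("all",)][level]

estimates = [defaultdict(float) for _ in range(3)]
counts = [defaultdict(int) for _ in range(3)]

def update(state, observed_value, alpha=0.1):
    # smooth the sampled value into every aggregation level
    for lvl in range(3):
        key = aggregate(state, lvl)
        counts[lvl][key] += 1
        estimates[lvl][key] += alpha * (observed_value - estimates[lvl][key])

def value(state):
    # weight each level by observation count, discounted at coarser levels
    # (a crude proxy for the bias/variance weighting used in the literature)
    num, den = 0.0, 0.0
    for lvl in range(3):
        key = aggregate(state, lvl)
        w = counts[lvl][key] / (lvl + 1)
        num += w * estimates[lvl][key]
        den += w
    return num / den if den else 0.0

update((3, "urgent"), 12.0)
update((3, "routine"), 7.0)
print(value((3, "urgent")))
```

Coarse levels give usable estimates early, when fine states have barely been visited; as training proceeds, the weights migrate to the disaggregate estimates.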
Approximate dynamic programming (ADP) faces challenges in dealing with constraints in control problems. Model predictive control (MPC) is, in comparison, well known for its accommodation of constraints and stability guarantees, although its computation is sometimes prohibitive. This paper introduces an approach combining the two methodologies to overcome their individual limitations. The predictive control law for constrained linear quadratic regulation (CLQR) problems has been proven to be piecewise affine (PWA), while the value function is piecewise quadratic. We exploit these formal results from MPC to design an ADP method for CLQR problems with a known model. A novel convex and piecewise quadratic neural network with a local-global architecture is proposed to provide an accurate approximation of the value function, which is used as the cost-to-go function in the online dynamic programming problem. An efficient decomposition algorithm is developed to generate the control policy and speed up the online computation. Rigorous stability analysis of the closed-loop system is conducted for the proposed control scheme under the condition that a good approximation of the value function is achieved. Comparative simulations demonstrate the potential of the proposed method in terms of online computation and optimality. (c) 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
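One simple convex piecewise quadratic family is a pointwise maximum of convex quadratics, which can stand in for the paper's local-global network in a sketch: an epigraph variable t carries the cost-to-go into a small convex program, mirroring how the approximation enters the online problem. The dynamics, costs, input bounds, and placeholder P_i matrices below are assumptions (using cvxpy).

```python
import cvxpy as cp
import numpy as np

# toy double-integrator dynamics and stage cost (assumptions)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])

# max-of-quadratics cost-to-go: convex and piecewise quadratic; these P_i
# are placeholders standing in for a trained approximation
P_list = [c * np.eye(2) for c in (1.0, 2.0, 3.0)]

x0 = np.array([1.0, -0.5])
u = cp.Variable(1)
t = cp.Variable()                       # epigraph of max_i x' P_i x
x_next = A @ x0 + B @ u

constraints = [cp.quad_form(x_next, P) <= t for P in P_list]
constraints += [u >= -1, u <= 1]        # input constraint
objective = cp.Minimize(cp.quad_form(x0, Q) + cp.quad_form(u, R) + t)
cp.Problem(objective, constraints).solve()
print("one-step ADP input:", u.value)
```

Because the approximation is convex, the online problem stays a small convex program, which is what makes the MPC-style constraint handling compatible with ADP here.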
This paper deals with the dynamic advance scheduling of elective surgeries with multiple sources of uncertainty taken into consideration. A waiting list is established to facilitate the management of elective patien...
We consider the use of quadratic approximate value functions for stochastic control problems with input-affine dynamics and convex stage cost and constraints. Evaluating the approximate dynamic programming policy in such cases requires the solution of an explicit convex optimization problem, such as a quadratic program, which can be carried out efficiently. We describe a simple and general method for approximate value iteration that also relies on our ability to solve convex optimization problems, in this case typically a semidefinite program. Although we have no theoretical guarantee on the performance attained using our method, we observe that very good performance can be obtained in practice. (c) 2012 John Wiley & Sons, Ltd.
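The semidefinite-programming ingredient can be illustrated by the classic quadratic lower bound for a linear-quadratic problem: choose V(x) = xᵀPx maximizing trace(P) subject to the Bellman inequality, which a Schur-complement argument turns into a linear matrix inequality. The toy system below is an assumption; the authors' iterated method targets the more general input-affine, constrained setting.

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.0], [0.2]])
Q = np.eye(2)
R = np.array([[0.5]])

P = cp.Variable((2, 2), symmetric=True)
# Bellman inequality  x'Px <= min_u [ x'Qx + u'Ru + (Ax+Bu)'P(Ax+Bu) ]
# holds for all x iff this block matrix is positive semidefinite
M = cp.bmat([[Q + A.T @ P @ A - P, A.T @ P @ B],
             [B.T @ P @ A,         R + B.T @ P @ B]])
prob = cp.Problem(cp.Maximize(cp.trace(P)), [M >> 0, P >> 0])
prob.solve()
print("quadratic approximate value function matrix P:\n", P.value)
```

For an unconstrained LQR instance like this one, the SDP recovers the Riccati solution, so the "approximate" value function is exact; the interest of the method is that the same machinery still produces useful quadratic approximations when constraints make the true value function intractable.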
Due to unexpected demand surges and supply disruptions, road traffic conditions can exhibit substantial uncertainty, which often makes bus travelers encounter start delays of service trips and substantially degrades the performance of an urban transit system. Meanwhile, rapid advances in information and communication technologies have presented tremendous opportunities for intelligently scheduling a bus fleet. With full consideration of delay propagation effects, this paper formulates the stochastic dynamic vehicle scheduling problem, which dynamically schedules an urban bus fleet to cope with trip time stochasticity, reduce delays, and minimize the total costs of a transit system. To address the challenge of the "curse of dimensionality", we adopt an approximate dynamic programming (ADP) approach in which the value function is approximated by a three-layer feed-forward neural network, so that we can step forward in time to make decisions and solve Bellman's equation by sequentially solving multiple mixed-integer linear programs. Numerical examples based on a realistic operations dataset of bus lines in Beijing demonstrate that the proposed neural network-based ADP approach not only exhibits good learning behavior but also significantly outperforms both myopic and static policies, especially when trip time stochasticity is high.
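A minimal sketch of the value-function-approximation step, assuming a small feed-forward network, synthetic (state, cost-to-go) training pairs, and plain gradient descent in place of the paper's training pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic (state, observed cost-to-go) pairs, e.g. from simulated rollouts
X = rng.uniform(0, 1, size=(512, 4))            # 4 aggregated state features
y = (3 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=512)).reshape(-1, 1)

W1 = rng.normal(0, 0.5, (4, 16)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)    # output layer

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for epoch in range(2000):                       # plain batch gradient descent
    h, pred = forward(X)
    err = pred - y                              # gradient of 0.5 * MSE
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)            # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

_, v = forward(np.array([[0.5, 0.5, 0.5, 0.5]]))
print("approximated cost-to-go:", float(v))
```

In the full method, each forward decision step embeds a query of this network in a mixed-integer linear program, trading off immediate scheduling cost against the learned downstream cost.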
The objective of infrastructure management is to provide optimal maintenance, rehabilitation and replacement (MR&R) policies for a system of facilities over a planning horizon. While most approaches in the literature have studied the decision-making process as a finite resource allocation problem, the impact of construction activities on the road network is often not accounted for. The state-of-the-art Markov decision process (MDP)-based optimization approaches in infrastructure management, while optimal for solving budget allocation problems, become internally inconsistent once network constraints are introduced. In comparison, approximate dynamic programming (ADP) enables solving complex problem formulations by using simulation techniques and lower-dimensional value function approximations. In this paper, an ADP framework is proposed wherein capacity losses due to construction activities are kept within an agency-defined network capacity threshold. A parametric study is conducted on a stylized network configuration to infer the impact of network-based constraints on the decision-making process. (C) 2013 Elsevier Ltd. All rights reserved.
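A hypothetical sketch of how such a capacity threshold can enter the per-period decision: repair the facilities that a (here, hard-coded) value approximation ranks highest, stopping once the work-zone capacity loss reaches the agency threshold. Deterioration probabilities, costs, and the greedy selection rule are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FACILITIES = 8
CAPACITY_THRESHOLD = 0.25        # max fraction of network capacity closed

condition = rng.integers(0, 5, N_FACILITIES)   # 0 = new ... 4 = failed
capacity_loss = 0.10                           # per facility under repair

# assumed marginal value of repairing a facility in each condition state,
# standing in for a trained value function approximation
repair_value = np.array([0.0, 0.5, 1.5, 3.0, 6.0])

def choose_actions(condition):
    # greedy: repair highest-value facilities until the capacity budget binds
    order = np.argsort(-repair_value[condition])
    chosen, used = [], 0.0
    for f in order:
        if repair_value[condition[f]] <= 0:
            break
        if used + capacity_loss <= CAPACITY_THRESHOLD:
            chosen.append(int(f)); used += capacity_loss
    return chosen

for year in range(3):
    repairs = choose_actions(condition)
    condition[repairs] = 0                                # restored
    deteriorate = rng.random(N_FACILITIES) < 0.4          # Markov worsening
    condition = np.minimum(condition + deteriorate, 4)
    print(f"year {year}: repaired {repairs}, condition {condition.tolist()}")
```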
Eco-driving control offers significant energy-saving potential in car-following scenarios. However, the influence of the preceding vehicle may induce unnecessary velocity fluctuations and deteriorate fuel economy. In this research, a learning-based method is exploited to achieve satisfactory fuel economy for connected plug-in hybrid electric vehicles (PHEVs) by taking advantage of a vehicle-to-vehicle communication system. A data-driven energy consumption model is leveraged to generate reinforcement signals for approximate dynamic programming (ADP), taking into account the nonlinear efficiency characteristics of the hybrid powertrain system. An advanced ADP scheme is designed for connected PHEVs driving in car-following scenarios. In addition, cooperative information is incorporated to further improve the fuel economy of the vehicle under the premise of driving safety. The proposed method is model-free and shows acceptable computational efficiency as well as adaptability. The simulation results demonstrate that fuel economy during car-following is remarkably improved through cooperative driving information, thereby partially paving the theoretical basis for energy-saving transportation.
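A loose sketch of the control loop described above: at each step the ego vehicle picks an acceleration by one-step lookahead against an energy model plus an approximate cost-to-go of the next gap/speed state. The quadratic energy model, gap penalty, and vehicle parameters below are stand-ins, not the paper's data-driven model.

```python
import numpy as np

dt = 1.0
accels = np.linspace(-1.5, 1.5, 7)          # candidate accelerations (m/s^2)

def energy_cost(v, a):
    # placeholder for the data-driven consumption model (arbitrary units)
    return max(0.0, 0.4 * v * a) + 0.02 * v + 0.1 * a * a

def value(gap, v_ego, v_lead):
    # crude approximate cost-to-go penalizing unsafe or inefficient gaps
    desired = 2.0 * v_ego + 5.0
    return 0.05 * (gap - desired) ** 2 + 0.02 * (v_ego - v_lead) ** 2

gap, v_ego, v_lead = 30.0, 14.0, 15.0
for step in range(5):
    v_lead = max(0.0, v_lead + np.random.uniform(-0.3, 0.3))  # V2V-reported
    best_a = min(accels, key=lambda a: energy_cost(v_ego, a)
                 + value(gap + (v_lead - (v_ego + a * dt)) * dt,
                         v_ego + a * dt, v_lead))
    v_ego = max(0.0, v_ego + best_a * dt)
    gap += (v_lead - v_ego) * dt
    print(f"step {step}: a={best_a:+.1f}, v_ego={v_ego:.1f}, gap={gap:.1f}")
```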
Energy allocation in the iron and steel industry is the assignment of available energy to various production users. With increasing energy prices, an ideal allocation plan should ensure that no energy is wasted and no shortage occurs. This is challenging because energy demand is dynamic, owing to changes in orders, the production environment, the technological level, etc. This paper aims to realize on-line energy resource allocation under dynamic production plans and environments, based on the typical energy consumption processes of steel enterprises. Without a definite analytical model, it is difficult to make the energy allocation plan track the dynamic changes of the production environment in real time. This paper proposes to handle the dynamic energy allocation problem through interactive learning with the time-varying environment using an approximate dynamic programming method. The problem is formulated as a dynamic model with variable right-hand-side terms, which represent updated energy demands obtained by on-line learning. A reinforcement learning method is designed to learn energy consumption patterns from historical data and predict the energy consumption level corresponding to the current production environment and the production plan over a future horizon. Using the prediction results, an on-line energy allocation plan is made, and its performance is demonstrated by comparison with a static allocation method.
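The coupling the abstract describes can be sketched in two pieces: a running estimate of each user's demand learned from observed consumption, and an allocation program whose right-hand side is the updated demand. The user set, priorities, and supply limit below are invented, and a simple exponential (TD-style) update stands in for the reinforcement learning component (using scipy).

```python
import numpy as np
from scipy.optimize import linprog

users = ["blast_furnace", "rolling_mill", "power_plant"]
priority = np.array([3.0, 2.0, 1.0])       # value of serving each user
demand_est = np.array([50.0, 30.0, 40.0])  # learned demand levels
SUPPLY = 100.0
ALPHA = 0.3                                # learning rate

def observe_consumption(rng):
    # stand-in for the time-varying production environment
    return np.array([52.0, 28.0, 45.0]) + rng.normal(0, 2, 3)

rng = np.random.default_rng(7)
for period in range(3):
    # on-line learning: move demand estimates toward observed consumption
    demand_est += ALPHA * (observe_consumption(rng) - demand_est)
    # allocate: maximize priority-weighted delivery, capped by demand & supply
    res = linprog(c=-priority,
                  A_ub=np.ones((1, 3)), b_ub=[SUPPLY],
                  bounds=[(0, d) for d in demand_est])
    print(f"period {period}: allocation "
          f"{dict(zip(users, np.round(res.x, 1)))}")
```

Re-solving the allocation each period with a freshly learned right-hand side is what lets the plan track the dynamic production environment without a fixed analytical demand model.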