Real-time surgery management involves a complex and dynamic decision-making process. In many cases, the duration of a surgery cannot be known until the surgery has actually been completed. Furthermore, disruptions such as equipment failure or the arrival of a non-elective surgery can occur simultaneously. Thus, the assignment of surgeries needs to be updated as and when disruptions occur in order to minimize their effects. In this paper, we present a stochastic dynamic programming approach to the surgery allocation problem with multiple operating rooms under uncertainty. Given an elective list for the day, the dynamic optimization model minimizes the number of surgeries not carried out by the end of the shift and the total waiting times of patients during the day, weighted according to their urgency level. Due to the curse of dimensionality, we apply an approximate dynamic programming algorithm to solve the stochastic dynamic surgery management model. Computational experiments are designed to demonstrate the performance of the proposed algorithm and its applicability to practical settings. The results show that the approximate dynamic programming algorithm provides a good approximation to the optimal policy and leads to some managerial insights. (c) 2023 Elsevier B.V. All rights reserved.
Solving optimal control problems is a basic demand of industrial control *** methods like model predictive control often suffer from heavy online computational *** learning has shown promise in computer and board games but has yet to be widely adopted in industrial applications due to a lack of accessible, high-accuracy *** Reinforcement learning (RL) solvers are often developed for academic research and require a significant amount of theoretical knowledge and programming ***, many of them only support Python-based environments and are limited to model-free *** address this gap, this paper develops General Optimal control Problems Solver (GOPS), an easy-to-use RL solver package that aims to build real-time and high-performance controllers in industrial *** is built with a highly modular structure that retains a flexible framework for secondary *** the diversity of industrial control tasks, GOPS also includes a conversion tool that allows for the use of Matlab/Simulink to support environment construction, controller design, and performance *** handle large-scale problems, GOPS can automatically create various serial and parallel trainers by flexibly combining embedded buffers and *** offers a variety of common approximate functions for policy and value functions, including polynomial, multilayer perceptron, convolutional neural network, ***, constrained and robust algorithms for special industrial control systems with state constraints and model uncertainties are also integrated into *** examples, including linear quadratic control, inverted double pendulum, vehicle tracking, humanoid robot, obstacle avoidance, and active suspension control, are tested to verify the performance of GOPS.
The goals of increased patient access and fast fulfillment have motivated considerable interest in autologous cell therapy manufacturing networks with multiple, geographically distributed manufacturing facilities. However, the cost of the safety manufacturing capacity needed to mitigate supplier disruption risk (a significant risk in the emerging cell manufacturing industry) would be lower if manufacturing were centralized. In this paper, we analyze a decentralized network whose objective is to minimize the cost of network resilience for mitigating supplier disruption by making use of the fact that bioreactors for autologous therapy manufacturing are small enough to be relocatable. We model this problem as a Markov decision process and develop efficient algorithms, based on real-time demand data, that minimize safety manufacturing capacity and determine how relocatable capacity should be distributed while satisfying resilience constraints. In case studies, based in part on data collected from a chimeric antigen receptor T cell therapy manufacturing facility at the University of Pennsylvania, we compare decentralized network models with different heuristic algorithms. Results indicate that transshipment in a decentralized network can result in a significant reduction of required safety capacity, reducing the cost of network resilience.
In this paper, we consider the yet-uncharted assortment optimization problem under the exponomial choice model, in which the objective is to determine the revenue-maximizing set of products that should be offered to customers. Our main algorithmic contribution comes in the form of a fully polynomial-time approximation scheme, showing that the optimal expected revenue can be efficiently approached within any degree of accuracy. We synthesize several ideas related to approximate dynamic programming, intended to construct a compact discretization of the continuous state space by keeping track of "key statistics" in rounded form and by operating with a suitable bit precision complexity. We complement this result with a number of NP-hardness reductions to natural extensions of this problem. Moreover, we conduct empirical and computational evaluations of the exponomial choice model and our solution method. Focusing on choice models with a simple parametric structure, we provide new empirical evidence that the exponomial choice model can achieve higher predictive accuracy than the multinomial logit (MNL) choice model on several real-world data sets. We find that this predictive performance correlates with certain characteristics of the choice instance, namely the entropy and magnitude of choice probabilities. Finally, we leverage fully ranked preference data to simulate the expected revenue of optimal assortments prescribed using the fitted exponomial and MNL models. On semisynthetic data, the exponomial-based approach can lift revenues by 3%-4% on average against the corresponding MNL benchmark.
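For orientation, the MNL benchmark mentioned in the abstract above can be sketched in a few lines. This is not the exponomial model and not the authors' approximation scheme; it only illustrates what "expected revenue of an assortment" means under the standard MNL formula, with the no-purchase weight normalized to 1. The function names, the prices, and the preference weights are hypothetical, and the brute-force search is exponential in the number of products, so it is only suitable for tiny instances.

```python
from itertools import combinations


def mnl_expected_revenue(assortment, prices, weights):
    """Expected revenue of an assortment under the standard MNL model.

    Assumption: the no-purchase option has preference weight 1, so product i
    in the assortment is chosen with probability weights[i] / (1 + sum of the
    weights of all offered products).
    """
    denom = 1.0 + sum(weights[i] for i in assortment)
    return sum(prices[i] * weights[i] for i in assortment) / denom


def best_assortment_brute_force(prices, weights):
    """Enumerate every non-empty subset and keep the revenue-maximizing one."""
    n = len(prices)
    best_set, best_rev = (), 0.0  # the empty assortment earns nothing
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            rev = mnl_expected_revenue(subset, prices, weights)
            if rev > best_rev:
                best_set, best_rev = subset, rev
    return best_set, best_rev


if __name__ == "__main__":
    # Hypothetical instance: two expensive products and one cheap, popular one.
    prices = [10.0, 8.0, 2.0]
    weights = [1.0, 1.0, 5.0]
    print(best_assortment_brute_force(prices, weights))
```

In this made-up instance the cheap but popular third product is left out of the best assortment, because offering it draws demand away from the more profitable products.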
Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or value function approximation is held constant over partitions. This paper shows that a policy gradient method converges to a policy whose regret per period is bounded by epsilon, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as epsilon/(1-gamma), where gamma is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.
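The quantity epsilon in the abstract above can be made concrete with a small sketch. The code below computes the largest gap between state-action values of two states that share a partition, and then evaluates the two scalings quoted in the abstract, epsilon and epsilon/(1-gamma). It assumes the gap is taken between states in the same partition under the same action, which is one reading of the abstract; the function names and the toy values are hypothetical and are not taken from the paper.

```python
def aggregation_error(q_values, partition_of):
    """Largest gap between Q-values of two states in a common partition.

    q_values: dict mapping (state, action) -> value.
    partition_of: dict mapping state -> partition label.
    Assumption: values are compared for the same action only.
    """
    lowest, highest = {}, {}
    for (state, action), value in q_values.items():
        key = (partition_of[state], action)
        lowest[key] = min(lowest.get(key, value), value)
        highest[key] = max(highest.get(key, value), value)
    return max(highest[key] - lowest[key] for key in highest)


def regret_scalings(epsilon, gamma):
    """Return the two per-period regret scalings quoted in the abstract."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return epsilon, epsilon / (1.0 - gamma)


if __name__ == "__main__":
    # Toy example: states s0 and s1 are aggregated, s2 is on its own.
    q = {
        ("s0", "a"): 1.0, ("s1", "a"): 1.5, ("s2", "a"): 4.0,
        ("s0", "b"): 0.0, ("s1", "b"): 0.2, ("s2", "b"): 3.0,
    }
    partition = {"s0": 0, "s1": 0, "s2": 1}
    eps = aggregation_error(q, partition)
    print(eps, regret_scalings(eps, gamma=0.9))
```

With a discount factor close to 1, the second scaling is much larger than the first, which is the gap in robustness the abstract describes.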
ISBN (print): 9781479900305
Dynamic programming (DP) is impractical for many control problems because of its computational complexity. In this paper, we propose an approximate dynamic programming (ADP) optimal control strategy for ship course trajectory tracking control *** system transformation, we convert the optimal tracking problem into designing an infinite-horizon optimal regulator for the tracking error ***-dependent Heuristic dynamic programming (ADHDP) technique, as one form of ADP, is presented to obtain the infinite-horizon optimal tracking *** the ship course optimal tracking control simulation results, we can see that the ADHDP controller makes the performance index and the control sequence for the error dynamics converge to the optimal *** BP neural networks are used as parametric structures to implement ADHDP *** two neural networks aim at approximating the cost function and the control law, respectively.
With the advancement in computing power and data science techniques, reinforcement learning (RL) has emerged as a powerful tool for decision-making problems in complex systems. In recent years, the research on RL for healthcare operations has grown rapidly. Especially during the COVID-19 pandemic, RL has played a critical role in optimizing decisions with greater degrees of uncertainty. RL for healthcare applications has been an exciting topic across multiple disciplines, including operations research, operations management, healthcare systems engineering, and data science. This review paper first provides a tutorial on the overall framework of RL, including its key components, training models, and approximators. Then, we present the recent advances of RL in the domain of healthcare operations management (HOM) and analyze the current trends. Our paper concludes by presenting existing challenges and future directions for RL in HOM.
Revenue management (RM) plays a vital role in optimizing sales processes in real-life applications under incomplete information. The prediction of consumer demand and the anticipation of competitors' price reactions have become key factors in RM, because they make it possible to apply classical dynamic programming (DP) methods for expected long-term reward maximization. Modern model-free deep reinforcement learning (RL) approaches are able to derive optimized policies without explicit estimations of the underlying model dynamics. However, RL algorithms typically require either vast amounts of training data or a suitable synthetic model to be trained on. As existing studies focus on one group of algorithms only, the relation between established DP approaches and new RL techniques is opaque. To address this issue, in this paper, we use a dynamic pricing framework for an airline ticket market to compare state-of-the-art RL algorithms and data-driven versions of classic DP methods with respect to (i) performance and (ii) required data. For the DP techniques, we use estimations of market dynamics to be able to compare their performance and data consumption against RL methods. The numerical results of our experiments, which include monopoly as well as duopoly markets, allow us to study how the performances of the different approaches relate to each other in exemplary settings. In both setups, we find that with little data (about 10 episodes), fitted DP methods were highly competitive; with medium amounts of data (about 100 episodes), DP methods were outperformed by RL, where PPO provided the best results. Given large amounts of training data (about 1000 episodes), the best RL algorithms, i.e., TD3, DDPG, PPO, and SAC, performed similarly, achieving about 90% or more of the optimal solution.
ISBN (print): 9781479929849
Microgrid energy management poses a challenging optimization problem in which continuous (economic dispatch) and discrete (unit commitment) optimization tasks are solved. Microgrid optimization often leads to complex problems in which optimization methods run into the curse of dimensionality. We adopt approximate dynamic programming (ADP) as a promising optimization technique that can overcome the curse of dimensionality. In this paper, an energy management system based on ADP is introduced, and its behavior is demonstrated on a small-scale microgrid that is connected to the distribution network and includes a wind turbine, a chiller plant, thermal storage, and a cooling load. The paper describes the policy search approach to ADP and selected approximation architectures in the context of energy optimization. The ADP results are compared with the results of a solution based on the dynamic programming approach.
Approximate/adaptive dynamic programming (ADP) has been widely used to design learning controllers for complex, high-dimensional systems in recent *** paper aims at handling an important problem in the design of ADP learning controllers, which is the improvement of the learning algorithm for its convergence *** analyze ADP controller implementation framework according to the requirements of the tracking control task, with emphasis on providing an improved weight-updating gradient descent approach for optimizing connection weights in network structures. A comparison of the proposed method and the classic ADP design for tracking and controlling the pitch angle of an aircraft is *** verifies the feasibility of the design of the proposed ADP-based controller.