We investigate sensor scheduling for remote estimation when multiple smart sensors monitor multiple stochastic dynamical systems. The sensors transmit their measurements to a remote estimator through a noisy wireless communication channel. The remote estimator can receive multiple packets sent simultaneously by local sensors. Sensors transmit their measurements if their signal-to-interference-plus-noise ratio (SINR) is above a threshold. We compute the optimal sensor scheduling policy by minimizing the expected error covariance subject to a constraint on the total signal transmissions from all sensors. We model this problem as a Markov Decision Process (MDP) with discounted cost per stage in a finite-horizon framework and employ stochastic dynamic programming (DP) as the optimization method. A novel algorithm based on sampling and machine learning techniques is proposed as an approximation. At each stage of the DP algorithm, samples are drawn from a uniform probability distribution. The data are used to train Neural Network (NN) and Random Forest (RF) models for cost function and policy approximation. The proposed framework is supported by simulation examples comparing RF and NN as approximate DP (ADP) methods. This idea builds a bridge between recent advances in data science, machine learning, and ADP.
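As a rough illustration of the sampling-based backward DP described in this abstract, the sketch below handles a single scalar system rather than the paper's multi-sensor setting. All constants, the covariance recursion, and the function names are assumptions made for illustration. A hand-written piecewise-linear interpolator stands in for the RF and NN approximators so that the example needs only the standard library.

```python
import bisect
import random

# Hypothetical model constants (not taken from the paper).
A2, W, P_BAR = 1.2, 1.0, 1.0            # a^2, process noise, covariance after a reception
Q_SUCCESS, LAM, GAMMA = 0.8, 2.0, 0.95  # reception probability, transmission price, discount
P_MAX, HORIZON, N_SAMPLES = 50.0, 10, 200


def step_open_loop(p):
    """Error covariance after one step without a successful reception."""
    return min(A2 * p + W, P_MAX)


def make_interpolator(xs, ys):
    """Piecewise-linear fit through the samples; a stand-in for RF/NN regression."""
    pairs = sorted(zip(xs, ys))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    def value(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return value


def q_value(p, u, v_next):
    """Stage cost plus discounted expected cost-to-go for action u (1 = transmit)."""
    drift = v_next(step_open_loop(p))
    if u == 0:
        expected = drift
    else:
        expected = Q_SUCCESS * v_next(P_BAR) + (1 - Q_SUCCESS) * drift
    return p + LAM * u + GAMMA * expected


def backward_dp(seed=0):
    """Return approximate value functions, ordered from stage 0 to the terminal stage."""
    rng = random.Random(seed)
    v_next = lambda p: p  # terminal cost: the error covariance itself
    values = [v_next]
    for _ in range(HORIZON):
        # Sample states uniformly, evaluate the Bellman target, then fit.
        samples = [rng.uniform(P_BAR, P_MAX) for _ in range(N_SAMPLES)]
        targets = [min(q_value(p, 0, v_next), q_value(p, 1, v_next)) for p in samples]
        v_next = make_interpolator(samples, targets)
        values.append(v_next)
    values.reverse()
    return values


def greedy_action(p, v_next):
    """Action that minimizes the approximate Q-value at covariance p."""
    return 0 if q_value(p, 0, v_next) <= q_value(p, 1, v_next) else 1
```

Under these assumed constants, the greedy policy tends to transmit when the error covariance is large and stay silent when it is small, which is the threshold-like behaviour one would expect from such a trade-off.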
In this paper, we study a parking management problem where an operator manages a publicly owned parking service system with unknown parking demand. Assuming that the operator has perfect information, we first formulate the operator's problem as a stochastic dynamic programming problem and, to overcome the curse of dimensionality, resort to approximate dynamic programming to solve it. In practice, however, some information that is essential for centralized management is usually privately known, which gives drivers an incentive to behave strategically and could lead to suboptimal system performance. We design a two-step mechanism and prove that, in step 1, drivers' choices of whether to enter the managed system under the approximate optimal solution form a Bayesian-Nash equilibrium (BNE), and that, in step 2, truthful reporting is a dominant strategy for all drivers under any circumstance. We investigate the properties of the resulting equilibria and further modify the mechanism to ensure that the desired approximate system optimum is the only resulting BNE. Numerical examples show that the mechanism design not only enhances average system performance but also increases system robustness.
ISBN: (Print) 9781467361279
This paper analyzes quasi-random sampling techniques for approximate dynamic programming. Specifically, low-discrepancy sequences and lattice point sets are investigated and compared as efficient schemes for uniform sampling of the state space in high-dimensional settings. A convergence analysis of the approximate solution is provided, based on geometric properties of the two discretization methods. It is also shown that such schemes can take advantage of regularities of the value functions, possibly through suitable transformations of the state vector. Simulation results concerning optimal management of a water reservoir system and inventory control are presented to show the effectiveness of the considered techniques with respect to purely random sampling.
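For readers unfamiliar with the two sampling families named in this abstract, the following minimal sketch generates a Halton sequence (one common low-discrepancy sequence) and a rank-1 lattice point set on the unit square. The choice of Halton, the bases, and the lattice generator vector are illustrative assumptions; the abstract does not say which specific constructions the paper uses.

```python
def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += digit / denom
    return result


def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence, one coprime base per dimension."""
    return [
        tuple(van_der_corput(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]


def rank1_lattice(n_points, generator=(1, 5)):
    """Rank-1 lattice: point i is (i * g / n) mod 1 in each dimension."""
    return [
        tuple((i * g % n_points) / n_points for g in generator)
        for i in range(n_points)
    ]
```

Either point set can be rescaled to the state-space bounds and used in place of pseudo-random draws when collecting the states at which the value function is evaluated.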
We formulate an efficient approximation for multi-agent batch reinforcement learning, the approximated multi-agent fitted Q iteration (AMAFQI). We present a detailed derivation of our approach. We propose an iterative policy search and show that it yields a greedy policy with respect to multiple approximations of the centralized, learned Q-function. In each iteration and policy evaluation, AMAFQI requires a number of computations that scales linearly with the number of agents, whereas the analogous number of computations increases exponentially for fitted Q iteration (FQI), a commonly used approach in batch reinforcement learning. This property of AMAFQI is fundamental for the design of a tractable multi-agent approach. We evaluate the performance of AMAFQI and compare it to FQI in numerical simulations. The simulations illustrate the significant reduction in computation time when using AMAFQI instead of FQI in multi-agent problems and corroborate the similar performance of both approaches. © 2023 Elsevier B.V. All rights reserved.
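To make the FQI baseline mentioned in this abstract concrete, here is a minimal single-agent sketch. It is not AMAFQI, and it replaces the usual regression step with a per-(state, action) average of the targets, which is only adequate for small discrete problems. The batch format `(state, action, reward, next_state)` and all names are assumptions made for illustration.

```python
from collections import defaultdict


def fitted_q_iteration(batch, actions, gamma=0.9, n_iterations=50):
    """Tabular FQI: repeatedly regress Q(s, a) onto r + gamma * max_b Q(s', b).

    The "regression" here is a plain average of the targets per (s, a) pair.
    """
    q = defaultdict(float)
    for _ in range(n_iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in batch:
            # The max ranges over every action; in a multi-agent problem this
            # is the joint action set, whose size grows exponentially with the
            # number of agents.
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q
```

The maximization over all actions inside the loop is the step whose cost grows exponentially with the number of agents in a centralized formulation, which is the bottleneck the abstract says AMAFQI avoids.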
We study a classic problem in revenue management: quantity-based, single-resource revenue management with no-shows. In this problem, a firm observes a sequence of T customers requesting a service. Each arrival is drawn independently from a known distribution over k different types, and the firm must decide irrevocably whether to accept or reject each request in an online fashion. The firm has a capacity of B resources and wants to maximize its profit. Each accepted service request yields a type-dependent revenue and has a type-dependent probability of requiring a resource once all arrivals have occurred (otherwise it is a no-show). If the number of accepted arrivals that require a resource at the end of the horizon is greater than B, the firm must pay a fixed compensation for each service request that it cannot fulfill. With a clairvoyant that knows all arrivals ahead of time as a benchmark, we provide an algorithm with a uniform additive loss bound, that is, its expected loss is independent of T. This improves upon prior works achieving Ω(√T) guarantees.
There has been a paradigm shift in urban logistics services in recent years: demand for real-time, instant mobility and delivery services is growing. This poses new challenges to logistics service providers, as the underlying stochastic dynamic vehicle routing problems (SDVRPs) require anticipatory real-time routing actions. The complexity of finding efficient routing actions is multiplied by the challenge of evaluating such actions with respect to their effectiveness given future dynamism and uncertainty. Reinforcement learning (RL) is a promising tool for evaluating actions, but it is not designed for searching a complex, combinatorial action space. Thus, past work on RL for SDVRPs has either restricted the action space, that is, solved only subproblems by RL and everything else by established heuristics, or focused on problems that reduce to resource allocation problems. Solving real-world SDVRPs requires new strategies that address the combined challenge of a combinatorial, constrained action space and future uncertainty, but as our findings suggest, such strategies are essentially non-existent. Our survey shows that past work either relied on action-space restriction or avoided routing actions entirely, and it highlights opportunities for more holistic solutions.
ISBN: (Print) 9781665440899
Intra-day economic dispatch of an integrated microgrid is a fundamental requirement for integrating distributed generators. The dynamic energy flows in cogeneration units present challenges to the energy management of the microgrid. In this paper, a novel approximate dynamic programming (ADP) approach based on value function approximation (VFA) is proposed to solve this problem; it is distinguished by its consideration of the dynamic process constraints of the combined-cycle gas turbine (CCGT) plant. First, we mathematically formulate the multi-period decision problem as a finite-horizon Markov decision process. To deal with the thermodynamic process, an augmented state vector of the CCGT is introduced. Second, the proposed VFA-ADP algorithm is employed to derive near-optimal real-time operation strategies. In addition, to guarantee the monotonicity of the piecewise linear function, we apply the SPAR algorithm in the update process. To validate the effectiveness of the proposed method, we conduct experiments comparing it with some traditional optimization methods. The results indicate that the proposed ADP method achieves better performance on the economic dispatch of the microgrid.
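The monotonicity-preserving update mentioned in this abstract can be sketched as follows. This is a simplified, SPAR-style leveling step under the assumption that the piecewise linear value function is stored as a list of segment slopes that must stay non-increasing (a concave function); the abstract does not specify the paper's exact variant or sign convention, and the function and parameter names are hypothetical.

```python
def spar_update(slopes, index, sampled_slope, step_size):
    """Smooth one slope toward a sampled value, then restore monotonicity.

    Assumes slopes must be non-increasing. Slopes to the left that fall below
    the updated value are raised to it, and slopes to the right that exceed it
    are lowered to it.
    """
    updated = list(slopes)
    updated[index] = (1 - step_size) * slopes[index] + step_size * sampled_slope
    pivot = updated[index]
    for i in range(index):
        if updated[i] < pivot:
            updated[i] = pivot
    for i in range(index + 1, len(updated)):
        if updated[i] > pivot:
            updated[i] = pivot
    return updated
```

For instance, starting from slopes `[5, 4, 3, 2]`, updating index 2 with a sampled slope of 7 and step size 0.5 moves that slope to 5.0 and lifts the violating neighbour on its left, giving `[5, 5, 5.0, 2]`.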
To optimise bicycle routes across multiple parameters, including safety, efficiency and subtle rider preferences, this work explores the Bike Routing Problem (BRP) using a Simulated Annealing approach. In this framework, a wide range of constraints and preferences are combined and calibrated to create routes that meet the varied and changing needs of cyclists. Extensive testing on a dataset representing a range of rider preferences demonstrates the effectiveness of the approach, resulting in significant improvements in route selection. Beyond its immediate applicability to the BRP, this research is a resource for urban planners and politicians: its data-driven solutions and strategic recommendations can help them strengthen bicycle infrastructure.
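The abstract gives no algorithmic detail, so the following is only a generic simulated annealing sketch for ordering waypoints on a ride. The cost function, which blends leg distance with a per-waypoint risk score, along with the cooling schedule and all names, are assumptions for illustration and are not the paper's formulation.

```python
import math
import random


def route_cost(route, points, risk, risk_weight=1.0):
    """Total leg distance, with each leg inflated by the mean risk of its endpoints."""
    total = 0.0
    for a, b in zip(route, route[1:]):
        leg = math.dist(points[a], points[b])
        total += leg * (1 + risk_weight * (risk[a] + risk[b]) / 2)
    return total


def anneal(points, risk, n_steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over waypoint orderings; waypoint 0 stays first.

    Needs at least three waypoints so that a segment can be reversed.
    """
    rng = random.Random(seed)
    current = list(range(len(points)))
    current_cost = route_cost(current, points, risk)
    best, best_cost = current[:], current_cost
    for step in range(n_steps):
        # Geometric cooling from t_start toward t_end.
        temp = t_start * (t_end / t_start) ** (step / n_steps)
        # Propose reversing a random segment that excludes the start point.
        i, j = sorted(rng.sample(range(1, len(points)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_cost = route_cost(candidate, points, risk)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost
```

Raising `risk_weight` shifts the trade-off toward safety at the expense of distance, which is one simple way rider preferences could enter such a cost function.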
Author: Diamant, Adam (York Univ, Schulich Sch Business, 111 Ian Macdonald Blvd, Toronto, ON M3J 1P3, Canada)
We investigate the scheduling practices of multistage outpatient health programs that offer care plans customized to the needs of their patients. We formulate the scheduling problem as a Markov decision process (MDP) where patients can reschedule their appointments, may fail to show up, and may become ineligible. Because the MDP has an exponentially large state space, we introduce a linear approximation to the value function. We then formulate an approximate dynamic program (ADP) and implement a dual variable aggregation procedure. This reduces the size of the ADP while still producing dual cost estimates that can be used to identify favorable scheduling actions. We use our scheduling model to study the effectiveness of customized care plans for a heterogeneous patient population and find that system performance is better than that of clinics that do not offer such plans. We also demonstrate that our scheduling approach improves clinic profitability, increases throughput, and decreases practitioner idleness compared with a policy that mimics human schedulers and a policy derived from a deep neural network. Finally, we show that our approach is fairly robust to errors introduced when practitioners inadvertently assign patients to the wrong care plan.
In this paper, we design a theoretical framework for applying model predictive control to hybrid systems. To this end, we develop a theory of approximate dynamic programming by leveraging the concept of alternating simulation. We show how to combine these notions in a branch-and-bound algorithm that can further refine the Q-functions using Lagrangian duality. We illustrate the approach on a numerical example.