This work is concerned with the optimal allocation of limited maintenance resources among a collection of competing multi-state systems, where the dynamics of each multi-state system are modelled by a Markov chain. Determining the optimal dynamic maintenance policy is prohibitively difficult, so we propose a heuristic dynamic maintenance policy in which maintenance resources are allocated to the systems of highest importance. The importance measure is well justified by the idea of a subsidy, but it is expensive to compute. We therefore propose two modifications of the importance measure, resulting in two modified heuristic policies. The performance of the two modified heuristics is evaluated in a systematic computational study and shown to be highly competitive.
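For illustration, a minimal sketch of the kind of importance-driven allocation described above, assuming each system is a finite Markov chain with a known cost-to-go vector V; the one-step look-ahead score, the field names, and the greedy rule are illustrative assumptions, not the paper's exact importance measure.

```python
import numpy as np

def importance_score(system, V):
    """Illustrative importance score (an assumption, not the paper's measure):
    one-step look-ahead gain of maintaining the system now versus doing
    nothing, given its current state and a cost-to-go vector V."""
    s = system["state"]
    cost_if_idle = system["P_idle"][s] @ V                        # leave the system alone
    cost_if_maint = system["c_maint"] + system["P_maint"][s] @ V  # maintain it now
    return cost_if_idle - cost_if_maint                           # larger = more worth maintaining

def allocate_maintenance(systems, value_fns, budget):
    """Greedy heuristic: spend the limited maintenance budget on the systems
    whose importance scores are highest (and positive)."""
    scores = np.array([importance_score(sys, V) for sys, V in zip(systems, value_fns)])
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:budget] if scores[i] > 0]
```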
ISBN (print): 9781728113982
We investigate the problem of learning an efficient policy for an infinite-horizon, discounted cost, Markov decision process (MDP) with a large number of states. We compute the actions of a policy that is nearly as good as a policy chosen by a suitable oracle from a given mixture policy class characterized by the convex hull of a set of base policies. To learn the coefficients of the mixture model, we recast the problem as an approximate linear programming (ALP) formulation for MDPs, where the feature vectors correspond to the occupation measures of base policies on the state-action space. We then propose a projection-free stochastic primal-dual method with Bregman divergence to solve the resulting ALP. Furthermore, we analyze the efficiency of the proposed stochastic algorithm, namely the number of rounds required to achieve a near-optimal objective value. Numerical results show that the proposed primal-dual algorithm achieves better efficiency and lower variance across different trials compared to the penalty function method.
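The occupation-measure formulation being referred to can be sketched as follows; the notation (discount γ, costs c, initial distribution ν, base policies π_1, …, π_K) is generic and not taken from the paper.

```latex
% Exact dual LP of a discounted MDP over occupation measures \mu(s,a):
\begin{aligned}
\min_{\mu \ge 0}\ & \sum_{s,a} \mu(s,a)\, c(s,a) \\
\text{s.t.}\ & \sum_{a} \mu(s',a) \;=\; (1-\gamma)\,\nu(s')
  + \gamma \sum_{s,a} P(s' \mid s,a)\,\mu(s,a) \quad \forall s'.
\end{aligned}

% ALP with the occupation measures of base policies \pi_1,\dots,\pi_K as features:
\mu_\theta \;=\; \sum_{k=1}^{K} \theta_k\, \mu_{\pi_k},
\qquad \theta \in \Delta_K \ (\text{the probability simplex}).

% Each \mu_{\pi_k} satisfies the (linear) flow constraints, so every convex
% combination \mu_\theta is feasible, and the LP reduces to choosing the
% mixture weights \theta over \Delta_K.
```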
We formulate a discrete-time Markov decision process for a resource assignment problem for multi-skilled resources with a hierarchical skill structure, minimizing the average penalty and waiting costs for jobs with different waiting costs and uncertain service times. In contrast to most queueing models, our application leads to service times that are known before the job is actually served, but only after it has been accepted and assigned to a server. The resulting Markov decision process is intractable for problems of realistic size due to the curse of dimensionality. Using an affine approximation of the bias function, we develop a simple linear program that yields a lower bound for the minimum average costs. We suggest how the solution of the linear program can be used in a simple heuristic and illustrate its performance in numerical examples and a case study.
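A sketch of the standard way an affine bias approximation yields an LP lower bound for average-cost MDPs, in generic notation (g for the average cost, h for the bias, c for the one-step cost); the concrete state features used in the paper are not reproduced here.

```latex
% Average-cost optimality inequality: any pair (g, h) satisfying
%   g + h(s) \le c(s,a) + \sum_{s'} p(s' \mid s,a)\, h(s')   \forall (s,a)
% gives a lower bound g on the minimum long-run average cost.
% Plugging in an affine bias  h(s) \approx \theta_0 + \sum_k \theta_k s_k
% turns this into a linear program in (g, \theta):
\begin{aligned}
\max_{g,\,\theta}\ & g \\
\text{s.t.}\ & g + \theta_0 + \textstyle\sum_k \theta_k s_k
\;\le\; c(s,a) + \sum_{s'} p(s' \mid s,a)\Bigl(\theta_0 + \textstyle\sum_k \theta_k s'_k\Bigr)
\quad \forall (s,a),
\end{aligned}
% whose optimal value is the kind of lower bound on the minimum average cost
% mentioned in the abstract.
```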
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and approximate linear programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all
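The factorisation described above can be stated compactly; the notation below is a generic sketch, with n nodes, state variable s_i and decision variable a_i at node i, and N(i) the neighbourhood of node i in the graph.

```latex
% GMDP factorisation over a graph with nodes i = 1,\dots,n, where node i
% carries a state variable s_i and a decision variable a_i, and N(i) denotes
% the neighbours of node i (including i itself):
P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P_i\bigl(s'_i \,\big|\, s_{N(i)},\, a_i\bigr),
\qquad
r(s, a) \;=\; \sum_{i=1}^{n} r_i\bigl(s_{N(i)},\, a_i\bigr).

% The representation grows linearly with the number of nodes, while exact
% dynamic programming still scales with the joint state/action space,
% i.e. exponentially in n.
```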
A weakness of classical Markov decision processes (MDPs) is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions, with approximate linear programming (ALP) emerging as a promising MDP-approximation technique. To date, most ALP work has focused on the primal-LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), leading to a formulation that is computationally feasible and suitable for solving constrained MDPs. We empirically show that this new ALP formulation also performs well on unconstrained problems.
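One schematic way to read the symmetric primal/dual approximation, using our own notation (Φ for primal value-function features, H ≥ 0 for dual features); the paper's exact construction may differ.

```latex
% Exact primal LP of a discounted MDP (cost minimisation, initial distribution \nu):
\max_{V}\ \nu^{\top} V
\quad \text{s.t.}\quad
V(s) \;\le\; c(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \quad \forall (s,a).

% Primal-side approximation: V \approx \Phi w restricts the objective's search
% space.  Dual-side approximation: \mu \approx H\theta with \theta \ge 0 keeps
% only the constraint combinations weighted by the nonnegative dual features,
% i.e. the exponentially many constraints above are replaced by
H^{\top}\bigl(c + \gamma P\,\Phi w - E\,\Phi w\bigr) \;\ge\; 0,
% where E copies V(s) onto every state-action pair (s,a).  The resulting LP
% has only as many variables as primal features and as many constraints as
% dual features.
```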