We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of these algorithms are developed for finite state and compact action spaces while the other two are for finite state and finite action spaces. Of the former two, one algorithm uses a linear parameterization for the policy, resulting in reduced memory complexity. Convergence analysis is briefly sketched and illustrative numerical experiments with the four algorithms are shown for a problem of flow control in communication networks.
In this paper, we study simulation-based optimization algorithms for solving discrete time optimal stopping problems. Using large deviation theory for the increments of empirical processes, we derive optimal convergence rates for the value function estimate and show that they cannot be improved in general. The rates derived provide a guide to the choice of the number of simulated paths needed in the optimization step, which is crucial for the good performance of any simulation-based optimization algorithm. Finally, we present a numerical example of solving an optimal stopping problem arising in finance that illustrates our theoretical findings.
This paper gives the first rigorous convergence analysis of analogues of Watkins's Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of the average cost problem: the traditional relative value iteration (RVI) algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and analyzed using the ODE method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two-time-scale stochastic approximation.
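As a rough illustration of the synchronous RVI-style Q-learning update described in this abstract, the following minimal sketch runs it on a toy problem. The two-state, two-action MDP, the step-size schedule, the reference pair `ref`, and all function names are assumptions made for the example and are not taken from the paper. Each state-action pair is updated toward the sampled one-step cost plus the minimum Q-value at the simulated next state, minus the Q-value at the fixed reference pair.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative, not from the paper):
# P[i][a] is the next-state distribution, COST[i][a] the one-step cost.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
COST = [[1.0, 2.0],
        [0.5, 3.0]]


def rvi_q_learning(n_iter=100_000, ref=(0, 0), seed=0):
    """Synchronous RVI-style Q-learning for average cost (a sketch)."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for n in range(n_iter):
        # Assumed diminishing step size; any standard schedule could be used.
        alpha = 1.0 / (10.0 + n / 100.0)
        old = [row[:] for row in q]      # synchronous: update from a snapshot
        offset = old[ref[0]][ref[1]]     # Q-value at the reference pair
        for i in range(n_states):
            for a in range(n_actions):
                # Simulate one transition from (i, a).
                j = rng.choices(range(n_states), weights=P[i][a])[0]
                target = COST[i][a] + min(old[j]) - offset
                q[i][a] = old[i][a] + alpha * (target - old[i][a])
    return q


def greedy_policy(q):
    """Cost-minimizing action in each state under the Q-table."""
    return [min(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q = rvi_q_learning()
    print("estimated average cost:", q[0][0])
    print("greedy policy:", greedy_policy(q))
```

For this toy MDP, solving the four stationary policies by hand gives an optimal average cost of 11/12, attained by taking action 0 in both states. Under the assumptions above, the Q-value at the reference pair should settle near that figure.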
We propose an evolutionary Markov chain Monte Carlo (eMCMC) framework for optimal design of large-scale monitoring networks. From a Bayesian decision theoretical perspective, the optimal design is the design that maximizes the expected utility. In the case of large-scale monitoring networks, the computation of the expected utility involves a very high dimensional integral with respect to future observations and unknown parameters. Based on the work by Muller and coauthors, who have developed a clever simulation-based framework for Bayesian optimal design blending MCMC with simulated annealing, we develop an algorithm that simulates a population of Markov chains, each having its own temperature. The different temperatures allow hotter chains to more easily cross valleys and colder chains to rapidly climb hills. The population evolves according to genetic operators such as mutation and crossover, allowing the chains to explore the decision space both locally and globally by exchanging information among chains. As a result, our framework explores the decision space very effectively. We illustrate the power of the proposed methodology with the optimal redesign of a network of monitoring stations for spatiotemporal ground-level ozone in the eastern USA. (C) 2011 Elsevier B.V. All rights reserved.
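The idea of a population of chains at different temperatures can be sketched on a toy problem. The code below is a simplified illustration and not the authors' eMCMC algorithm. The one-dimensional design space, the bimodal `utility` function, the temperature ladder, and all names are assumptions. It uses a random-walk "mutation" step and an exchange step between adjacent temperatures, and leaves out crossover because the toy design is a single integer.

```python
import math
import random


def utility(d):
    # Hypothetical utility on designs 0..99: local peak at 20, global at 80.
    return (math.exp(-((d - 20) / 8.0) ** 2)
            + 2.0 * math.exp(-((d - 80) / 8.0) ** 2))


def accept(log_ratio, rng):
    # Metropolis acceptance test for a given log acceptance ratio.
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def tempered_population_search(n_iter=5000, temps=(0.05, 0.2, 1.0, 5.0),
                               start=20, seed=0):
    """Chain k targets a density proportional to exp(utility(d) / temps[k])."""
    rng = random.Random(seed)
    chains = [start] * len(temps)
    best = start
    for _ in range(n_iter):
        # Mutation: symmetric random-walk proposal for every chain.
        for k, temp in enumerate(temps):
            prop = chains[k] + rng.randint(-5, 5)
            if 0 <= prop <= 99:  # proposals outside the space are rejected
                if accept((utility(prop) - utility(chains[k])) / temp, rng):
                    chains[k] = prop
            if utility(chains[k]) > utility(best):
                best = chains[k]
        # Exchange: propose swapping the states of two adjacent chains.
        k = rng.randrange(len(temps) - 1)
        log_ratio = ((utility(chains[k + 1]) - utility(chains[k]))
                     * (1.0 / temps[k] - 1.0 / temps[k + 1]))
        if accept(log_ratio, rng):
            chains[k], chains[k + 1] = chains[k + 1], chains[k]
    return best, chains


if __name__ == "__main__":
    best, chains = tempered_population_search()
    print("best design seen:", best, "utility:", utility(best))
```

All chains start at the local peak. The hottest chain sees an almost flat target and can wander across the valley, while exchanges pass promising states down to the colder chains, which then climb locally.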
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision processes is cast as a two-time-scale stochastic approximation. Convergence analysis, approximation issues and an example are studied.
ISBN: 9781424446513 (print)
We consider the problem of joint network coding and packet scheduling for multimedia transmission from the Access Point (AP) to multiple receivers in 802.11 networks. The state of the receivers is described by a hidden Markov model, and the AP acts as a decision maker which employs a partially observable Markov decision process (POMDP) to optimize the media transmission. Importantly, we introduce a simulation-based dynamic programming algorithm as a solution tool for our POMDP abstraction. Our simulation-based algorithm both simplifies the modeling process and reduces the computational complexity of the solution process. Our simulation results demonstrate that the proposed scheme provides higher performance than both the network coding scheme without optimization techniques and the traditional retransmission scheme.
In this paper we consider a method of solving optimal stopping problems in discrete and continuous time based on their dual representation. A novel and generic simulation-based optimization algorithm not involving nested simulations is proposed and studied. The algorithm involves the optimization of a genuinely penalized dual objective functional over a class of adapted martingales. We prove the convergence of the proposed algorithm and demonstrate its efficiency for optimal stopping problems arising in option pricing.