Calculations of service availability of a High-Availability (HA) cluster are usually based on the assumption of load-independent machine availabilities. In this paper, we study the issues and show how the service availabilities can be calculated under the assumption that machine availabilities are load dependent. We present a Markov chain analysis to derive the steady-state service availabilities of an HA cluster with load-dependent machine availabilities. We show that with load-dependent machine availabilities, the attained service availability becomes policy dependent. After formulating the problem as a Markov Decision Process, we determine the optimal policy that achieves the maximum service availabilities using the method of policy iteration. Two greedy assignment algorithms are studied: least load and first derivative length (FDL) based, where least load corresponds to some load balancing algorithms. We carry out the analysis and simulations on two cases of load profiles: in the first profile, a single machine has the capacity to host all services in the HA cluster; in the second profile, a single machine does not have enough capacity to host all services. We show that the service availabilities achieved under the first load profile are the same, whereas those achieved under the second load profile are different. Since the service availabilities differ under the second load profile, we investigate how the distribution of service availabilities across the services can be controlled by adjusting the rewards vector.
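For reference, the following is a minimal sketch of policy iteration on a generic finite Markov Decision Process, the solution method named in this abstract. The transition tensor, reward matrix, and discount factor are illustrative placeholders and are not the HA-cluster availability model or rewards vector from the paper.

```python
# Minimal policy-iteration sketch for a generic finite MDP (discounted reward).
# P, R, and gamma below are illustrative placeholders, not the paper's model.
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """P[a, s, s'] = transition probability, R[s, a] = expected reward."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)           # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R.T + gamma * P @ V                      # Q[a, s]
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions = 4, 2
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)                # normalize rows to probabilities
    R = rng.random((n_states, n_actions))
    pi, V = policy_iteration(P, R)
    print("optimal policy:", pi)
```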
Author: Tadic, V.B. (Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England)
The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step-size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
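For concreteness, here is a minimal sketch of the class of algorithms analyzed: TD(0) with linear function approximation and a constant step-size, evaluating the discounted cost of a fixed Markov chain. The toy chain, feature matrix, and step-size are illustrative assumptions, not the paper's examples.

```python
# Sketch of TD(0) with linear function approximation and a constant step-size.
# The chain, features, and step-size are illustrative, not the paper's examples.
import numpy as np

def td0_linear(P, r, phi, gamma=0.9, alpha=0.01, n_steps=50_000, seed=0):
    """P: (S,S) transition matrix, r: (S,) one-stage costs, phi: (S,d) features."""
    rng = np.random.default_rng(seed)
    n_states, d = phi.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the discounted cost function.
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += alpha * delta * phi[s]              # constant step-size update
        s = s_next
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S, d = 5, 3
    P = rng.random((S, S)); P /= P.sum(axis=1, keepdims=True)
    r = rng.random(S)
    phi = rng.random((S, d))
    print("learned weights:", td0_linear(P, r, phi))
```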
The problem addressed in this study is that of determining how to allocate the workstation processing and buffering capacity in a capacitated re-entrant line to the job instances competing for it, in order to maximize its long-run/steady-state throughput, while maintaining the logical correctness of the underlying material flow, i.e., deadlock-free operations. An approximation scheme for the optimal policy that is based on neuro-dynamic programming theory is proposed, and its performance is assessed through a numerical experiment. The derived results indicate that the proposed method holds considerable promise for providing a viable, computationally efficient approach to the problem and highlight directions for further investigation.
This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and neuro-dynamic programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity of problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
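To make the baseline concrete, the following is a minimal sketch of exact tabular value iteration, i.e., the DP recursion that the approximate methods surveyed here (RL, NDP) seek to approximate when the state space is too large to enumerate. The toy MDP is illustrative only.

```python
# Minimal tabular value iteration: the exact Bellman recursion that approximate
# DP methods (RL, NDP) emulate. The toy MDP below is illustrative only.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P[a, s, s'] transition probabilities, R[s, a] expected one-stage rewards."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R.T + gamma * P @ V                      # Bellman backup, Q[a, s]
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A, S = 2, 6
    P = rng.random((A, S, S)); P /= P.sum(axis=2, keepdims=True)
    R = rng.random((S, A))
    V, policy = value_iteration(P, R)
    print("greedy policy:", policy)
```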
After a brief review of recent developments in the pricing and hedging of American options, this paper modifies the basis function approach to adaptive control and neuro-dynamic programming, and applies it to develop: 1) nonparametric pricing formulas for actively traded American options and 2) simulation-based optimization strategies for complex over-the-counter options, whose optimal stopping problems are prohibitively difficult to solve numerically by standard backward induction algorithms because of the curse of dimensionality. An important issue in this approach is the choice of basis functions, for which some guidelines and their underlying theory are provided.
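As an illustration of the general idea of simulation plus regression on basis functions for an optimal stopping problem, here is a minimal sketch of pricing a Bermudan put by Monte Carlo with a polynomial basis (in the spirit of least-squares Monte Carlo). The geometric Brownian motion model, the cubic polynomial basis, and all parameters are placeholder assumptions, not the nonparametric formulas or basis-function guidelines developed in the paper.

```python
# Sketch of simulation-based pricing of a Bermudan put via regression on basis
# functions. The GBM dynamics, polynomial basis, and parameters are assumptions.
import numpy as np

def bermudan_put_lsmc(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                      n_steps=50, n_paths=20_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate geometric Brownian motion paths of the underlying price.
    z = rng.standard_normal((n_paths, n_steps))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))
    payoff = np.maximum(K - S[:, -1], 0.0)           # value if held to maturity
    for t in range(n_steps - 2, -1, -1):
        payoff *= np.exp(-r * dt)                    # discount one step back
        St = S[:, t]
        itm = K - St > 0                             # regress only on in-the-money paths
        if itm.any():
            basis = np.vander(St[itm], 4)            # cubic polynomial basis functions
            coef, *_ = np.linalg.lstsq(basis, payoff[itm], rcond=None)
            continuation = basis @ coef
            exercise = K - St[itm]
            ex_now = exercise > continuation         # exercise where immediate payoff wins
            payoff[np.where(itm)[0][ex_now]] = exercise[ex_now]
    return np.exp(-r * dt) * payoff.mean()

if __name__ == "__main__":
    print("Bermudan put value ~", round(bermudan_put_lsmc(), 3))
```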
In this paper, we present a simulation-based dynamic programming method that learns the 'cost-to-go' function in an iterative manner. The method is intended to combat two important drawbacks of the conventional Model Predictive Control (MPC) formulation: the potentially exorbitant online computational requirement and the inability to consider the future interplay between uncertainty and estimation in the optimal control calculation. We use a nonlinear Van de Vusse reactor to investigate the efficacy of the proposed approach and identify further research issues.
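The following is a minimal sketch of the general recipe this abstract describes: simulate closed-loop trajectories under a suboptimal controller, fit an approximate cost-to-go from that data, and then solve only a one-step-ahead problem online. The scalar toy plant, quadratic stage cost, proportional controller, and polynomial approximator are placeholder assumptions, not the Van de Vusse reactor model.

```python
# Sketch: roll out a suboptimal controller, fit an approximate cost-to-go from
# the simulated data, then replace long-horizon MPC by a one-step-ahead search.
import numpy as np

def f(x, u):                                  # placeholder nonlinear plant model
    return 0.9 * x + 0.2 * np.sin(x) + u

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

# 1) Generate closed-loop data under a simple suboptimal (proportional) controller.
rng = np.random.default_rng(0)
X, J = [], []
for _ in range(200):
    x0 = rng.uniform(-3, 3)
    cost, xk = 0.0, x0
    for k in range(50):                       # truncated-horizon cost as a cost-to-go sample
        u = -0.5 * xk                         # suboptimal feedback law
        cost += 0.98**k * stage_cost(xk, u)
        xk = f(xk, u)
    X.append(x0); J.append(cost)

# 2) Fit a polynomial cost-to-go approximation J_hat(x).
coef = np.polyfit(np.array(X), np.array(J), deg=4)
J_hat = lambda x: np.polyval(coef, x)

# 3) Online: replace the long-horizon MPC problem by a one-step-ahead search.
def one_step_controller(x, u_grid=np.linspace(-2, 2, 201)):
    q = [stage_cost(x, u) + 0.98 * J_hat(f(x, u)) for u in u_grid]
    return u_grid[int(np.argmin(q))]

print("u(1.5) =", one_step_controller(1.5))
```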
In this paper, we present how the approach of neuro-dynamic programming (NDP) can be used to combat two important deficiencies of the conventional Model Predictive Control (MPC) formulation: the sometimes exorbitant on-line computational requirement and the inability to consider the evolution of uncertainty in the optimal control calculation. We use a simple Van de Vusse reactor to investigate the feasibility of the proposed approach and identify further research issues.
We use a simulation-based approach to find the optimal feeding strategy for cloned invertase expression in Saccharomyces cerevisiae in a fed-batch bioreactor. The optimal strategy maximizes the productivity and minimizes the fermentation time. This procedure is motivated by the neuro-dynamic programming (NDP) literature, wherein the optimal solution is parameterized in the form of a cost-to-go or profit-to-go function. The proposed approach uses simulations from a heuristic feeding policy as a starting point to generate profit-to-go vs. state data. An artificial neural network is used to obtain profit-to-go as a function of the system state. Iterations of the Bellman equation are used to improve the profit function. The profit-to-go function thus obtained is then implemented in an online controller, which essentially converts the infinite-horizon problem into an equivalent one-step-ahead problem.
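As a sketch of the core approximation step, the code below fits a small neural network mapping the process state to profit-to-go, using data generated by simulating a heuristic feeding policy. The two-state toy "bioreactor" dynamics, the heuristic policy, and the crude profit definition are placeholder assumptions, not the cloned-invertase fermentation model from the paper.

```python
# Sketch: simulate a heuristic feeding policy, then fit a neural-network
# approximation of profit-to-go as a function of the state. Toy model only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def step(x, u):                       # toy fed-batch dynamics: biomass X and substrate S
    X, S = x
    growth = 0.4 * S / (0.5 + S) * X
    return np.array([X + 0.1 * growth, S + 0.1 * (u - growth)])

def heuristic_feed(x):                # simple heuristic feeding policy
    return 0.3 if x[1] < 1.0 else 0.0

states, profits = [], []
for _ in range(300):
    x = np.array([rng.uniform(0.1, 2.0), rng.uniform(0.1, 3.0)])
    xk, feed_used = x.copy(), 0.0
    for _ in range(60):               # simulate the remaining batch horizon
        u = heuristic_feed(xk)
        xk = step(xk, u)
        feed_used += u
    profit = xk[0] - x[0] - 0.05 * feed_used   # crude productivity-minus-feed proxy
    states.append(x); profits.append(profit)

# Neural-network approximation of profit-to-go as a function of the state.
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
net.fit(np.array(states), np.array(profits))
print("predicted profit-to-go at X=1, S=2:", net.predict([[1.0, 2.0]])[0])
```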
Optimal control of systems with complex nonlinear behaviour such as steady-state multiplicity results in a nonlinear optimization problem that needs to be solved online at each sample time. We present an approach based on simulation, function approximation and evolutionary improvement aimed at simplifying online optimization. Closed-loop data from a suboptimal control law, such as MPC based on successive linearization, are used to obtain an approximation of the 'cost-to-go' function, which is subsequently improved through iterations of the Bellman equation. Using this offline-computed cost approximation, an infinite-horizon problem is converted to an equivalent single-stage problem, substantially reducing the computational burden. This approach is tested on a continuous culture of microbes growing on a nutrient medium containing two substrates that exhibits steady-state multiplicity. Extrapolation of the cost-to-go function approximator can lead to deterioration of online performance. Some remedies to prevent such problems caused by extrapolation are proposed. Copyright (C) 2003 John Wiley & Sons, Ltd.
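The sketch below illustrates the offline improvement loop described above: starting from a cost-to-go fitted on closed-loop data from a suboptimal controller, repeatedly apply a one-step Bellman backup over sampled states and refit the approximator, so that the online problem becomes a single-stage search. The scalar toy system, quadratic cost, and polynomial approximator are placeholder assumptions, not the two-substrate continuous-culture model.

```python
# Sketch: Bellman-iteration improvement of a cost-to-go approximation fitted
# from closed-loop data under a suboptimal controller. Toy scalar system only.
import numpy as np

gamma = 0.98
u_grid = np.linspace(-2.0, 2.0, 101)

def f(x, u):                                    # placeholder nonlinear plant model
    return 0.9 * x + 0.2 * np.sin(x) + u

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

rng = np.random.default_rng(0)
x_samples = rng.uniform(-3.0, 3.0, size=400)

# Initial cost-to-go: simulate a suboptimal proportional controller from each sample.
def rollout_cost(x, horizon=50):
    total = 0.0
    for k in range(horizon):
        u = -0.5 * x                             # suboptimal control law
        total += gamma**k * stage_cost(x, u)
        x = f(x, u)
    return total

coef = np.polyfit(x_samples, [rollout_cost(x) for x in x_samples], deg=4)

# Improvement: one-step Bellman backups over the sampled states, then refit.
for _ in range(20):
    J_hat = lambda x: np.polyval(coef, x)
    targets = [min(stage_cost(x, u) + gamma * J_hat(f(x, u)) for u in u_grid)
               for x in x_samples]
    coef = np.polyfit(x_samples, targets, deg=4)

# Online, the infinite-horizon problem reduces to this single-stage search.
def single_stage_controller(x):
    costs = [stage_cost(x, u) + gamma * np.polyval(coef, f(x, u)) for u in u_grid]
    return u_grid[int(np.argmin(costs))]

print("u(1.5) =", single_stage_controller(1.5))
```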
Markov decision processes (MDPs) may involve three types of delays. First, state information, rather than being available instantaneously, may arrive with a delay (observation delay). Second, an action may take effect at a later decision stage rather than immediately (action delay). Third, the cost induced by an action may be collected after a number of stages (cost delay). We derive two results, one for constant and one for random delays, for reducing an MDP with delays to an MDP without delays, which differs only in the size of the state space. The results are based on the intuition that costs may be collected asynchronously, i.e., at a stage other than the one in which they are induced, as long as they are discounted properly.
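To make the constant-delay reduction concrete, here is a minimal sketch of the standard state augmentation for a constant observation delay d: the delay-free MDP's state becomes the last observed state together with the d actions applied since it was observed. The wrapper and toy two-state chain below are illustrative assumptions, not the constructions or random-delay result from the paper.

```python
# Sketch: augment the state of a delayed MDP with the last observed state and
# the actions applied since, yielding an equivalent delay-free MDP. Toy example.
from collections import deque
import random

class DelayAugmentedMDP:
    """Wrap a delay-free simulator so the agent sees (s_{t-d}, a_{t-d}, ..., a_{t-1})."""

    def __init__(self, step_fn, init_state, init_action, delay):
        self.step_fn = step_fn
        self.state_buffer = deque([init_state] * (delay + 1), maxlen=delay + 1)
        self.action_buffer = deque([init_action] * delay, maxlen=delay)

    def augmented_state(self):
        # Oldest buffered state plus the actions applied since it was observed.
        return (self.state_buffer[0], tuple(self.action_buffer))

    def step(self, action):
        true_state = self.state_buffer[-1]
        next_state, cost = self.step_fn(true_state, action)
        self.state_buffer.append(next_state)
        self.action_buffer.append(action)
        return self.augmented_state(), cost

# Toy two-state chain with two actions, used only to exercise the wrapper.
def toy_step(s, a):
    s_next = s if random.random() < 0.7 else 1 - s
    return s_next, 1.0 if s_next != a else 0.0       # cost for "missing" the state

random.seed(0)
env = DelayAugmentedMDP(toy_step, init_state=0, init_action=0, delay=2)
for t in range(3):
    aug, cost = env.step(action=t % 2)
    print(f"t={t} augmented state={aug} cost={cost}")
```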