We address a lessmodel-based dynamic routing problem arising from a home parcel pick-up service, where lessmodel-based means that existing customers dynamically and independently request services following a Poisson process with a stochastic rate. Through an extended application of the re-optimization (RO) strategy, we propose a Markov decision process formulation and solution approaches based on approximate dynamic programming and Bayes' theorem. First, a pool of basic policies corresponding to all possible values of the rate is developed offline via approximate value iteration. Second, Bayes' theorem-based sequential learning is designed to sequentially update the belief about the rate's probability distribution over its possible values. Third, coupled with the updated belief, the basic policies are collectively implemented in two different ways, resulting in RO approaches that construct two different online policies, i.e., a belief-weighted deterministic policy and a belief-based random policy, and re-optimize them at decision epochs. In the numerical study, our approaches are examined through comparison with model-based (i.e., using full knowledge of the rate) and model-free heuristics. Important insights include that (i) the belief-weighted deterministic policy outperforms the belief-based random policy, and (ii) the former is better than the latter at preserving the improvement resulting from the improved model-based policy.
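The sequential learning step and the two ways of combining basic policies can be illustrated with a minimal Python sketch. It assumes a finite set of candidate rates and a table of per-policy action values; the function names, the `action_values` table, and the exact way the two online policies are constructed are illustrative assumptions, not details given in the abstract.

```python
import math
import random


def update_belief(belief, rates, arrivals, elapsed):
    """Bayes update of the belief over candidate Poisson rates.

    belief[k] is the prior probability that the true rate is rates[k];
    `arrivals` requests were observed during `elapsed` time units.
    """
    # Poisson likelihood of the observation under each candidate rate.
    likelihoods = [
        math.exp(-rate * elapsed) * (rate * elapsed) ** arrivals
        / math.factorial(arrivals)
        for rate in rates
    ]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def belief_weighted_action(belief, action_values):
    """Deterministic choice: maximize the belief-weighted action value.

    action_values[k][a] is a hypothetical value of action `a` under the
    basic policy trained for rates[k] (an assumption of this sketch).
    """
    n_actions = len(action_values[0])
    weighted = [
        sum(b * values[a] for b, values in zip(belief, action_values))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: weighted[a])


def belief_random_action(belief, action_values, rng=random):
    """Random choice: sample one basic policy from the belief, then follow it."""
    k = rng.choices(range(len(belief)), weights=belief)[0]
    values = action_values[k]
    return max(range(len(values)), key=lambda a: values[a])
```

For example, starting from a uniform belief over the rates 1 and 5, observing five requests in one time unit moves almost all of the probability mass to the rate 5.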
We address the combined problem of supplier (or vendor) selection and ordering decisions when a buyer can choose to procure from multiple suppliers whose yields are uncertain and potentially correlated. We model this problem as a stochastic program with recourse in which the buyer purchases from the suppliers in the first period and, if needed, purchases from the spot market or from the suppliers with excess supply, whichever is beneficial, in the second period in order to meet the target procurement quantity. We solve this problem using the sample average approximation (SAA) technique, which makes it easy to solve in practice. To evaluate the efficacy of our approach, we compare the performance of our solution with that of the certainty equivalent problem, which is widely practiced and which we use as the benchmark. Next, we extend our model to incorporate the buyer's risk aversion with respect to the quantity procured. We reformulate the multi-sourcing problem as a mixed integer linear program (MILP) and adopt a statistical approach to account for the buyer's risk aversion. Thus, we design a simple computational technique that provides an optimal sourcing policy from a set of suppliers when each supplier's yield is uncertain with a generic probability distribution.
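The idea behind SAA can be shown on a deliberately simplified case. The sketch below assumes a single supplier, a yield drawn uniformly from [0.5, 1.0], payment per unit ordered, and a spot market that covers any shortfall; these modeling choices and all names are assumptions for illustration, and the multi-supplier model with excess supply described above is not reproduced.

```python
import random


def saa_order_quantity(target, unit_cost, spot_price, n_samples=2000, seed=0):
    """Pick the order quantity that minimizes the sample-average cost.

    Assumed toy model: ordering `order` units costs unit_cost * order,
    the supplier delivers order * y for a random yield y, and any
    shortfall against `target` is bought on the spot market.
    """
    rng = random.Random(seed)
    # Draw the yield scenarios once; every candidate order is scored on the same sample.
    yields = [rng.uniform(0.5, 1.0) for _ in range(n_samples)]

    def average_cost(order):
        total = 0.0
        for y in yields:
            shortfall = max(0.0, target - order * y)  # second-period recourse
            total += unit_cost * order + spot_price * shortfall
        return total / n_samples

    # Search integer order quantities up to twice the target.
    candidates = range(0, 2 * int(target) + 1)
    best = min(candidates, key=average_cost)
    return best, average_cost(best)
```

When the spot price is below the supplier's unit cost, the sampled problem orders nothing from the supplier; when the spot price is high, it orders more than the target to hedge against low yield.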
ISBN: 9781467327275 (print)
This paper introduces a new concept called a Virtual Generator (VG). VGs are simplified representations of groups of coherent synchronous generators in a power system. They resemble commonly used power system dynamic equivalents obtained via generator aggregation techniques. Traditionally, power system dynamic equivalents are developed offline, fixed, and used to replace large portions of the system that are considered external to the portion being analyzed in detail. In contrast, VGs are calculated online, are not limited to representing external areas of the system being analyzed or controlled, and do not replace any portion of the power system. Instead, they allow wide-area damping controllers (WADCs) to exploit the fact that a group of coherent synchronous generators in a power system can be controlled as a single generating unit to achieve wide-area damping control objectives. The implementation of VGs is made possible by the availability of Wide-Area Measurements (WAMs) from Phasor Measurement Units (PMUs). To the authors' knowledge, this is the first time that power system equivalencing techniques have been extended to real-time WADC. Simulation studies carried out on the 68-bus New England/New York power system demonstrate that intelligent controllers developed using VGs can significantly improve the stability of a power system by effectively damping low-frequency interarea oscillations.
This paper contributes a unified formulation that merges previous analyses of the prediction of the performance (value function) of a certain sequence of actions (policy) when an agent operates in a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with state-of-the-art solutions.
ISBN: 9781467327428 (print)
Satisficing is an efficient strategy for applying existing knowledge in a complex, constrained environment. We present a set of agent-based simulations that demonstrate a higher payoff for satisficing strategies than for exploring strategies when using approximate dynamic programming methods for learning complex environments. In our constrained learning environment, satisficing agents outperformed exploring agents by approximately six percent in terms of the number of tasks completed.
The adaptive dynamic programming (ADP) approach is employed to design an optimal controller for unknown discrete-time nonlinear systems with control ***, a neural network is constructed to identify the unknown dynamical system with stability ***, the iterative ADP algorithm is developed to solve the optimal control problem with convergence ***, two other neural networks are introduced to approximate the cost function and its derivative and the control law, under the framework of globalized dual heuristic programming ***, two simulation examples are included to verify the theoretical results.
Intra-day economic dispatch of an integrated microgrid is a fundamental requirement to integrate distributed *** dynamic energy flows in cogeneration units present challenges to the energy management of the *** this paper, a novel approximate dynamic programming (ADP) approach based on value function approximation is proposed to solve this problem; it is distinguished by its consideration of the dynamic process constraints of the combined-cycle gas turbine (CCGT) ***, we mathematically formulate the multi-period decision problem as a finite-horizon Markov decision *** deal with the thermodynamic process, an augmented state vector of CCGT is ***, the proposed VFA-ADP algorithm is employed to derive the near-optimal real-time operation *** addition, to guarantee the monotonicity of the piecewise linear function, we apply the SPAR algorithm in the update *** validate the effectiveness of the proposed method, we conduct experiments with comparisons to some traditional optimization *** results indicate that our proposed ADP method achieves better performance on the economic dispatch of the microgrid.
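A monotonicity-preserving slope update in the spirit of SPAR can be sketched as follows. The sketch assumes a concave piecewise linear value function stored as a list of non-increasing segment slopes, a smoothing step size, and the simple leveling projection; the function name and parameters are hypothetical, and the abstract does not state which variant of the update the authors use.

```python
def spar_update(slopes, index, observed_slope, step):
    """Smooth one slope toward a new observation, then restore monotonicity.

    `slopes` holds the segment slopes of a concave piecewise linear
    function, so they must stay non-increasing from left to right.
    """
    # Smoothing step: blend the old slope with the sampled slope.
    new_value = (1.0 - step) * slopes[index] + step * observed_slope
    updated = list(slopes)
    updated[index] = new_value
    # Projection step: level any neighbors that now violate monotonicity.
    for i in range(index):
        if updated[i] < new_value:
            updated[i] = new_value
    for i in range(index + 1, len(updated)):
        if updated[i] > new_value:
            updated[i] = new_value
    return updated
```

With slopes `[10, 8, 6, 4]`, an observation of 12 at index 2, and a step size of 0.5, the smoothed slope is 9, and the slope to its left is raised from 8 to 9 so that the sequence stays non-increasing.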
As an important class of approximate dynamic programming, direct heuristic dynamic programming (DHDP) is discussed in this *** performs well due to its model-free online learning *** the classical DHDP is implemented with a gradient-based adaptation learning algorithm for neural networks, in this paper we present a design strategy for DHDP with a novel hybrid estimation of distribution algorithm for online learning and control, and the proposed design optimization method achieves the weight training of neural networks with faster convergence *** proposed approach can be viewed as an improvement for *** simulation is conducted on a practical system plant to test the online learning performance by using our ***, the simulation results show the effectiveness of our approach.
ISBN: 9781467386838 (print)
This paper proposes a reinforcement learning (RL) algorithm based on approximate dynamic programming to optimally auto-tune a Proportional Integral Derivative (PID) controller by solving an infinite-horizon optimal tracking control problem for a special class of linear systems. The algorithm is based on an actor/critic framework in which a critic approximator is used to learn the optimal cost and an actor approximator is used to learn the optimal PID gains. The adaptive control nature of the algorithm requires a persistence of excitation condition to be validated a priori, but this can be relaxed by using previously stored data concurrently with current data in the tuning of the critic approximator. Simulation results show the effectiveness of the proposed approach for a stirred-tank plant reactor.
ISBN: 9781479901777 (print)
This paper investigates the properties of integral value iteration (I-VI), a reinforcement learning (RL) technique for solving continuous-time (CT) optimal control problems online without using the system drift dynamics. The target I-VI is the one applied to CT linear quadratic regulation problems. As a result, two modes of global monotone convergence of I-VI are presented. One behaves like policy iteration (PI) and is called the PI-mode of convergence; the other is called the VI-mode of convergence. All of the other properties (positive definiteness, stability, and the relation between I-VI and integral PI) are presented within these two frameworks. Finally, numerical simulations are carried out to verify and further investigate these properties.
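For intuition, an integral value iteration of this kind can be sketched on a scalar linear quadratic problem. The sketch assumes the system dx/dt = a·x + b·u with cost integrand q·x² + r·u², a fixed integration window `horizon`, and a zero initial value; all names are illustrative. In an online setting the integral term would be measured from trajectory data without knowing `a`; here it is evaluated in closed form only to simulate that data.

```python
import math


def integral_value_iteration(a, b, q, r, horizon, iterations=300, p0=0.0):
    """Scalar I-VI sketch: iterate the value p in V(x) = p * x**2."""
    p = p0
    for _ in range(iterations):
        gain = b * p / r            # current feedback gain, u = -gain * x
        closed_loop = a - b * gain  # closed-loop pole under that gain
        exponent = 2.0 * closed_loop * horizon
        # Integral of exp(2 * closed_loop * t) over [0, horizon].
        if abs(closed_loop) < 1e-12:
            weight = horizon
        else:
            weight = math.expm1(exponent) / (2.0 * closed_loop)
        # Cost accumulated over the window, starting from x = 1.
        running_cost = (q + r * gain * gain) * weight
        # Value update: window cost plus the old value at the window's end.
        p = running_cost + math.exp(exponent) * p
    return p


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

With a = b = q = r = 1 and a window of 0.1, the iterates increase from zero toward the Riccati solution 1 + √2, which is consistent with the monotone convergence described above.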