We propose a comprehensive framework for policy gradient methods tailored to continuous-time reinforcement learning. It is based on the connection between stochastic control problems and their randomised counterparts, enabling applications across various classes of Markovian continuous-time control problems beyond diffusion models, including, e.g., regular control, impulse control, and optimal stopping/switching problems. By utilizing a change of measure in the control randomisation technique, we derive a new policy gradient representation for these randomised problems, featuring parametrised intensity policies. We further develop actor-critic algorithms specifically designed to address general Markovian stochastic control problems. Our framework is demonstrated through its application to optimal switching problems, with two numerical case studies in the energy sector focusing on real options.
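The core idea of control randomisation with a parametrised intensity policy can be illustrated in a toy setting. The sketch below is not the paper's algorithm: it is a minimal score-function (likelihood-ratio) gradient estimate for a two-mode switching problem in which the operating mode flips at the jump times of a point process whose intensity `lambda_theta(x) = exp(theta0 + theta1 * x)`, the dynamics, reward, and intensity form are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, T=1.0, dt=0.01):
    """One path of a toy randomised two-mode switching problem.

    The mode flips at the jumps of a point process with state-dependent
    intensity lambda_theta(x) = exp(theta[0] + theta[1] * x).  Returns the
    pathwise reward and the score, i.e. the gradient of the path
    log-likelihood with respect to theta, accumulated step by step.
    """
    x, mode = 0.0, 0
    reward = 0.0
    score = np.zeros(2)
    for _ in range(int(T / dt)):
        lam = np.exp(theta[0] + theta[1] * x)
        dlog = np.array([1.0, x])            # grad of log(lambda) w.r.t. theta
        if rng.random() < lam * dt:          # jump: switch the operating mode
            mode = 1 - mode
            score += dlog                    # d/dtheta log(lam * dt)
        else:                                # no jump on this time step
            score += -lam * dt / (1.0 - lam * dt) * dlog
        drift = 1.0 if mode == 1 else -1.0   # mode-dependent drift (toy choice)
        x += drift * dt + 0.1 * np.sqrt(dt) * rng.standard_normal()
        reward += -x * x * dt                # running reward (toy choice)
    return reward, score

def policy_gradient(theta, n_paths=500):
    """Score-function (REINFORCE-style) estimate of grad_theta E[reward]."""
    out = np.zeros(2)
    for _ in range(n_paths):
        r, s = simulate(theta)
        out += r * s
    return out / n_paths

grad = policy_gradient(np.array([0.0, 0.0]))
```

The estimator is unbiased for the discretised process; variance-reduction (e.g. a critic baseline, as in the actor-critic algorithms described above) is what makes such estimates practical.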
In this paper, we introduce a non-linear Snell envelope which at each time represents the maximal value that can be achieved by stopping a BSDE with constrained jumps. We establish the existence of the Snell envelope by employing a penalization technique; the primary challenge we encounter is demonstrating the regularity of the limit of the scheme. Additionally, we relate the Snell envelope to a finite-horizon, zero-sum stochastic differential game, where one player controls a path-dependent stochastic system by invoking impulses, while the opponent is given the opportunity to stop the game prematurely. Importantly, by developing new techniques within the realm of control randomization, we demonstrate that the value of the game exists and is precisely characterized by our non-linear Snell envelope.
ISBN (print): 9780987214355
Mining operations are affected by significant uncertainty in commodity prices, combined with geological uncertainties (both in quantity and quality of the available reserves). Technical difficulties and costs associated with ore extraction, together with a highly uncertain environment, present significant risks for the profitability of mineral projects. Optimising operating strategies in response to changing market conditions and information about the available reserves is crucial for project profitability in the face of uncertainty. A natural resource extraction problem can be viewed as a stochastic optimal control (real options) problem, with the extraction rate representing a control variable. In a finite-horizon, finite-reserve setting, an additional complexity arises from the need to consider a large number of feasible remaining reserve levels, which significantly increases the computational complexity of the algorithms. Natural resource extraction problems have attracted the attention of researchers in the fields of real options and stochastic optimal control since the 1980s. However, there is still no computational framework available that would allow realistic high-dimensional real options problems in the minerals industry to be solved. Over the last decade, the approach based on value function approximation via basis functions has attracted significant attention in financial applications, and has given rise to a class of methods known as regression Monte Carlo methods. Regression Monte Carlo is a very versatile simulation-based technique. It can deal with a rich description of the mining problem, and very elaborate models for the risk factors. In this paper, we propose to combine several crucial improvements to make the regression Monte Carlo method practical for multi-dimensional models: 1) we avoid the discretisation of the reserve level by using the control randomization technique. First, the reserve is replaced by a dummy random factor during the forward
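The regression Monte Carlo idea the abstract refers to can be illustrated with its best-known instance, the Longstaff-Schwartz algorithm for optimal stopping. The sketch below is not the paper's multi-dimensional mining algorithm: it prices a Bermudan put under geometric Brownian motion, approximating the continuation value at each exercise date by a polynomial regression over simulated paths; all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bermudan_put_rmc(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=20000, degree=3):
    """Regression Monte Carlo (Longstaff-Schwartz) price of a Bermudan put.

    Paths of geometric Brownian motion are simulated forward; the backward
    pass regresses discounted future values on a polynomial basis in the
    spot price to estimate continuation values, and exercises where the
    immediate payoff exceeds the estimated continuation value.
    """
    dt = T / n_steps
    # forward pass: simulate GBM paths on the exercise grid
    z = rng.standard_normal((n_paths, n_steps))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))
    payoff = np.maximum(K - S, 0.0)
    value = payoff[:, -1]                     # exercise value at maturity
    # backward pass over earlier exercise dates
    for t in range(n_steps - 2, -1, -1):
        value *= np.exp(-r * dt)              # discount one step back
        itm = payoff[:, t] > 0                # regress on in-the-money paths only
        if itm.sum() > degree + 1:
            coef = np.polyfit(S[itm, t], value[itm], degree)
            cont = np.polyval(coef, S[itm, t])
            ex = payoff[itm, t] > cont        # exercise beats continuation
            idx = np.where(itm)[0][ex]
            value[idx] = payoff[idx, t]
    return np.exp(-r * dt) * value.mean()     # discount from first date to 0

price = bermudan_put_rmc()
```

The paper's contribution addresses exactly the gap this toy example leaves open: here the only state is the spot price, whereas a mining problem adds a reserve level, which control randomization handles without discretising it.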