This article is concerned with the stochastic recursive optimal control problem with mixed delay. The connection between Pontryagin's maximum principle and Bellman's dynamic programming principle is discussed. Relations between the adjoint processes and the value function that involve no derivatives of the value function are established by employing the notions of super- and sub-jets used in defining viscosity solutions. A stochastic verification theorem is also given for checking whether a given admissible control is indeed optimal.
This paper presents a theoretical framework for the business decision-making process of power generators acting as price takers when energy storage participates. The framework assesses rational valuation, optimal sales strategies, and hedging options for power plants with and without a gross sales constraint. The valuation and optimal sales strategy problems are analyzed using a risk-neutral pricing approach, dynamic programming principles, and a trinomial tree model suited to the regime-switching model. A price risk hedging scheme is formulated that uses a flexible and widely used over-the-counter electricity derivative, the electricity contract for difference, as a tool for hedging electricity spot price risk. The minimum variance hedge ratio and the corresponding hedging efficiency formula are derived. In the numerical simulations, we first use the EM algorithm to calibrate the electricity spot model to Nord Pool electricity spot price data. Numerical simulations are then conducted on the operational decision-making of power generators under three different forms of energy storage. The simulation results provide a basis for power generators to evaluate the real-time value of power plants, to select optimal real-time power sales, and to determine the optimal timing of power plant transfer and the optimal storage method.
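As a rough illustration of the minimum variance hedge ratio and hedging efficiency mentioned in this abstract, the sketch below uses the standard textbook definitions: the ratio is Cov(ΔS, ΔH) / Var(ΔH) and the efficiency is the squared correlation of the two series. This is an assumption for exposition; the abstract does not state the paper's formula for the contract for difference, and the function name `min_variance_hedge` and its inputs are hypothetical.

```python
def min_variance_hedge(spot_changes, hedge_changes):
    """Return (hedge ratio, hedging efficiency) from paired price changes.

    Textbook definitions (an assumption, not the paper's derivation):
      ratio      = Cov(dS, dH) / Var(dH)
      efficiency = Cov(dS, dH)^2 / (Var(dS) * Var(dH))   # squared correlation
    """
    n = len(spot_changes)
    if n != len(hedge_changes) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")

    mean_s = sum(spot_changes) / n
    mean_h = sum(hedge_changes) / n

    # Sample covariance and variances (n - 1 in the denominator).
    cov = sum((s - mean_s) * (h - mean_h)
              for s, h in zip(spot_changes, hedge_changes)) / (n - 1)
    var_s = sum((s - mean_s) ** 2 for s in spot_changes) / (n - 1)
    var_h = sum((h - mean_h) ** 2 for h in hedge_changes) / (n - 1)
    if var_s == 0 or var_h == 0:
        raise ValueError("both series must have non-zero variance")

    ratio = cov / var_h
    efficiency = cov ** 2 / (var_s * var_h)
    return ratio, efficiency
```

With a hedging instrument whose price changes are exactly twice the spot changes, this returns a ratio of one half and an efficiency of one, since the hedge removes all variance in that idealized case.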
We unify and establish equivalence between the pathwise and the quasi-sure approaches to robust modelling of financial markets in finite discrete time. In particular, we prove a fundamental theorem of asset pricing and a superhedging theorem which encompass the formulations of Bouchard and Nutz [12] and Burzoni et al. [13]. In bringing the two streams of literature together, we examine and compare their many different notions of arbitrage. We also clarify the relation between robust and classical P-specific results. Furthermore, we prove when a superhedging property with respect to the set of martingale measures supported on a set $\Omega$ of paths may be extended to a pathwise superhedging on $\Omega$ without changing the superhedging price.
Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach and shows that the approximation error is of $O(1/\sqrt{N})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents N, which provides an $O(1/\sqrt{N})$ approximation to the MARL problem with N agents in the learning environment. Empirical studies for the network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when N is large, for instance, when N > 50.
The blockchain-based token platform economy is a new branch of digital platform economics. By constructing a continuous-time dynamic model of the token platform economy, this paper analyzes which ESG policy is appropriate for the government, while the token platform participants (developers, users and speculators) make optimal investments and decisions under that policy. Simulation results show that a neutral ESG policy is optimal. Given the neutral ESG policy, we study ESG investment and decision strategies for platform participants. Our research shows that the token selling rate and effort of developers on green platforms (ESG score greater than 0) are lower than those of developers on brown platforms (ESG score less than 0). Consequently, when developers' token retention is about half of the initial amount, users should invest more in brown tokens. Speculators should invest in brown tokens when developers' token retention is high. In other cases, speculators and users should invest in green tokens. Next, the impact of the government's three ESG policies on the maturity or termination of the platform is also analyzed. An important conclusion emerges: neither an aggressive nor a conservative government ESG policy improves the development of the green platform. We therefore suggest a neutral ESG policy, under which the government adopts high tax incentives and a high tax burden for green and brown platforms, respectively, while extra subsidies and penalties for green and brown platforms are not necessary.
We study a new class of two-player, zero-sum, deterministic differential games where each player uses both continuous and impulse controls over an infinite horizon with discounted payoff. We assume that the form and cost of impulses depend on nonlinear functions and the state of the system, respectively. We use Bellman's dynamic programming principle (DPP) and the viscosity solution approach to show, for this class of games, the existence and uniqueness of a solution to the associated Hamilton-Jacobi-Bellman-Isaacs (HJBI) partial differential equations (PDEs). We then deduce, under Isaacs' condition, that the lower and upper value functions coincide, and we give a computational procedure with a numerical test for the game.
We use optimal control via a distributed exterior field to steer the dynamics of an ensemble of N interacting ferromagnetic particles which are immersed in a heat bath by minimizing a quadratic functional. Using the dynamic programming principle, we show the existence of a unique strong solution of the optimal control problem. By the Hopf-Cole transformation, the associated Hamilton-Jacobi-Bellman equation of the dynamic programming principle may be recast into a linear PDE on the manifold $M = (S^2)^N$, whose classical solution may be represented via the Feynman-Kac formula. We use this probabilistic representation for Monte-Carlo simulations to illustrate optimal switching dynamics.
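To make the Hopf-Cole step concrete, here is a flat-space sketch under assumptions that are not taken from the paper: a controlled Brownian motion in $\mathbb{R}^d$ with noise intensity $\nu$, running state cost $q$, quadratic control cost and terminal cost $g$. The paper works on the manifold $(S^2)^N$ with interacting particles, so this only illustrates why the logarithmic substitution linearizes the equation.

```latex
% Assumed model: dX_s = u_s\,ds + \sqrt{\nu}\,dW_s,
% cost E\big[\int_t^T (q(X_s) + \tfrac12 |u_s|^2)\,ds + g(X_T)\big].
\begin{align*}
  % HJB equation after minimizing over u (optimal feedback u^* = -\nabla V):
  \partial_t V + \tfrac{\nu}{2}\,\Delta V - \tfrac12\,|\nabla V|^2 + q &= 0,
  & V(T,x) &= g(x). \\
  % Hopf-Cole substitution V = -\nu \log \varphi gives
  % \nabla V = -\nu\,\nabla\varphi/\varphi and
  % \Delta V = -\nu\,(\Delta\varphi/\varphi - |\nabla\varphi|^2/\varphi^2),
  % so the quadratic gradient terms cancel and the equation becomes linear:
  \partial_t \varphi + \tfrac{\nu}{2}\,\Delta \varphi - \tfrac{q}{\nu}\,\varphi &= 0,
  & \varphi(T,x) &= e^{-g(x)/\nu}. \\
  % Feynman-Kac representation with uncontrolled dynamics dX_s = \sqrt{\nu}\,dW_s:
  \varphi(t,x) &= \mathbb{E}\Big[\exp\Big(-\tfrac{1}{\nu}\int_t^T q(X_s)\,ds
      - \tfrac{1}{\nu}\,g(X_T)\Big) \,\Big|\, X_t = x\Big].
\end{align*}
```

The expectation in the last line is what a Monte-Carlo simulation would estimate by averaging over sampled uncontrolled paths.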
In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited and decision processes are constructed by their means. With this new approach, the agent is allowed to consider different actions depending also on the sojourn time of the process in the current state. A numerical method based on Q-learning algorithms for finite horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler's fallacy; the other in which the environment is semi-Markov even if the reward function does not depend on the sojourn time. These are used to carry out some numerical evaluations of the previously presented Q-learning algorithm and of a different naive method based on deep reinforcement learning.
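One way to picture actions that depend on the sojourn time is to index a tabular Q function by the triple (time step, state, sojourn time). The sketch below is a generic finite-horizon Q-learning update under that assumption, not the algorithm of the paper; the names `next_sojourn` and `q_update`, the convention that the sojourn counter restarts at 0 after a jump, and the undiscounted target are all hypothetical choices made for illustration.

```python
def next_sojourn(state, next_state, sojourn):
    """Sojourn counter: grows while the state is unchanged, resets on a jump.

    Restarting at 0 is an assumed convention.
    """
    return sojourn + 1 if next_state == state else 0


def q_update(q, key, action, reward, next_key, actions, alpha=0.5, terminal=False):
    """One tabular Q-learning step on augmented keys (step, state, sojourn).

    q is a dict mapping (key, action) to a value; missing entries count as 0.
    At the horizon (terminal=True) there is no future value to bootstrap from.
    """
    old = q.get((key, action), 0.0)
    future = 0.0 if terminal else max(q.get((next_key, a), 0.0) for a in actions)
    q[(key, action)] = old + alpha * (reward + future - old)
    return q[(key, action)]
```

Because the sojourn time is part of the key, two visits to the same state with different sojourn times are stored and updated separately, which is what lets the learned policy choose different actions for them.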
We propose a new monotone finite difference discretization for the variational p-Laplace operator, $\Delta_p u = \operatorname{div}(|\nabla u|^{p-2}\nabla u)$, and present a convergent numerical scheme for related Dirichlet problems. The resulting nonlinear system is solved using two different methods: one based on Newton-Raphson and one explicit method. Finally, we exhibit some numerical simulations supporting our theoretical results. To the best of our knowledge, this is the first monotone finite difference discretization of the variational p-Laplacian and also the first time that nonhomogeneous problems for this operator can be treated numerically with a finite difference scheme.
In this paper, we study value functions of time-dependent tug-of-war games. We first prove the existence and uniqueness of value functions and verify that these game values satisfy a dynamic programming principle. Using the arguments in the proof of existence of game values, we can also deduce the asymptotic behavior of game values as $T \to \infty$. Furthermore, we investigate boundary regularity for game values. Thereafter, based on the regularity results for value functions, we deduce that game values converge to viscosity solutions of the normalized parabolic p-Laplace equation. © 2021 Elsevier Ltd. All rights reserved.