ISBN: (print) 9798400710810
We advance market-making strategies by integrating Adversarial Reinforcement Learning (ARL), Hawkes processes, and variable volatility levels while also expanding the action space available to market makers (MMs). To enhance the adaptability and robustness of these strategies, which can always quote, quote on only one side of the market, or not quote at all, we shift from the commonly used Poisson process to the Hawkes process, which better captures real market dynamics and self-exciting behaviors. We then train and evaluate strategies under volatility levels of 2 and 200. Our findings show that the 4-action MM trained in a low-volatility environment effectively adapts to high-volatility conditions, maintaining stable performance and providing two-sided quotes at least 92% of the time. This indicates that incorporating flexible quoting mechanisms and realistic market simulations significantly enhances the effectiveness of market-making strategies.
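The difference between Poisson and Hawkes order arrivals can be illustrated with a minimal simulation sketch. It assumes an exponential excitation kernel and a thinning scheme; the function name, the parameters mu, alpha, beta, and their values are illustrative choices and are not taken from the paper.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Assumed intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)).
    Setting alpha = 0 recovers a homogeneous Poisson process with rate mu.
    """
    rng = random.Random(seed)
    t = 0.0
    excitation = 0.0  # current value of the self-exciting term
    events = []
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound for the thinning step.
        bound = mu + excitation
        wait = rng.expovariate(bound)
        if t + wait >= horizon:
            return events
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / bound.
        if rng.random() * bound <= mu + excitation:
            events.append(t)
            excitation += alpha  # each arrival raises the future intensity


# Illustrative parameters (assumptions): same baseline rate, with and without excitation.
poisson_times = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.2, horizon=500.0)
hawkes_times = simulate_hawkes(mu=1.0, alpha=0.8, beta=1.2, horizon=500.0)
```

With alpha / beta < 1 the process is stationary and its long-run mean rate is mu / (1 - alpha / beta), so the self-exciting run produces clustered arrivals and noticeably more events than the Poisson run at the same baseline rate.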
This paper develops a model-based policy gradient algorithm for tracking dynamic targets using a mobile agent equipped with an onboard sensor with a limited field of view. The task is to obtain a continuous control policy for the mobile agent to collect sensor measurements that reduce uncertainty in the target states, measured by the target distribution entropy. We design a neural network control policy that takes the agent SE(3) pose and the mean vector and information matrix of the joint target distribution as inputs and uses attention layers to handle variable numbers of targets. We also derive the gradient of the target entropy with respect to the network parameters explicitly, allowing efficient model-based policy gradient optimization.
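The entropy objective can be made concrete with a small sketch. It assumes the joint target distribution is Gaussian, so the differential entropy depends only on the information matrix; the helper names and the example matrices are hypothetical and not taken from the paper.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return log_det


def gaussian_entropy(info_matrix):
    """Differential entropy (nats) of a Gaussian with information matrix Y.

    H = 0.5 * (n * log(2 * pi * e) - log det Y), since the covariance is Y^{-1}.
    """
    n = len(info_matrix)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) - log_det_spd(info_matrix))


prior_info = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical measurement that adds information about the first coordinate only.
posterior_info = [[3.0, 0.0], [0.0, 1.0]]
entropy_drop = gaussian_entropy(prior_info) - gaussian_entropy(posterior_info)
```

Because the entropy falls as log det Y grows, a policy that steers the sensor toward informative measurements lowers this quantity; here the drop equals 0.5 * log(3) nats.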
We study the problem of learning a linear system model from the observations of M clients. The catch: each client is observing data from a different dynamical system. This work addresses the question of how multiple clients can collaboratively learn dynamical models in the presence of heterogeneity. We pose this problem as a federated learning problem and characterize the tension between achievable performance and system heterogeneity. Furthermore, our federated sample complexity result provides a constant factor improvement over the single agent setting. Finally, we describe a meta federated learning algorithm, FedSysID, that leverages existing federated algorithms at the client level.
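A toy sketch of the setting follows. It is not FedSysID: it assumes scalar systems x[t+1] = a * x[t] + w[t], a hand-picked spread of client parameters, and a server that pools least-squares statistics in a single round. All names and values are hypothetical.

```python
import random


def simulate_client(a, steps, noise_std, rng):
    """Roll out the scalar system x[t+1] = a * x[t] + w[t] from x[0] = 0."""
    xs = [0.0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs


def local_statistics(xs):
    """Sufficient statistics of the least-squares fit of a for one client."""
    cross = sum(x * x_next for x, x_next in zip(xs, xs[1:]))
    energy = sum(x * x for x in xs[:-1])
    return cross, energy


def federated_estimate(client_stats):
    """Server step: pool client statistics and solve one least-squares problem."""
    total_cross = sum(c for c, _ in client_stats)
    total_energy = sum(e for _, e in client_stats)
    return total_cross / total_energy


rng = random.Random(0)
nominal_a = 0.8
# Assumed heterogeneity: each client's dynamics deviate slightly from nominal.
client_as = [nominal_a + d for d in (-0.02, -0.01, 0.0, 0.01, 0.02)]
stats = [local_statistics(simulate_client(a, 2000, 1.0, rng)) for a in client_as]
a_fed = federated_estimate(stats)
a_single = stats[0][0] / stats[0][1]  # estimate from client 0's data alone
```

Pooling reduces the noise in the estimate because more samples are used, but the pooled value targets a blend of the client systems rather than any one of them, which is the tension between performance and heterogeneity that the abstract refers to.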
We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built in which each node represents the aggregate dynamics of one coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weight stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.
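The two steps above (cluster on the Laplacian, then aggregate) can be sketched for the simplest case of two groups. This is an illustration under stated assumptions, not the paper's algorithm: it splits nodes by the sign of the Fiedler vector, computes that vector with plain power iteration to stay dependency-free, and reduces the network to a single aggregate coupling weight. The example graph is made up.

```python
def laplacian(weights):
    """Graph Laplacian L = D - W for a symmetric weight matrix W with zero diagonal."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(lap, iterations=500):
    """Eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (c * I - L) restricted to vectors orthogonal to the
    all-ones vector; assumes a connected graph and a start vector that is
    not orthogonal to the target eigenvector.
    """
    n = len(lap)
    c = 2.0 * max(lap[i][i] for i in range(n))  # upper bound on the largest eigenvalue
    v = [i - (n - 1) / 2.0 for i in range(n)]   # zero-mean starting vector
    for _ in range(iterations):
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]               # project out the all-ones direction
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def two_way_partition(weights):
    """Label each node 0 or 1 by the sign of its Fiedler-vector entry."""
    return [1 if x > 0 else 0 for x in fiedler_vector(laplacian(weights))]


def reduced_coupling(weights, labels):
    """Total edge weight between the two groups (a crude reduced network)."""
    n = len(weights)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j])


# Made-up example: two tightly connected triangles joined by one weak edge.
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[i][j] = W[j][i] = w
labels = two_way_partition(W)
coupling = reduced_coupling(W, labels)
```

On this graph the partition separates the two triangles, and the reduced two-node network keeps only the weak 0.1 coupling between them.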
We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility-on-demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space, but still obtain a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state-of-the-art benchmarks with respect to performance, stability, and computational tractability.
In this paper, we focus on learning the time delay and nonlinearity of autonomous dynamical systems using trainable time delay neural networks. We demonstrate that, with delays trained together with weights and biases, the trained neural networks may approximate the right-hand side of delay differential equations. It is shown that data collected from the vicinity of a stable equilibrium or limit cycle do not contain rich enough dynamics; therefore, the trained networks can have very poor generalization. However, including data about the transient behavior can significantly enhance the performance, and similar improvements can be achieved when data collected near a chaotic attractor are utilized. We also evaluate how the learning performance is affected by the selected loss function and measurement noise. Numerical results are presented for two learning examples: the Mackey-Glass equation and a predator-prey model.
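Data for an experiment of this kind could be generated with a simulator such as the sketch below. It assumes forward-Euler integration, a constant initial history, and the parameter values beta = 0.2, gamma = 0.1, n = 10, tau = 17 that are commonly used for the chaotic regime of the Mackey-Glass equation; none of these choices are taken from the paper.

```python
def simulate_mackey_glass(beta=0.2, gamma=0.1, n=10, tau=17.0,
                          dt=0.1, steps=5000, x0=1.2):
    """Forward-Euler rollout of dx/dt = beta*x(t-tau)/(1+x(t-tau)**n) - gamma*x(t).

    The history on [-tau, 0] is assumed constant and equal to x0.
    """
    delay_steps = int(round(tau / dt))
    xs = [x0] * (delay_steps + 1)  # xs[-1] is x(0); earlier entries are history
    for _ in range(steps):
        x_now = xs[-1]
        x_delayed = xs[-1 - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        xs.append(x_now + dt * dx)
    return xs[delay_steps:]  # drop the artificial history, keep x(0) onward


trajectory = simulate_mackey_glass()
```

The right-hand side depends on both x(t) and x(t - tau), so a network that learns it must also recover tau; the early part of the trajectory contains the transient behavior that the abstract identifies as valuable for training.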
We consider the fundamental problem of online control of a linear dynamical system from two different viewpoints: regret minimization and competitive analysis. We prove that the optimal competitive policy is well-approximated by a convex parameterized policy class, known as disturbance-action control (DAC) policies. Using this structural result, we show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight, and optimal competitive ratio, up to an additive correction which grows sublinearly in the time horizon. We further conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown, and even when a stabilizing controller for the dynamics is not available a priori.
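For orientation, a DAC policy is usually written in the online control literature in the following form. The symbols are not defined in the abstract and are assumptions here: A and B are the system matrices, w_t is the disturbance, K is a fixed stabilizing gain, H is the memory length, and M^{[1]}, ..., M^{[H]} are the learned parameters.

```latex
x_{t+1} = A x_t + B u_t + w_t, \qquad
u_t = -K x_t + \sum_{i=1}^{H} M^{[i]} w_{t-i}
```

With K fixed, the states and controls are affine in the parameters M^{[i]}, so a convex cost remains convex in those parameters, which is what makes the policy class convex.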
Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. Lyapunov theory offers a solution for stabilizing control systems; still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.
Variational autoencoders allow learning a lower-dimensional latent space from high-dimensional input/output data. Using video clips as input data, the encoder may be used to describe the movement of an object in the video without ground truth data (unsupervised learning). Even though the object's dynamics are typically based on first principles, this prior knowledge is mostly ignored in the existing literature. Thus, we propose a physics-enhanced variational autoencoder that places a physics-enhanced Gaussian process prior on the latent dynamics to improve the efficiency of the variational autoencoder and to allow physically correct predictions. The physical prior knowledge, expressed as a linear dynamical system, is reflected by the Green's function and included in the kernel function of the Gaussian process. The benefits of the proposed approach are highlighted in a simulation with an oscillating particle.
In this work, we present the hyperparameter optimization of an online, off-policy reinforcement learning algorithm based on a parallel search. Since this model-free learning algorithm solves the H-infinity optimal tracking problem iteratively using ordinary least squares regression, we propose using the condition number of the data matrix as a model-free measure for tuning the hyperparameters. This addition enables automated optimization of the involved hyperparameters. We demonstrate that the condition number is a useful metric for tuning the number of collected samples, sampling interval, and other hyperparameters involved. In addition, we demonstrate a correlation between this condition number and properties of the sum of sinusoids persistent excitation.
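The link between sampling interval and conditioning can be shown with a small sketch. It is a simplified stand-in for the paper's data matrix: it assumes a two-column regressor built from the current and previous sample of a single sinusoid, and it computes the 2-norm condition number from the 2 x 2 Gram matrix. All names and values are hypothetical.

```python
import math


def condition_number_two_columns(rows):
    """2-norm condition number of an N x 2 data matrix via its 2 x 2 Gram matrix."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    trace, det = a + c, a * c - b * b
    lam_max = 0.5 * (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0)))
    if lam_max <= 0.0:
        return math.inf  # all-zero data
    lam_min = det / lam_max  # avoids cancellation in the smaller root
    if lam_min <= 0.0:
        return math.inf  # rank-deficient data: the regression is ill-posed
    return math.sqrt(lam_max / lam_min)


def sinusoid_regressors(omega, dt, samples):
    """Rows [u(k), u(k-1)] for the excitation u(k) = sin(omega * k * dt)."""
    return [[math.sin(omega * k * dt), math.sin(omega * (k - 1) * dt)]
            for k in range(samples)]


# Same number of samples, two sampling intervals (illustrative values).
cond_fast = condition_number_two_columns(sinusoid_regressors(1.0, 0.01, 400))
cond_slow = condition_number_two_columns(sinusoid_regressors(1.0, math.pi / 2, 400))
```

Sampling very fast makes consecutive samples nearly identical, so the columns are almost collinear and the condition number is large; sampling a quarter period apart makes them nearly orthogonal and the condition number is close to 1. A search over hyperparameters could use this number as a score without needing a system model.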