This paper considers the problem of constructing finite-dimensional state space realizations for stochastic processes that can be represented as the outputs of a certain type of causal system driven by a continuous ...
Both the search for extraterrestrial intelligence (SETI) and messaging extraterrestrial intelligence (METI) struggle with a strong indeterminacy in what data to look for and when to do so. This has led to attempts at ...
The majorizing measure theorem of Fernique and Talagrand is a fundamental result in the theory of random processes. It relates the boundedness of random processes indexed by elements of a metric space to complexity me...
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure a...
In this paper we consider a class of nonlinear systems with two kinds of inputs: one is slowly-varying, the other is fast-varying and periodic, and both are only piecewise continuous. Under the assumption that the ori...
In this paper, stochastic optimal control problems in continuous time and space are considered. In recent years, such problems have received renewed attention through the lens of reinforcement learning (RL) which is also...
TD-learning is widely employed in reinforcement learning algorithms due to its efficiency and practicality. Herein, we study the convergence of a variant of Monte Carlo Exploring Starts when operatorn...
We consider a communication system consisting of a server that tracks and publishes updates about a time-varying data source or event, and a gossip network of users interested in closely tracking the event. The timeli...
Direct policy search has been widely applied in modern reinforcement learning and continuous control. However, the theoretical properties of direct policy search on nonsmooth robust control synthesis have not been ful...
ISBN (digital): 9798350382846
ISBN (print): 9798350382853
In our problem, we are given access to a number of sequences of nonnegative i.i.d. random variables, whose realizations are observed sequentially. All sequences are of the same finite length. The goal is to pick one element from each sequence in order to maximize a reward equal to the expected value of the sum of the selections from all sequences. The decision on which element to pick is irrevocable, i.e., rejected observations cannot be revisited. Furthermore, the procedure terminates once a single selection has been made from each sequence. Our observation constraint is that we cannot observe the current realization of all sequences at each time instant. Instead, we can observe only a smaller, yet arbitrary, subset of them. Thus, together with a stopping rule that determines whether we choose or reject the sample, the solution requires a sampling rule that determines which sequence to observe at each instant. The problem can be solved via dynamic programming, but with complexity that is exponential in the length of the sequences. In order to make the solution computationally tractable, we introduce a decoupling approach and determine each stopping time using either single-sequence dynamic programming or a Prophet Inequality-inspired threshold method, with polynomial complexity in the length of the sequences. We prove that the decoupling approach guarantees at least 0.745 of the optimal expected reward of the joint problem. In addition, we describe how to efficiently compute the optimal number of samples for each sequence, and its dependence on the variances.
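As a rough illustration of the single-sequence dynamic programming step mentioned in the abstract, the sketch below runs the textbook backward induction for optimal stopping on one i.i.d. sequence. It is not taken from the paper: the function name `single_sequence_thresholds`, the restriction to a discrete distribution given by `values` and `probs`, and the convention that the last observation must be accepted are all assumptions made for this example.

```python
def single_sequence_thresholds(values, probs, horizon):
    """Backward induction for optimal stopping on ONE i.i.d. sequence.

    Hypothetical helper (not from the paper). Assumes a discrete,
    nonnegative distribution with support `values` and probabilities
    `probs`, and `horizon` observations that cannot be revisited.

    Returns (expected_reward, thresholds), where thresholds[t] is the
    expected reward of continuing past observation t (0-indexed):
    accept observation t if its value is at least thresholds[t].
    """
    mean = sum(v * p for v, p in zip(values, probs))

    # The last observation must be accepted (a selection is mandatory),
    # so its threshold is 0 and the value with one observation left is E[X].
    thresholds = [0.0] * horizon
    continuation = mean

    # Walk backwards: with more observations remaining, the continuation
    # value becomes E[max(X, previous continuation value)].
    for t in range(horizon - 2, -1, -1):
        thresholds[t] = continuation
        continuation = sum(max(v, continuation) * p for v, p in zip(values, probs))

    return continuation, thresholds


if __name__ == "__main__":
    # Fair coin paying 0 or 1, observed up to three times.
    reward, rule = single_sequence_thresholds([0.0, 1.0], [0.5, 0.5], 3)
    print(reward, rule)
```

The loop costs one pass over the support per time step, so it is linear in the sequence length for a fixed distribution. In a decoupled scheme of the kind the abstract describes, a routine like this would be run once per sequence, with `horizon` set to the number of samples allotted to that sequence; how that allotment is chosen is not covered by this sketch.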