We study upper and lower bounds on the sample complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense. © 2014 Elsevier B.V. All rights reserved.
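The optimism principle behind UCRL-style analyses can be illustrated with a minimal sketch. This is not the paper's modified algorithm: the function name, the 1/sqrt(N(s,a)) bonus form, and the toy MDP below are all illustrative assumptions, showing only the generic pattern of value iteration on empirical transition estimates inflated by a per-(s,a) confidence bonus.

```python
import numpy as np

def optimistic_value_iteration(counts, rewards, gamma=0.9,
                               bonus_scale=1.0, iters=200):
    """Toy optimistic planning in the spirit of UCRL-style methods.

    counts[s, a, s'] is the number of observed transitions (s, a) -> s';
    rewards[s, a] is the known mean reward. Empirical transition estimates
    are combined with a confidence bonus shrinking like 1/sqrt(N(s, a)).
    """
    S, A, _ = counts.shape
    n = counts.sum(axis=2)                            # visit counts N(s, a)
    p_hat = counts / np.maximum(n, 1)[..., None]      # empirical transitions
    bonus = bonus_scale / np.sqrt(np.maximum(n, 1))   # optimism bonus
    V = np.zeros(S)
    for _ in range(iters):
        Q = rewards + bonus + gamma * (p_hat @ V)     # optimistic Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V
```

In a real analysis the bonus would be calibrated from concentration inequalities so that the optimistic value dominates the true value with high probability; here it is a fixed illustrative form.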
In the problem of classical group testing one aims to identify a small subset (of size d) of diseased individuals/defective items in a large population (of size n). This process is based on a minimal number of suitably designed group tests on subsets of items, where the test outcome is positive iff the given test contains at least one defective item. Motivated by physical considerations, such as scenarios with imperfect test apparatus, we consider a generalized setting that includes as special cases multiple other group-testing-like models in the literature. In our setting the test outcome is governed by an arbitrary monotonically increasing (stochastic) test function f(·), with the test outcome being positive with probability f(x), where x is the number of defectives tested in that pool. This formulation subsumes as special cases a variety of noiseless and noisy group-testing models in the literature. Our main contributions are as follows. Firstly, for any monotone test function f(·) we present a non-adaptive scheme that with probability 1 − δ identifies all defective items. Our scheme requires at most O(Γ(f) d log(n)) tests, where Γ(f) is a suitably defined "sensitivity parameter" of f(·), and is never larger than O(d^{1+o(1)}), but indeed can be substantially smaller for a variety of f(·). Secondly, we argue that any non-adaptive group-testing scheme needs at least Ω((1 − δ) Φ(f) d log(n/d)) tests to ensure high-reliability recovery. Here Φ(f) is a suitably defined "concentration parameter" of f(·), and Φ(f) ∈ O(1). Thirdly, we prove that our sample-complexity bounds for generalized group testing are information-theoretically near-optimal for a variety of sparse-recovery group-testing models in the literature. That is, for any "noisy" test function f(·) (i.e., 0 < f(0) < f(d) < 1), and for a variety of "(one-sided) noiseless" test functions f(·) (i.e., either f(0) = 0, or f(d) = 1, or both) studied in the literature we show that Γ(f) and Φ(f) …
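The generalized test model is easy to simulate. The sketch below is not the paper's scheme: it draws a Bernoulli test design, generates outcomes that are positive with probability f(x) where x counts defectives in the pool, and then decodes with the standard COMP rule, which is valid only for the classical noiseless special case f(x) = 1 iff x ≥ 1. All parameter choices (n, d, number of tests, pool density) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_tests(design, defectives, f, rng):
    """Generalized group-testing channel: each pool's outcome is
    positive with probability f(x), x = defectives in the pool."""
    x = design[:, defectives].sum(axis=1)
    return rng.random(len(design)) < f(x)

def comp_decode(design, outcomes):
    """COMP decoder for the classical noiseless special case
    (f(0) = 0, f(x) = 1 for x >= 1): every item appearing in a
    negative pool is declared non-defective."""
    possibly_defective = np.ones(design.shape[1], dtype=bool)
    for pool, positive in zip(design, outcomes):
        if not positive:
            possibly_defective[pool] = False
    return np.flatnonzero(possibly_defective)

n, d, T = 100, 2, 60
design = rng.random((T, n)) < 1.0 / d        # Bernoulli design, density ~ 1/d
defectives = np.array([7, 42])               # ground-truth defective set
f = lambda x: (x >= 1).astype(float)         # classical noiseless test function
outcomes = run_tests(design, defectives, f, rng)
recovered = comp_decode(design, outcomes)
```

Replacing f with a genuinely stochastic monotone function (e.g. one with 0 < f(0) < f(d) < 1) reproduces the noisy models the abstract subsumes, but then COMP no longer applies and a noise-tolerant decoder is needed.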