Several classical adaptive optimization algorithms, such as line search and trust-region methods, have recently been extended to stochastic settings where function values, gradients, and, in some cases, Hessians are estimated via stochastic oracles. Unlike the majority of stochastic methods, these methods do not use a pre-specified sequence of step size parameters, but adapt the step size parameter according to the estimated progress of the algorithm and use it to dictate the accuracy required from the stochastic oracles. The requirements on the stochastic oracles are thus also adaptive, and the oracle costs can vary from iteration to iteration. The step size parameters in these methods can increase and decrease based on the perceived progress, but unlike the deterministic case they are not bounded away from zero due to possible oracle failures, and bounds on the step size parameter have not been previously derived. This creates obstacles in the total complexity analysis of such methods, because the oracle costs are typically decreasing in the step size parameter and could be arbitrarily large as the step size parameter goes to 0. Thus, until now only the total iteration complexity of these methods has been analyzed. In this paper, we derive a lower bound on the step size parameter that holds with high probability for a large class of adaptive stochastic methods. We then use this lower bound to derive a framework for analyzing the expected and high-probability total oracle complexity of any method in this class. Finally, we apply this framework to analyze the total sample complexity of two particular algorithms, STORM (Blanchet et al. in INFORMS J Optim 1(2):92-119, 2019) and SASS (Jin et al. in High probability complexity bounds for adaptive step search based on stochastic oracles, 2021. https://***/10.48550/ARXIV.2106.06454), in the expected risk minimization problem.
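The adaptive mechanism described here can be caricatured in a few lines. This is a hedged toy sketch of a generic step-search loop, not the STORM or SASS algorithms themselves; the update constants, the sufficient-decrease test, and the way the oracle accuracy is tied to the step size are all illustrative assumptions:

```python
import numpy as np

def adaptive_step_search(x0, grad_oracle, f_oracle, alpha0=1.0,
                         gamma=2.0, theta=0.1, max_iter=200, seed=None):
    """Toy adaptive step search: the step size grows on success and shrinks
    on failure, and the accuracy demanded from the stochastic oracles is
    tied to the current step size (smaller steps -> tighter, costlier oracles)."""
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    for _ in range(max_iter):
        eps = theta * alpha              # adaptive oracle accuracy requirement
        g = grad_oracle(x, eps, rng)     # gradient estimate, accurate to ~eps
        x_trial = x - alpha * g
        # noisy sufficient-decrease test; an oracle failure can reject a good step
        if f_oracle(x_trial, eps, rng) <= f_oracle(x, eps, rng) - theta * alpha * (g @ g):
            x, alpha = x_trial, gamma * alpha    # success: accept and enlarge
        else:
            alpha /= gamma                       # failure: shrink the step size
    return x, alpha

# toy objective f(x) = 0.5 ||x||^2 with additive oracle noise of size eps
f = lambda x, eps, rng: 0.5 * (x @ x) + eps * rng.standard_normal()
g = lambda x, eps, rng: x + eps * rng.standard_normal(x.shape)
x_end, alpha_end = adaptive_step_search(np.ones(5), g, f, seed=0)
```

With noiseless oracles the loop reduces to a deterministic step search; the stochastic case is exactly where the lower bound on `alpha` discussed above becomes nontrivial.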
Author: Trunschke, Philipp (Univ Nantes, Ecole Cent Nantes, LMJL UMR CNRS 6629, 2 Chemin Houssiniere, BP 92208, F-44322 Nantes 3, France)
We consider the problem of approximating a function in a general nonlinear subset of $L^2$, when only a weighted Monte Carlo estimate of the $L^2$-norm is accessible. The concept of sample complexity, i.e. the number of sample points necessary to achieve a prescribed error with high probability, is of particular interest in this setting. Reasonable worst-case bounds for this quantity exist only for particular model classes, like linear spaces or sets of sparse vectors. However, the existing bounds are very pessimistic for more general sets, like tensor networks or neural networks. Restricting the model class to a neighborhood of the best approximation allows us to derive improved worst-case bounds for the sample complexity. When the considered neighborhood is a manifold with positive local reach, its sample complexity can be estimated through the sample complexities of the tangent and normal spaces and the manifold's curvature.
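The weighted Monte Carlo setting can be illustrated with a deliberately simplified sketch: a linear model class (a Legendre polynomial space) and uniform weights, so the empirical best approximation reduces to ordinary least squares. The target function, basis, and sample size below are illustrative assumptions, not choices from the paper:

```python
import numpy as np

def mc_least_squares(f, basis, n_samples, rng):
    """Empirical best approximation: draw Monte Carlo points, evaluate the
    basis, and minimize the empirical L2 error (uniform weights for simplicity)."""
    x = rng.uniform(-1.0, 1.0, n_samples)
    A = np.column_stack([b(x) for b in basis])     # design matrix
    c, *_ = np.linalg.lstsq(A, f(x), rcond=None)   # least-squares coefficients
    return c

rng = np.random.default_rng(0)
basis = [np.polynomial.legendre.Legendre.basis(k) for k in range(5)]
coef = mc_least_squares(np.exp, basis, 200, rng)

# measure the achieved error on a dense grid
xs = np.linspace(-1, 1, 1001)
approx = sum(c * b(xs) for c, b in zip(coef, basis))
err = np.max(np.abs(approx - np.exp(xs)))
```

For a nonlinear model class (a tensor network, say) the `lstsq` call would be replaced by a nonconvex fit, which is precisely where the worst-case sample complexity bounds discussed above become pessimistic.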
In parallel with the standardization of lattice-based cryptosystems, the research community in Post-quantum Cryptography has focused on non-lattice-based hard problems for constructing public-key cryptographic primitives. The Linear Code Equivalence (LCE) Problem has attracted attention for both its practical applications and its cryptanalysis. Recent advancements, including the LESS signature scheme and its candidacy in the NIST standardization for additional signatures, have supported LCE as a foundation for post-quantum cryptographic primitives. However, recent cryptanalytic results have revealed vulnerabilities in LCE-based constructions when multiple related public keys are available for one specific code rate. In this work, we generalize these attacks to cover all code rates. We show that the complexity of recovering the private key from multiple public keys is significantly reduced for any code rate. Thus, we advise against constructing such cryptographic primitives using LCE.
System stabilization via policy gradient (PG) methods has drawn increasing attention in both the control and machine learning communities. In this article, we study their convergence and sample complexity for stabilizing linear time-invariant systems in terms of the number of system rollouts. Our analysis is built upon a discounted linear quadratic regulator (LQR) method which alternately updates the policy and the discount factor of the LQR problem. First, we propose an explicit rule to adaptively adjust the discount factor by exploring the stability margin of a linear control policy. Then, we establish the sample complexity of PG methods for stabilization, which adds only a coefficient logarithmic in the spectral radius of the state matrix to the complexity of solving the LQR problem with a prior stabilizing policy. Finally, we perform simulations to validate our theoretical findings and demonstrate the effectiveness of our method on a class of nonlinear systems.
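The alternating update of policy and discount factor can be sketched as follows. This is an illustrative toy in which exact policy iteration stands in for the sample-based policy gradient step, and the discount-update rule (driven by the spectral radius of the closed loop) is a plausible instance of the stability-margin idea rather than the article's exact rule:

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def dlyap(Acl, Q, iters=500):
    # fixed-point iteration for P = Q + Acl' P Acl (valid when Acl is Schur stable)
    P = Q.copy()
    for _ in range(iters):
        P = Q + Acl.T @ P @ Acl
    return P

def stabilize_by_discount(A, B, K0, xi=0.96, rounds=50):
    """Alternate between improving the policy on a discounted (scaled) system
    and enlarging the discount factor based on the policy's stability margin.
    Discounted LQR with factor gamma equals undiscounted LQR on sqrt(gamma)*(A, B)."""
    n, m = B.shape
    K = K0
    gamma = min(1.0, xi / spectral_radius(A + B @ K) ** 2)   # K stabilizes scaled system
    for _ in range(rounds):
        As, Bs = np.sqrt(gamma) * A, np.sqrt(gamma) * B
        P = dlyap(As + Bs @ K, np.eye(n) + K.T @ K)          # evaluate K (Q = R = I)
        K = -np.linalg.solve(np.eye(m) + Bs.T @ P @ Bs, Bs.T @ P @ As)  # improve K
        if gamma >= 1.0:
            break                                            # reached the true system
        gamma = min(1.0, xi / spectral_radius(A + B @ K) ** 2)
    return K, gamma

A = np.array([[1.2, 0.5], [0.0, 1.1]])   # unstable open loop
B = np.eye(2)
K, gamma = stabilize_by_discount(A, B, np.zeros((2, 2)))
```

In the model-free setting analyzed above, the Lyapunov solve and the improvement step would be replaced by rollout-based gradient estimates, which is what the rollout sample complexity counts.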
This article assesses the use of high-resolution Unmanned Aerial Vehicle (UAV) data from commercial field sensors for classifying small-scale agricultural patterns in four crop types (Winter Wheat, Spring Barley, Rapeseed, and Corn) acquired at ground sample distances (GSDs) of 0.027 m, 0.053 m and 0.064 m. Image harmonization challenges due to spectral and textural variations from varying GSDs and sensors are addressed. The study investigates the data and sample complexity required to develop an effective machine/deep learning (ML/DL) model, using the Jeffries-Matusita distance to assess class separability, feature importance ranking for feature and layer selection, and semivariogram analysis to determine minimum sample patch sizes. The results demonstrate distinct classification capabilities based on spectral information in differentiating between sub-classes such as weed infestation, bare soil, disturbed canopy areas, and undisturbed canopy areas. However, there are limitations in detecting refined sub-classes of undisturbed canopy areas assigned to phenological groups, highlighting the need for class reduction and tailored feature and layer selection; a final set of sub-classes is proposed. The study also proposes a customized set of input layers for each crop type and identifies minimum patch sizes to enhance the efficiency of detecting specific agricultural patterns. It has been confirmed that, to exploit texture information for classification (at sample patch sizes below 120 pixels), GSDs between 0.027 m and 0.064 m (for the RGB and CIR sensors of commercial drones, respectively) are suitable for capturing detailed patterns of Corn and Spring Barley, whereas the CIR sensor, at GSDs of 0.053 m and 0.064 m, performs better for Winter Wheat and Rapeseed.
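The Jeffries-Matusita measure used above for class separability has a standard closed form under a Gaussian class model. The sketch below is a generic illustration of that formula; the class means and covariances are made-up toy values, not data from the study:

```python
import numpy as np

def jeffries_matusita(m1, c1, m2, c2):
    """Jeffries-Matusita distance between two classes under a Gaussian
    assumption: JM = 2 * (1 - exp(-B)), where B is the Bhattacharyya
    distance. JM ranges over [0, 2]; values near 2 mean well-separable classes."""
    m1, m2, c1, c2 = map(np.asarray, (m1, m2, c1, c2))
    d = m1 - m2
    c = (c1 + c2) / 2
    B = (d @ np.linalg.solve(c, d)) / 8 + 0.5 * np.log(
        np.linalg.det(c) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return 2.0 * (1.0 - np.exp(-B))

# identical classes -> JM = 0; far-apart classes -> JM close to 2
same = jeffries_matusita([0, 0], np.eye(2), [0, 0], np.eye(2))
far = jeffries_matusita([0, 0], np.eye(2), [10, 10], np.eye(2))
```

In a band-selection workflow, pairs of sub-classes with JM well below 2 are the candidates for the class reduction the study recommends.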
We show that the sample complexity of worst-case $H^\infty$-identification is of order $n^2$, by proving that the minimal length of a fractional $H^\infty$-cover for $\mathbb{C}^n$, regarded as the linear space of complex-valued sequences of length $n$, is of order $n^2$. A unit vector $u \in \ell^\infty$ is a fractional $H^\infty$-cover for $\mathbb{C}^n$ if for some $0 < \alpha < 1$, $\|u * h\|_\infty \ge \alpha \|\tilde{h}\|_{H^\infty}$ for all $h \in \mathbb{C}^n$, where $\tilde{h}(z) = \sum_{j=0}^{n-1} h_j z^j$ is the z-transform of $h$. We also give similar results for real-valued sequences.
We consider the problem of PAC learning probabilistic networks in the case where the structure of the net is specified beforehand. We allow the conditional probabilities to be represented in any manner (as tables or specialized functions) and obtain sample complexity bounds for learning nets with and without hidden nodes.
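With the network structure fixed, learning reduces to estimating each node's conditional distribution from samples. Below is a minimal sketch for binary nodes using table representations (one of the representations the paper allows); the Laplace smoothing and the toy chain X0 -> X1 are illustrative choices, not part of the paper:

```python
import numpy as np

def fit_cpts(samples, parents):
    """Estimate conditional probability tables for a fixed binary-network
    structure by counting, with Laplace (add-one) smoothing.
    `parents[i]` lists the parent indices of node i; `samples` is (N, n) 0/1."""
    samples = np.asarray(samples)
    cpts = []
    for i, pa in enumerate(parents):
        table = {}
        # one Bernoulli parameter per joint parent configuration
        for conf in np.ndindex(*([2] * len(pa))):
            mask = (np.all(samples[:, pa] == conf, axis=1)
                    if pa else np.ones(len(samples), bool))
            n1 = samples[mask, i].sum()
            table[conf] = (n1 + 1) / (mask.sum() + 2)  # P(X_i = 1 | parents = conf)
        cpts.append(table)
    return cpts

# toy chain X0 -> X1: generate data from known CPTs and recover them
rng = np.random.default_rng(0)
N = 20000
x0 = rng.random(N) < 0.3                                   # P(X0 = 1) = 0.3
x1 = np.where(x0, rng.random(N) < 0.9, rng.random(N) < 0.2)  # P(X1=1|X0) = 0.9 / 0.2
cpts = fit_cpts(np.column_stack([x0, x1]).astype(int), parents=[[], [0]])
```

The sample complexity bounds in the paper quantify, roughly, how large `N` must be so that estimates like these are accurate for every parent configuration simultaneously, including the harder case where some nodes are hidden.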
In a statistical setting of the classification (pattern recognition) problem the number of examples required to approximate an unknown labelling function is linear in the VC dimension of the target learning class. In this work we consider the question of whether such bounds exist if we restrict our attention to computable classification methods, assuming that the unknown labelling function is also computable. We find that in this case the number of examples required for a computable method to approximate the labelling function not only is not linear, but grows faster (in the VC dimension of the class) than any computable function. No time or space constraints are put on the predictors or target functions; the only resource we consider is the training examples. The task of classification is considered in conjunction with another learning problem - data compression. An impossibility result for the task of data compression allows us to estimate the sample complexity for pattern recognition.
This letter considers the use of total variation (TV) minimization in the recovery of a given gradient-sparse vector from Gaussian linear measurements. It has been shown in recent studies that there exists a sharp phase transition behavior in TV minimization for the number of measurements necessary to recover the signal in asymptotic regimes. The phase-transition curve specifies the boundary between success and failure of TV minimization for a large number of measurements. It is a challenging task to obtain a theoretical bound that reflects this curve. In this letter, we present a novel upper bound that suitably approximates this curve and is asymptotically sharp. Numerical results show that our bound is closer to the empirical TV phase-transition curve than the previously known bound obtained by Kabanava.
For self-tuning control of a finite state Markov chain whose parametrized transition probabilities satisfy an 'identifiability condition', we establish a bound on the number of samples required to attain a prescribed measure of near-optimality with a prescribed probability. (C) 2000 Elsevier Science B.V. All rights reserved.
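The certainty-equivalence ingredient of such self-tuning schemes, estimating the transition probabilities from a single observed trajectory, can be sketched as follows; the chain, the smoothing, and the sample size are illustrative assumptions:

```python
import numpy as np

def empirical_kernel(traj_states, traj_next, n):
    """Maximum-likelihood estimate of a finite-state transition matrix from
    observed (state, next_state) pairs, with add-one smoothing so that no
    row of the estimate is degenerate early in the trajectory."""
    counts = np.ones((n, n))
    for s, t in zip(traj_states, traj_next):
        counts[s, t] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# simulate a 3-state chain and check that the estimate concentrates
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
s, traj = 0, []
for _ in range(30000):
    t = rng.choice(3, p=P[s])
    traj.append((s, t))
    s = t
P_hat = empirical_kernel(*zip(*traj), 3)
err = np.max(np.abs(P_hat - P))
```

Sample complexity bounds of the kind established in the paper control how long the trajectory must be before `P_hat` is uniformly accurate enough for the certainty-equivalent policy to be near-optimal with the prescribed probability.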