Search results - Inner Mongolia University Library

Graph fission and cross-validation

arXiv, 2024

Authors: Leiner, James; Ramdas, Aaditya. Affiliations: Department of Statistics and Data Science, Carnegie Mellon University, United States; Machine Learning Department, Carnegie Mellon University, United States.

We introduce a technique called graph fission which takes in a graph which potentially contains only one observation per node (whose distribution lies in a known class) and produces two (or more) independent graphs with the same node/edge set in a way that splits the original graph’s information amongst them in any desired proportion. Our proposal builds on data fission/thinning, a method that uses external randomization to create independent copies of an unstructured dataset. We extend this idea to the graph setting where there may be latent structure between observations. We demonstrate the utility of this framework via two applications: inference after structural trend estimation on graphs and a model selection procedure we term "graph cross-validation". Copyright © 2024, The Authors. All rights reserved.

Keywords: machine learning
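The abstract above builds on data fission/thinning, which uses external randomization to split one observation into independent pieces. A minimal sketch of that building block for a single Gaussian observation with known variance follows; it is not the authors' graph procedure. The function names, the tuning parameter `tau`, and the simulation settings are illustrative assumptions. The two pieces have covariance sigma^2 - sigma^2 = 0 and are jointly Gaussian, so they are independent.

```python
import random


def gaussian_fission(x, sigma, tau, rng):
    """Split one draw x ~ N(mu, sigma^2) into two independent pieces.

    Assumes sigma is known. tau > 0 controls the split: a small tau keeps
    most of the information in the first piece.
    """
    z = rng.gauss(0.0, sigma)   # external noise, independent of x
    first = x + tau * z         # ~ N(mu, sigma^2 * (1 + tau^2))
    second = x - z / tau        # ~ N(mu, sigma^2 * (1 + 1 / tau^2))
    return first, second


def sample_correlation(a, b):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def simulate(n=20000, mu=2.0, sigma=1.0, tau=0.5, seed=0):
    """Fission n independent Gaussian draws and return the two halves."""
    rng = random.Random(seed)
    pairs = [gaussian_fission(rng.gauss(mu, sigma), sigma, tau, rng)
             for _ in range(n)]
    firsts = [p[0] for p in pairs]
    seconds = [p[1] for p in pairs]
    return firsts, seconds
```

Calling `simulate()` and passing the two lists to `sample_correlation` should give a value close to zero, which is consistent with the independence of the two pieces.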

Positive Semidefinite Matrix Supermartingales

arXiv, 2024

Authors: Wang, Hongjian; Ramdas, Aaditya. Affiliations: Department of Statistics and Data Science, Carnegie Mellon University, United States; Machine Learning Department, Carnegie Mellon University, United States.

We explore the asymptotic convergence and nonasymptotic maximal inequalities of supermartingales and backward submartingales in the space of positive semidefinite matrices. These are natural matrix analogs of scalar nonnegative supermartingales and backward nonnegative submartingales, whose convergence and maximal inequalities are the theoretical foundations for a wide and ever-growing body of results in statistics, econometrics, and theoretical computer science. Our results lead to new concentration inequalities for either martingale dependent or exchangeable random symmetric matrices under a variety of tail conditions, encompassing now-standard Chernoff bounds to self-normalized heavy-tailed settings. Further, these inequalities are usually expressed in the Loewner order, are sometimes valid simultaneously for all sample sizes or at an arbitrary data-dependent stopping time, and can often be tightened via an external randomization *** Codes 60B20, 60G48, 62L10 Copyright © 2024, The Authors. All rights reserved.

Keywords: Matrix algebra

Testing by Betting while Borrowing and Bargaining

arXiv, 2024

Authors: Wang, Hongjian; Ramdas, Aaditya. Affiliations: Department of Statistics and Data Science, Carnegie Mellon University, United States; Machine Learning Department, Carnegie Mellon University, United States.

Testing by betting has been a cornerstone of the game-theoretic statistics literature. In this framework, a betting score (or more generally an e-process), as opposed to a traditional p-value, is used to quantify the evidence against a null hypothesis: the higher the betting score, the more money one has made betting against the null, and thus the larger the evidence that the null is false. A key ingredient assumed throughout past works is that one cannot bet more money than they currently have. In this paper, we ask what happens if the bettor is allowed to borrow money after going bankrupt, allowing further financial flexibility in this game of hypothesis testing. We propose various definitions of (adjusted) evidence relative to the wealth borrowed, indebted, and accumulated. We also ask what happens if the bettor can "bargain", in order to obtain odds better than specified by the null hypothesis. The adjustment of wealth in order to serve as evidence appeals to the characterization of arbitrage, interest rates, and numéraire-adjusted pricing in this setting. Copyright © 2024, The Authors. All rights reserved.

Keywords: Game theory
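The classical setup that the abstract above starts from (no borrowing, no bargaining) can be sketched for a fair-coin null. This is a minimal illustration, not the paper's procedure. The function name `betting_test`, the fixed `bet_fraction` strategy, and the example sequences are assumptions made for the sketch. Under the null, the wealth is a nonnegative martingale starting at 1, so by Ville's inequality it reaches 1/alpha with probability at most alpha.

```python
def betting_test(outcomes, bet_fraction=0.5, alpha=0.05):
    """Return the 1-indexed flip at which 'fair coin' is rejected, else None.

    outcomes: iterable of 0/1 coin flips (1 = heads).
    bet_fraction: share of current wealth staked on heads each round.
        Keeping it in [0, 1] encodes the rule that the bettor cannot
        stake more than the current wealth, so wealth never goes negative.
    """
    if not 0.0 <= bet_fraction <= 1.0:
        raise ValueError("bet_fraction must lie in [0, 1]")
    wealth = 1.0
    for step, flip in enumerate(outcomes, start=1):
        payoff = 1.0 if flip == 1 else -1.0   # even odds under the null
        wealth *= 1.0 + bet_fraction * payoff
        if wealth >= 1.0 / alpha:             # Ville threshold
            return step
    return None


# Ten heads in a row: wealth is 1.5**k after k flips and first reaches 20 at k = 8.
all_heads = [1] * 10
# Alternating flips: wealth never exceeds 1.5, so the null is never rejected.
alternating = [1, 0] * 50
```

With these inputs, `betting_test(all_heads)` stops at the eighth flip and `betting_test(alternating)` returns `None`.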

UNDERSTANDING THE GENERALIZATION OF ADAM IN LEARNING NEURAL NETWORKS WITH PROPER REGULARIZATION

11th International Conference on Learning Representations, ICLR 2023

Authors: Zou, Difan; Cao, Yuan; Li, Yuanzhi; Gu, Quanquan. Affiliations: Department of Computer Science, Institute of Data Science, The University of Hong Kong, Hong Kong; Department of Statistics & Actuarial Science, The University of Hong Kong, Hong Kong; Machine Learning Department, Carnegie Mellon University, United States; Department of Computer Science, University of California, Los Angeles, United States.

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that in many deep learning applications, such as image classification, Adam can converge to a different solution with a worse test error compared to (stochastic) gradient descent, even with a fine-tuned regularization. In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired from image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization. In contrast, we show that if the training objective is convex, and the weight decay regularization is employed, any optimization algorithm, including Adam and GD, will converge to the same solution if the training is successful. This suggests that the generalization gap between Adam and SGD in the presence of weight decay regularization is closely tied to the nonconvex landscape of deep learning optimization, which cannot be covered by the recent neural tangent kernel (NTK) based analysis. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.

Keywords: Gradient methods

A unified recipe for deriving (time-uniform) PAC-Bayes bounds

arXiv, 2023

Authors: Chugg, Ben; Wang, Hongjian; Ramdas, Aaditya. Affiliations: Machine Learning Department, United States; Department of Statistics and Data Science, Carnegie Mellon University, United States.

We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound. Copyright © 2023, The Authors. All rights reserved.

Keywords: Random processes

Time-Uniform Confidence Spheres for Means of Random Vectors

arXiv, 2023

Authors: Chugg, Ben; Wang, Hongjian; Ramdas, Aaditya. Affiliations: Machine Learning Department, United States; Department of Statistics and Data Science, Carnegie Mellon University, United States.

We study sequential mean estimation in ℝ^d. In particular, we derive time-uniform confidence spheres, called confidence sphere sequences (CSSs), which contain the mean of random vectors with high probability simultaneously across all sample sizes. Our results include a dimension-free CSS for log-concave random vectors, a dimension-free CSS for sub-Gaussian random vectors, and CSSs for sub-ψ random vectors (which include sub-gamma, sub-Poisson, and sub-exponential distributions). Many of our results are optimal. For sub-Gaussian distributions we also provide a CSS which tracks a time-varying mean, generalizing Robbins’ mixture approach to the multivariate setting. Finally, we provide several CSSs for heavy-tailed random vectors (two moments only). Our bounds hold under a martingale assumption on the mean and do not require that the observations be i.i.d. Our work is based on PAC-Bayesian theory and inspired by an approach of Catoni and Giulini. © 2023, CC BY.

Keywords: Spheres

CONFORMALIZED INTERACTIVE IMITATION LEARNING: HANDLING EXPERT SHIFT & INTERMITTENT FEEDBACK

arXiv, 2024

Authors: Zhao, Michelle; Simmons, Reid; Admoni, Henny; Ramdas, Aaditya; Bajcsy, Andrea. Affiliations: Robotics Institute, School of Computer Science, Carnegie Mellon University, United States; Departments of Statistics and Machine Learning, Carnegie Mellon University, United States.

In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot’s uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn’t) present because of changes in the expert’s policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior. Project page at ***/conformalized-interactive-il/. © 2024, CC BY.

Keywords: Intelligent systems
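The online conformal prediction step that the abstract above draws upon can be illustrated with the standard quantile tracking update, in which a threshold rises after a miss and falls slowly after a cover. The sketch below assumes a label is observed at every step, so it is not the paper's IQT algorithm, which handles intermittent labels. The function names, the step size, and the uniform synthetic scores are illustrative assumptions.

```python
import random


def quantile_tracking(scores, alpha=0.1, step_size=0.05, initial_threshold=0.0):
    """Track a threshold so that about an alpha fraction of scores exceed it.

    scores: nonconformity scores revealed one at a time.
    Returns the threshold used at each step and the 0/1 miss indicators.
    """
    threshold = initial_threshold
    thresholds, errors = [], []
    for score in scores:
        thresholds.append(threshold)          # threshold fixed before seeing score
        miss = 1 if score > threshold else 0  # score fell outside the set
        errors.append(miss)
        # Raise the threshold after a miss, lower it slightly after a cover.
        threshold += step_size * (miss - alpha)
    return thresholds, errors


def demo(num_steps=5000, seed=0):
    """Long-run miss rate on synthetic scores drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(num_steps)]
    _, errors = quantile_tracking(scores)
    return sum(errors) / num_steps
```

Because each update moves the threshold by `step_size * (miss - alpha)`, the average miss rate differs from `alpha` by the total threshold drift divided by `step_size * num_steps`. For bounded scores that drift stays bounded, so `demo()` returns a value close to 0.1.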

Generalized equivalences between subsampling and ridge regularization

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Pratik Patil; Jin-Hong Du. Affiliations: Department of Statistics, University of California, Berkeley, CA; Department of Statistics and Data Science & Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA.

We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels λ and subsample aspect ratios ψ, are asymptotically equivalent along specific paths in the (λ, ψ)-plane (where ψ is the ratio of the feature dimension to the subsample size). Our results only require bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of (λ, ψ). An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. [1] for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.

Robust Universal Inference

arXiv, 2023

Authors: Park, Beomjo; Balakrishnan, Sivaraman; Wasserman, Larry. Affiliations: Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States.

In statistical inference, it is rarely realistic that the hypothesized statistical model is well-specified, and consequently it is important to understand the effects of misspecification on inferential procedures. When the hypothesized statistical model is misspecified, the natural target of inference is a projection of the data generating distribution onto the model. We present a general method for constructing valid confidence sets for such projections, under weak regularity conditions, despite possible model misspecification. Our method builds upon the universal inference method of Wasserman et al. [41] and is based on inverting a family of split-sample tests of relative fit. We study settings in which our methods yield either exact or approximate, finite-sample valid confidence sets for various projection distributions. We study rates at which the resulting confidence sets shrink around the target of inference and complement these results with a simulation study. Copyright © 2023, The Authors. All rights reserved.

Keywords: machine learning