In the field of Human-Robot Interaction (HRI), many researchers study shared control systems. Shared control occurs when a person and an agent both contribute to the performance of a task in a collaborative way, often by pro...
In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over d-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the sample is drawn i.i.d. from the input distribution, the least squares solution for that sample can be viewed as the natural estimator of the optimum. Unfortunately, this estimator almost always incurs an undesirable bias coming from the randomness of the input points, which is a significant bottleneck in model averaging. In this paper we show that it is possible to draw a non-i.i.d. sample of input points such that, regardless of the response model, the least squares solution is an unbiased estimator of the optimum. Moreover, this sample can be produced efficiently by augmenting a previously drawn i.i.d. sample with an additional set of d points, drawn jointly according to a certain determinantal point process constructed from the input distribution rescaled by the squared volume spanned by the points. Motivated by this, we develop a theoretical framework for studying volume-rescaled sampling, and in the process prove a number of new matrix expectation identities. We use them to show that for any input distribution and ε > 0 there is a random design consisting of O(d log d + d/ε) points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by 1 + ε times the loss of the optimum. We provide efficient algorithms for constructing such unbiased estimators in a number of practical settings. In one such setting, we let the input distribution be uniform over a large dataset of n > d points. Here, we obtain the first unbiased least squares estimator that can be constructed in time nearly linear in the data size, resulting in strong guarantees for model averaging. We achieve these computational gains by introducing a new algorithmic technique, called distortion-free intermediate sampling.
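For the finite setting mentioned above (input distribution uniform over n > d points), a brute-force sketch of volume-rescaled sampling might look as follows: a size-d subset S is drawn with probability proportional to the squared volume det(X_S)^2, and the least squares fit on those d rows is then unbiased. All names are illustrative, and the enumeration below is only feasible for tiny n and d; the paper's contribution is achieving this in nearly-linear time.

    # A minimal sketch (not the paper's fast algorithm) of volume-rescaled
    # sampling in the finite case: a size-d subset S of the rows of X is
    # drawn with probability proportional to det(X_S)^2, and the least
    # squares fit on the sampled rows is an unbiased estimator of the
    # optimum. Brute-force enumeration; feasible only for tiny n and d.
    import itertools
    import numpy as np

    def volume_sample(X, rng):
        n, d = X.shape
        subsets = list(itertools.combinations(range(n), d))
        # Unnormalized probabilities: squared volume spanned by the d rows.
        weights = np.array([np.linalg.det(X[list(S)]) ** 2 for S in subsets])
        idx = rng.choice(len(subsets), p=weights / weights.sum())
        return list(subsets[idx])

    rng = np.random.default_rng(0)
    n, d = 8, 2
    X = rng.standard_normal((n, d))
    y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(n)

    S = volume_sample(X, rng)
    w_hat = np.linalg.solve(X[S], y[S])  # exact fit on the d sampled rows
    print(S, w_hat)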
Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is supported by theoretical guara...
Federated Learning (FL), in theory, preserves the privacy of individual clients' data while producing quality machine learning models. However, attacks such as Deep Leakage from Gradients (DLG) severely question the practi...
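As a point of reference, the DLG attack mentioned above reconstructs private data by optimizing a dummy input/label pair until its gradients match the gradients a server observes. Below is a minimal, illustrative PyTorch sketch of that gradient-matching loop; the tiny linear model, shapes, and step counts are assumptions for the sketch, not the original attack code.

    # Illustrative gradient-matching loop in the spirit of DLG.
    import torch

    torch.manual_seed(0)
    model = torch.nn.Linear(10, 2)

    # "Private" example and the gradients an attacker would observe.
    x_true, y_true = torch.randn(1, 10), torch.tensor([1])
    loss = torch.nn.functional.cross_entropy(model(x_true), y_true)
    true_grads = torch.autograd.grad(loss, model.parameters())

    # Dummy data, optimized so its gradients reproduce the observed ones.
    x_dummy = torch.randn(1, 10, requires_grad=True)
    y_dummy = torch.randn(1, 2, requires_grad=True)  # soft label logits
    opt = torch.optim.LBFGS([x_dummy, y_dummy])

    def closure():
        opt.zero_grad()
        dummy_loss = torch.nn.functional.cross_entropy(
            model(x_dummy), torch.softmax(y_dummy, dim=-1))
        grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                    create_graph=True)
        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        diff.backward()
        return diff

    for _ in range(30):
        opt.step(closure)
    print((x_dummy - x_true).abs().mean())  # small = successful leakage

The closer the gradient-matching loss gets to zero, the closer x_dummy tends to be to the private example, which is the leakage the abstract refers to.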
Densely structured pruning methods utilizing simple pruning heuristics can deliver immediate compression and acceleration benefits with acceptable benign performance. However, empirical findings indicate that such naïvely pruned networks are extremely fragile under simple adversarial attacks. Naturally, we would like to know whether this phenomenon also holds for carefully designed modern structured pruning methods. If so, how severe is it? And what remedies are available? Unfortunately, both questions remain largely unaddressed: no prior art provides a thorough investigation of the adversarial performance of modern structured pruning methods (spoiler: it is not good), yet the few works that attempt to provide mitigation often do so at various extra costs with only to-be-desired results. In this work, we answer both questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods. Solution-wise, we take advantage of Grouped Kernel Pruning (GKP)'s recent success in pushing densely structured pruning freedom to a more fine-grained level. By mixing kernel smoothness (a classic robustness-related kernel-level metric) into a modified GKP procedure, we present a one-shot, post-train, weight-dependent GKP method capable of advancing SOTA performance on both the benign and adversarial scales, while requiring no extra (in fact, often less) cost than a standard pruning procedure. Please refer to our GitHub repository for code implementation, tool sharing, and model checkpoints.
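As one way to make the kernel-smoothness ingredient concrete, the sketch below scores each 2D kernel of a convolution by a total-variation smoothness proxy and mixes it with a classic magnitude saliency before ranking which kernels to keep. The metric, the mixing weight, and the keep ratio are illustrative assumptions, not the paper's exact procedure.

    # Illustrative kernel-level saliency: a total-variation smoothness
    # proxy per 2D kernel, mixed with magnitude before ranking which
    # grouped kernels to keep. Metric, weight, and keep ratio are assumed.
    import torch

    def kernel_smoothness(w):
        # w: (out_ch, in_ch, k, k); lower total variation = smoother kernel.
        tv = (w[..., 1:, :] - w[..., :-1, :]).abs().sum(dim=(-2, -1)) \
           + (w[..., :, 1:] - w[..., :, :-1]).abs().sum(dim=(-2, -1))
        return tv  # (out_ch, in_ch) per-kernel scores

    conv = torch.nn.Conv2d(16, 32, 3, bias=False)
    smooth = kernel_smoothness(conv.weight)          # robustness proxy
    magnitude = conv.weight.abs().sum(dim=(-2, -1))  # classic saliency
    score = magnitude - 0.5 * smooth                 # penalize rough kernels

    # Keep the top half of kernels per output filter (kernel-level grouping).
    keep = score.topk(conv.in_channels // 2, dim=1).indices
    print(keep.shape)  # (32, 8): indices of retained kernels per filter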
We study the sample complexity of the classical shadows task: what is the minimum number of copies of an unknown state you need to measure to predict expectation values with respect to some class of observables? Large joi...
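For context, the single-qubit version of the classical shadows protocol can be simulated in a few lines: each random Pauli-basis measurement outcome is inverted through the measurement channel as rho_hat = 3 U† |b><b| U - I, and averaging snapshots estimates Tr(O rho). The state and sample count below are arbitrary choices for illustration.

    # Single-qubit classical shadows with random Pauli-basis measurements.
    # Each snapshot is inverted as rho_hat = 3 U^† |b><b| U - I; averaging
    # snapshots estimates Tr(O rho). State and sample count are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    Sdg = np.diag([1, -1j])                      # S^† phase gate
    basis = {"X": H, "Y": H @ Sdg, "Z": I2}      # rotations into Z basis

    rho = 0.5 * (I2 + 0.6 * X)                   # "unknown" state to simulate

    def snapshot():
        U = basis[rng.choice(list(basis))]
        p = np.real(np.diag(U @ rho @ U.conj().T))   # outcome probabilities
        b = rng.choice(2, p=p / p.sum())
        e = np.zeros(2); e[b] = 1.0
        return 3 * U.conj().T @ np.outer(e, e) @ U - I2  # inverted channel

    est = np.mean([snapshot() for _ in range(20000)], axis=0)
    print(np.real(np.trace(X @ est)))            # approx 0.6 = Tr(X rho)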
What computations enable humans to leap from mere observations to rich explanatory theories? Prior work has focused on stochastic algorithms that rely on random, local perturbations to model the search for satisfactor...
We present FKeras, an open-source tool that uses Hessian information to quickly find which parameters in a neural network are sensitive to radiation faults, reducing the usual 200% resource overhead needed to protect ...
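A generic illustration of the Hessian-based sensitivity idea (not FKeras's actual API, which targets Keras models) is to rank parameters by an estimate of the Hessian diagonal obtained from Hessian-vector products, e.g. via Hutchinson's estimator:

    # Generic Hessian-diagonal sensitivity ranking via Hutchinson's
    # estimator and Hessian-vector products. Model, data, and sample count
    # are illustrative; this is not FKeras's API.
    import torch

    model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 2))
    x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)

    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Hutchinson: E[v * (H v)] with Rademacher v estimates diag(H).
    diag_est = [torch.zeros_like(p) for p in params]
    n_samples = 10
    for _ in range(n_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs,
                                  retain_graph=True)
        for d, v, hv in zip(diag_est, vs, hvs):
            d += v * hv / n_samples
    # Parameters with large |diag(H)| are the fault-sensitive ones to protect.
    print([d.abs().max().item() for d in diag_est])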
The ability to represent semantic structure in the environment — objects, parts, and relations — is a core aspect of human visual perception and cognition. Here we leverage recent advances in program synthesis to de...
Transformers exhibit In-Context Learning (ICL), where these models solve new tasks by using examples in the prompt without additional training. In our work, we identify and analyze two key components of ICL: (1) conte...