Large language models (LLMs) have emerged as valuable tools for enhancing textual features in various text-related tasks. Despite their superiority in capturing the lexical semantics between tokens for text analysis, ...
The digital era has made seamless sharing and keeping of media such as images on cloud platforms an integral part of our lives. However, user privacy and data security in these repositories remain a serious concern. W...
This book is written to offer a humble, but unified, treatment of e-values in hypothesis testing. The book is organized into three parts: Fundamental Concepts, Core Ideas, and Advanced Topics. The first part includes th...
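To make the central object concrete, here is a minimal simulation (not from the book) of the textbook likelihood-ratio e-value: a nonnegative statistic whose expectation is at most 1 under the null, so that by Markov's inequality, rejecting when it exceeds 1/α controls the type-I error at level α. The choices of mu, alpha, and the number of replications are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha, reps = 0.5, 0.05, 20_000

# Under H0, X ~ N(0, 1).  The likelihood ratio of N(mu, 1) against
# N(0, 1) is e(X) = exp(mu*X - mu^2/2): nonnegative, with E[e(X)] = 1
# under H0.  By Markov's inequality, rejecting when e(X) >= 1/alpha
# yields a level-alpha test without any asymptotics.
x = rng.normal(size=reps)           # samples drawn under the null
e = np.exp(mu * x - mu**2 / 2)      # one e-value per sample

null_mean = e.mean()                # close to 1 by the e-value property
type_I = (e >= 1 / alpha).mean()    # empirical rejection rate, at most alpha
```

Unlike p-values, e-values from independent studies can simply be multiplied to give another e-value, which is one reason the book treats them as a unifying tool.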
In nonparametric independence testing, we observe i.i.d. data {(X_i, Y_i)}_{i=1}^n, where X_i ∈ 𝒳, Y_i ∈ 𝒴 lie in any general spaces, and we wish to test the null that X is independent of Y. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Hence, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. In this paper, we provide a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the permutation tests, our variants have the same power within a constant factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
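For context, the quadratic-time permutation baseline that the cross-statistics are designed to avoid can be sketched as follows. This is a generic permutation test for distance covariance on univariate data; the V-statistic form and the choice of n_perms are illustrative, and this is not the paper's xdCov.

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form, 1-D inputs)."""
    def dcenter(v):
        d = np.abs(v[:, None] - v[None, :])  # pairwise distance matrix
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    return (dcenter(x) * dcenter(y)).mean()

def dcov_perm_test(x, y, n_perms=200, seed=0):
    """Permutation p-value: recomputes the O(n^2) statistic n_perms times."""
    rng = np.random.default_rng(seed)
    obs = dcov2(x, y)
    count = sum(dcov2(x, rng.permutation(y)) >= obs for _ in range(n_perms))
    return (1 + count) / (n_perms + 1)  # +1 correction for exact validity
```

Each permutation re-pays the quadratic cost of the statistic; the paper's xHSIC/xdCov instead compare a single studentized statistic to a Gaussian quantile, removing that multiplicative factor.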
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
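A minimal sketch of the object under study: subsample bagging of the closed-form ridge predictor, averaging fits over subsamples drawn without replacement (simple random sampling). The subsample size k, number of bags M, and penalty lam here are illustrative defaults, and this sketch is not the paper's risk analysis.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def bagged_ridge_predict(X, y, X_new, lam=1.0, k=None, M=50, seed=0):
    """Average ridge predictors fit on M subsamples of size k,
    drawn without replacement (simple random sampling)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = k if k is not None else n // 2
    pred = np.zeros(X_new.shape[0])
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        pred += X_new @ ridge_fit(X[idx], y[idx], lam)
    return pred / M
```

The paper's cross-validation procedure would tune the subsample size k on held-out data; sweeping k and picking the risk minimizer is the simplest version of that idea, and is what flattens the double-descent behavior in the sample size.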
Existing concentration bounds for bounded vector-valued random variables include extensions of the scalar Hoeffding and Bernstein inequalities. While the latter is typically tighter, it requires knowing a bound on the...
We study the Personalized PageRank (PPR) algorithm, a local spectral method for clustering, which extracts clusters using locally-biased random walks around a given seed node. In contrast to previous work, we adopt a ...
Authors:
Zade, Nikita; Langote, Meher; Verma, Prateek
Faculty of Engineering & Technology, Department of Artificial Intelligence & Data Science, Sawangi, Maharashtra 442001, India
Faculty of Engineering & Technology, Department of Artificial Intelligence & Machine Learning, Sawangi, Maharashtra 442001, India
XAI is now transforming the use of AI in diagnosing diseases by overcoming some of the problems inherent in most black-box approaches. In time-sensitive speciality areas like computer-aided diagnosis, image analysis, ...
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting. For example, a negative regularization level can be optimal under covariate shift or regression shift, even when the training features are isotropic or the design is underparameterized. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting and when optimizing over negative regularization levels. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
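A toy illustration of the setup: simulated covariate shift with a grid search over regularization levels, including negative ones. The shift, grid, and dimensions below are arbitrary illustrative choices; note that with n > p, a mildly negative lam keeps X'X/n + lam*I positive definite, so the sweep is well defined.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
beta = np.ones(p) / np.sqrt(p)

X = rng.normal(size=(n, p))              # isotropic training features
y = X @ beta + rng.normal(size=n)

sds = np.sqrt(np.linspace(0.1, 5.0, p))  # shifted (diagonal) test covariance
X_new = rng.normal(size=(2000, p)) * sds
y_new = X_new @ beta + rng.normal(size=2000)

def ridge_risk(lam):
    """Out-of-distribution squared-error risk of ridge at level lam."""
    bhat = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
    return np.mean((y_new - X_new @ bhat) ** 2)

lams = np.linspace(-0.2, 2.0, 111)       # sweep includes negative levels
risks = np.array([ridge_risk(l) for l in lams])
lam_star = lams[risks.argmin()]
```

Whether lam_star comes out negative in any given instance depends on the covariance and signal alignment conditions the paper characterizes; the sketch only shows that optimizing over negative levels is a well-posed search in the underparameterized regime.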
Large Multimodal Models (LMMs) have achieved strong performance across a range of vision and language tasks. However, their spatial reasoning capabilities are under-investigated. In this paper, we construct a novel VQ...