Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer can lead to feature learning, characterized by the appearance of a separated rank-one component (a "spike") in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike carries information only from the linear component of the target function, and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large-sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the training and test errors, we demonstrate that these non-linear features can enhance learning.
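The spiked-spectrum phenomenon described in this abstract can be illustrated numerically. The sketch below (all sizes, the ReLU activation, the tanh target, and the sqrt(n) learning-rate scaling are illustrative assumptions, not the paper's exact setting) takes one full-batch gradient step on the first layer of a two-layer network and computes the singular-value spectrum of the feature matrix before and after the update, which is where separated spikes would appear.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 400, 300                        # samples, input dim, hidden width (illustrative)
X = rng.standard_normal((n, d)) / np.sqrt(d)    # isotropic inputs
y = np.tanh(X @ rng.standard_normal(d))         # stand-in non-linear single-index target

W = rng.standard_normal((m, d)) / np.sqrt(d)    # first-layer weights (trained)
a = rng.standard_normal(m) / np.sqrt(m)         # second-layer weights (frozen)

def feats(W):
    """ReLU feature matrix, shape (n, m)."""
    return np.maximum(X @ W.T, 0.0)

# One full-batch gradient step on the first layer under squared loss.
F = feats(W)
resid = F @ a - y                               # shape (n,)
grad = ((resid[:, None] * (X @ W.T > 0) * a[None, :]).T @ X) / n  # shape (m, d)
eta = 5.0 * np.sqrt(n)                          # learning rate growing with sample size (assumed scaling)
W1 = W - eta * grad

# Spectra of the feature matrix before and after the update;
# with a large step size, separated top singular values (spikes) can emerge in s1.
s0 = np.linalg.svd(feats(W), compute_uv=False)
s1 = np.linalg.svd(feats(W1), compute_uv=False)
```

Comparing `s0` and `s1` (e.g. plotting both sorted spectra) is one simple way to visualize whether the update produced outlier singular values separated from the bulk.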
We consider the supercooled Stefan problem, which captures the freezing of a supercooled liquid, in one space dimension. A probabilistic reformulation of the problem allows us to define global solutions, even in the p...
Statistical mechanics can provide a versatile theoretical framework for investigating the collective dynamics of weakly nonlinear waves-settings that can be utterly complex to describe otherwise. In optics, composite ...
Forest fires have been detected to occur in Indonesia since 1998. The forest fires mostly resulted from human activities in order to expand their land, especially oil palm lands. Once a land clearing is carried out, i...
Counterfactuals, or modified inputs that lead to a different outcome, are an important tool for understanding the logic used by machine learning classifiers and how to change an undesirable classification. Even if a c...
In this article we consider the estimation of static parameters for partially observed diffusion process with discrete-time observations over a fixed time interval. In particular, we assume that one must time-discreti...
Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studie...
The two-dimensional electron gas (2DEG) is a fundamental model, which is drawing increasing interest because of recent advances in experimental and theoretical studies of 2D materials. Current understanding of the gro...
The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. The null hypothesis is that the predictive model is calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large. We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions. When the conditional class probabilities are Hölder continuous, we propose T-Cal, a minimax optimal test for calibration based on a debiased plug-in estimator of the ℓ2-Expected Calibration Error (ECE). We further propose adaptive T-Cal, a version that is adaptive to unknown smoothness. We verify our theoretical findings with a broad range of experiments, including with several popular deep neural net architectures and several standard post-hoc calibration methods. T-Cal is a practical general-purpose tool which, combined with classical tests for discrete-valued predictors, can be used to test the calibration of virtually any probabilistic classification method. T-Cal is available at https://***/dh7401/T-Cal.
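To make the quantity being tested concrete, here is a minimal sketch of a plain (non-debiased) binned plug-in estimate of the ℓ2-ECE for binary classification; the test statistic in T-Cal is a debiased refinement of this idea, and the bin count, equal-width binning, and function name here are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def binned_ece_l2(probs, labels, n_bins=10):
    """Plain plug-in estimate of the l2-ECE for binary classification.

    probs  : predicted probability of class 1 for each example
    labels : observed 0/1 outcomes
    Uses equal-width bins; returns the square root of the bin-weighted
    mean squared gap between average confidence and empirical frequency.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    n = len(probs)
    ece_sq = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        conf = probs[mask].mean()    # average predicted probability in the bin
        acc = labels[mask].mean()    # empirical frequency of class 1 in the bin
        ece_sq += (mask.sum() / n) * (conf - acc) ** 2
    return np.sqrt(ece_sq)
```

For example, a predictor that always outputs 0.5 on data with half-and-half labels has zero estimated ECE, while one that outputs 0.9 on examples that are all class 0 has an estimated ECE of 0.9. The naive plug-in estimator above is biased upward in small bins, which is exactly what the debiasing in T-Cal addresses.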
In this paper we consider the filtering of partially observed multi-dimensional diffusion processes that are observed regularly at discrete times. This is a challenging problem which requires the use of advanced numer...