The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transforme...
详细信息
Modern longitudinal studies collect feature data at many timepoints, often of the same order of sample size. Such studies are typically affected by dropout and positivity violations. We tackle these problems by genera...
The problem of estimating a one-dimensional signal possessing mixed degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy. For example, in the time domain, an astronomical object m...
详细信息
We consider the problem of global optimization of an unknown non-convex smooth function with zeroth-order feedback. In this setup, an algorithm is allowed to adaptively query the underlying function at different locat...
详细信息
In basketball and hockey, state-of-the-art player value statistics are often variants of Adjusted Plus-Minus (APM). But APM hasn’t had the same impact in soccer, since soccer games are low scoring with a low number o...
详细信息
Most classifiers operate by selecting the maximum of an estimate of the conditional distribution p(yjx) where x stands for the features of the instance to be classified and y denotes its label. This often results in a...
详细信息
Mode-based clustering methods define clusters to be the basins of attraction of the modes of a density estimate. The most common version is mean shift clustering which uses a gradient ascent algorithm to find the basi...
Deep generative models, such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAs) have been demonstrated to produce images of high visual quality. However, the existing hardware on which these m...
详细信息
We study minimax convergence rates of nonparametric density estimation under a large class of loss functions called "adversarial losses", which, besides classical Lp losses, includes maximum mean discrepancy...
详细信息
The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. The depth, structure, and massive size of the data are recognized to be three key ingredients for dee...
详细信息
暂无评论