As algorithmic tools increasingly aid experts in making consequential decisions, the need to understand the precise factors that mediate their influence has grown commensurately. In this paper, we present a crowdsourc...
Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional su...
Suppose that one can construct a valid (1 − δ)-confidence interval (CI) for each of K parameters of potential interest. If a data analyst uses an arbitrary data-dependent criterion to select some subset S of paramete...
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general condi...
We study the implicit regularization effects induced by (observation) weighting of pretrained features. For weight and feature matrices of bounded operator norms that are infinitesimally free with respect to (normaliz...
Whether future AI models are fair, trustworthy, and aligned with the public's interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. Recent research in data-centric AI has shown that higher-quality training data leads to better-performing models, making this the right moment to introduce AI/ML researchers to the field of survey methodology, the science of data collection. We summarize insights from the survey methodology literature and discuss how they can improve the quality of training and feedback data. We also suggest collaborative research ideas on how biases in data collection can be mitigated, making models more accurate and human-centric.
We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only...
The kernel Maximum Mean Discrepancy (MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic u...
One of the central objects in the theory of optimal transport is the Brenier map: the unique monotone transformation which pushes forward an absolutely continuous probability law onto any other given law. A line of re...
The effect of public health interventions on an epidemic is often estimated by adding the intervention to epidemic models. During the Covid-19 epidemic, numerous papers used such methods for making scenario predictio...