Monkeypox is a viral illness that has been known to affect humans. It is commonly misidentified as chickenpox due to the similarity of its rash to that of chickenpox, resulting in improper treatment and further spread...
Semi-supervised learning (SSL) is a powerful paradigm for leveraging unlabeled data and has proven successful across various tasks. Conventional SSL studies typically assume closed environment scenarios wher...
Artificial intelligence (AI) and machine learning (ML) systems are developing at a rapid pace, which has brought about major change for many organisations, including in digital marketing. Machine learning is a tool use...
Decision tree learning algorithms such as CART are generally based on heuristics that greedily maximize the purity gain. Though these algorithms are practically successful, theoretical properties such as consistency are far from clear. In this paper, we discover that the most serious obstacle encumbering consistency analysis for decision tree learning algorithms lies in the fact that the worst-case purity gain, i.e., the core heuristic for tree splitting, can be zero. Based on this recognition, we present a new algorithm, named Grid Classification And Regression Tree (GridCART), with a provable consistency rate O(n^{-1/(d+2)}), which is the first consistency rate proved for heuristic tree learning algorithms.
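To make the purity-gain heuristic concrete, here is a minimal sketch of the standard CART-style Gini purity-gain computation that such heuristics greedily maximize. The function names and toy data are illustrative only; this is plain CART splitting, not the GridCART algorithm itself.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a label vector: 1 - sum_k p_k^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def purity_gain(X, y, feature, threshold):
    """Decrease in weighted Gini impurity when splitting on X[:, feature] <= threshold."""
    left = y[X[:, feature] <= threshold]
    right = y[X[:, feature] > threshold]
    n = len(y)
    weighted_child = (len(left) / n) * gini_impurity(left) + \
                     (len(right) / n) * gini_impurity(right)
    return gini_impurity(y) - weighted_child

# Toy example of the obstacle the abstract highlights: every candidate split
# can have zero purity gain, so the greedy heuristic gets no signal.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 1, 0, 1])
print(purity_gain(X, y, feature=0, threshold=1.5))  # 0.0: this split does not help
```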
We develop an asymptotic framework for comparing the test performance of (personalized) federated learning algorithms, with the aim of moving beyond algorithmic convergence arguments. To that end, we study a high-dimensional linear regression model to elucidate the statistical properties (per-client test error) of loss minimizers. Our techniques and model allow precise predictions about the benefits of personalization and information sharing in federated scenarios, including that Federated Averaging with simple client fine-tuning achieves identical asymptotic risk to more intricate meta-learning approaches and outperforms naive Federated Averaging. We evaluate and corroborate these theoretical predictions on federated versions of the EMNIST, CIFAR-100, Shakespeare, and Stack Overflow datasets.
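A minimal sketch of the personalization scheme referred to in the abstract, i.e., Federated Averaging followed by a few local fine-tuning steps per client, for a simple linear least-squares model. The data, learning rates, and step counts are placeholders, not the paper's experimental setup.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=10):
    """Run a few steps of least-squares gradient descent from the shared weights w."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_with_finetune(clients, dim, rounds=50):
    """clients: list of (X, y) pairs. Returns per-client personalized weights."""
    w_global = np.zeros(dim)
    for _ in range(rounds):                       # FedAvg communication rounds
        updates = [local_sgd(w_global, X, y) for X, y in clients]
        w_global = np.mean(updates, axis=0)       # server averages the client models
    # Personalization: each client fine-tunes the global model on its own data.
    return [local_sgd(w_global, X, y, steps=5) for X, y in clients]

# Toy federated data: each client has its own linear relation plus noise.
rng = np.random.default_rng(0)
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    w_true = rng.normal(size=3)
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=50)))
personalized = fedavg_with_finetune(clients, dim=3)
```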
Estimating heterogeneous treatment effects (HTE) from observational studies is rising in importance due to the widespread accumulation of data in many fields. Because of the selection bias behind the inaccessibility of counterfactual data, the problem differs fundamentally from supervised learning in a challenging way. However, existing works on modeling selection bias and the corresponding algorithms do not naturally generalize to non-binary treatment spaces. To address this limitation, we propose to use mutual information to describe selection bias in estimating HTE and derive a novel error bound using the mutual information between the covariates and the treatments, which is the first error bound to cover general treatment schemes including multinoulli or continuous spaces. We then bring forth a theoretically justified algorithm, the Mutual Information Treatment Network (MitNet), which uses adversarial optimization to reduce selection bias and obtain more accurate HTE estimates. Our algorithm achieves remarkable performance in both simulation studies and empirical evaluation.
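To illustrate the general idea of reducing selection bias with adversarial optimization, here is a generic adversarial-balancing sketch: an encoder is trained so that a discriminator cannot recover the treatment from the learned representation, pushing an estimate of the mutual information between representation and treatment toward zero. This is not the actual MitNet objective (and it uses a binary treatment for simplicity, whereas MitNet targets general treatment spaces); all module sizes and names are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16))
outcome_head = nn.Linear(16 + 1, 1)          # predicts outcome from (phi(x), t)
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(outcome_head.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

def train_step(x, t, y, lam=1.0):
    # 1) The discriminator learns to predict the (binary) treatment from phi(x).
    phi = encoder(x).detach()
    opt_disc.zero_grad()
    bce(discriminator(phi).squeeze(-1), t).backward()
    opt_disc.step()
    # 2) Encoder + outcome head minimize prediction error while fooling the discriminator,
    #    which reduces how much treatment information the representation carries.
    phi = encoder(x)
    y_hat = outcome_head(torch.cat([phi, t.unsqueeze(-1)], dim=-1)).squeeze(-1)
    adv = bce(discriminator(phi).squeeze(-1), 1.0 - t)   # flipped labels confuse the treatment predictor
    opt_main.zero_grad()
    (mse(y_hat, y) + lam * adv).backward()
    opt_main.step()

# Toy usage with random stand-in data.
x = torch.randn(64, 10); t = torch.randint(0, 2, (64,)).float(); y = torch.randn(64)
train_step(x, t, y)
```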
ISBN (Print): 9781956792034
Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis of stochastic gradient / Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, and substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information-theoretic measure, the kernelized Rényi entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish generalization error bounds for SGD/SGLD under the kernelized Rényi entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretic bounds depend on the statistics of the stochastic gradients evaluated along the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
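For intuition about kernel-based entropy estimation, here is a sketch in the spirit of the matrix-based Rényi entropy, computed from the eigenvalues of a trace-normalized Gaussian Gram matrix. This is only meant to illustrate why such quantities depend on the n x n Gram matrix rather than the input dimension; the paper's estimator and sampling scheme may differ, and the kernel bandwidth here is an arbitrary choice.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi entropy of a sample X (n x d) via a Gaussian Gram matrix.

    S_alpha = 1/(1-alpha) * log2(sum_i lam_i^alpha), where lam_i are eigenvalues of
    the trace-normalized kernel matrix. Illustrative only; not the paper's exact estimator.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))        # Gaussian kernel Gram matrix (n x n)
    A = K / np.trace(K)                             # normalize so the eigenvalues sum to 1
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None) # spectrum plays the role of a probability vector
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(0)
print(matrix_renyi_entropy(rng.normal(size=(200, 5))))
```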
We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms. We begin by formalizing this goal as one of finding distinguishing feature transformations, i.e., input transformations that change the predictions of models trained with one learning algorithm but not the other. We then present MODELDIFF, a framework that leverages datamodels (Ilyas et al., 2022) to compare learning algorithms based on how they use training data. We demonstrate MODELDIFF through three case studies, comparing models trained with/without data augmentation, with/without pre-training, and with different SGD hyperparameters. Our code is available at https://***/MadryLab/modeldiff.
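The notion of a distinguishing feature transformation can be written down directly: a transformation distinguishes two learning algorithms if it changes the predictions of models trained one way much more than models trained the other way. The sketch below captures only that definition with stand-in predictors; it is not the datamodels-based MODELDIFF procedure, and all names are illustrative.

```python
import numpy as np

def flip_rate(model_predict, X, X_transformed):
    """Fraction of inputs whose predicted label changes under the transformation."""
    return float(np.mean(model_predict(X) != model_predict(X_transformed)))

def distinguishing_score(predict_a, predict_b, X, transform):
    """Positive when the transformation flips algorithm A's predictions more than B's."""
    Xt = transform(X)
    return flip_rate(predict_a, X, Xt) - flip_rate(predict_b, X, Xt)

# Toy usage with stand-in predictors (placeholders, not trained models).
X = np.random.default_rng(0).normal(size=(100, 3))
predict_a = lambda Z: (Z[:, 0] > 0).astype(int)            # relies on feature 0
predict_b = lambda Z: (Z[:, 1] > 0).astype(int)            # relies on feature 1
shift_feature_0 = lambda Z: Z + np.array([5.0, 0.0, 0.0])  # perturbs only feature 0
print(distinguishing_score(predict_a, predict_b, X, shift_feature_0))  # positive: the shift matters only to A
```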
ISBN (Print): 9781665488679
Spiking neural networks (SNNs) have received extensive attention across multiple disciplines due to their rich spatiotemporal dynamics and their potential for low processing delay and high energy efficiency on neuromorphic hardware. Research on SNN learning algorithms is active and diverse, and many algorithms differ significantly from those of DNNs in their computation models/features and weight-adjustment mechanisms. This paper proposes FABLE, a multi-level framework for building and running SNN learning algorithms efficiently. Its kernel is an adaptable computation model based on synchronous data flow, which can express the spatiotemporal parallelism of SNNs and can organize and schedule the underlying SNN-custom tensor operators (OPs) to construct optimized computing procedures. It also provides a flexible programming interface for users to design or customize their learning algorithms. In addition, the implementation of FABLE is highly compatible: it extends PyTorch's OP library, scheduler, and APIs to take advantage of the latter's ecosystem and usability. To show the flexibility of the framework, we have ported five different learning algorithms, each with less programming effort than its original implementation. Further experiments demonstrate that FABLE outperforms all of them (by up to 2.61x) in computing performance, while the original implementations are based either directly on PyTorch, on a third-party tool built on PyTorch, or directly on the GPGPU runtime.
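To illustrate the kind of spatiotemporal computation such a framework has to organize and schedule, here is a minimal leaky integrate-and-fire layer unrolled over discrete time steps in plain PyTorch. It shows only the forward dynamics (membrane state carried across time, spikes emitted per step); it is not FABLE's operator set or API, and the constants are assumptions.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Leaky integrate-and-fire layer unrolled over discrete time steps (forward pass only)."""
    def __init__(self, in_features, out_features, tau=2.0, threshold=1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.decay = 1.0 - 1.0 / tau
        self.threshold = threshold

    def forward(self, x_seq):                      # x_seq: (time, batch, in_features)
        v = torch.zeros(x_seq.shape[1], self.fc.out_features)
        spikes = []
        for x_t in x_seq:                          # temporal loop: state persists across steps
            v = self.decay * v + self.fc(x_t)      # leaky integration of the input current
            s = (v >= self.threshold).float()      # fire when the membrane crosses threshold
            v = v * (1.0 - s)                      # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes)                 # (time, batch, out_features)

out = LIFLayer(8, 4)(torch.rand(10, 2, 8))         # 10 time steps, batch of 2
print(out.shape)                                   # torch.Size([10, 2, 4])
```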
The Internet of Things (IoT) is a network of interconnected devices that may be used to remotely detect, identify, and operate physical objects. IoT's qualities allow for the incorporation of the real world into a...