Machine learning is a data-driven field, and the quality of the underlying datasets plays a crucial role in learning success. However, high performance on held-out test data does not necessarily indicate that a model ...
详细信息
Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn ...
详细信息
ISBN:
(纸本)9798350301298
Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead, it uses the in-cont...
详细信息
ISBN:
(纸本)9798350301298
We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead, it uses the in-context learning ability of large language models to generate python-like modular programs, which are then executed to get both the solution and a comprehensive and interpretable rationale. Each line of the generated program may invoke one of several off-the-shelf computervision models, image processing subroutines, or python functions to produce intermediate outputs that may be consumed by subsequent parts of the program. We demonstrate the flexibility of VISPROG on 4 diverse tasks - compositional visual question answering, zero-shot reasoning on image pairs, factual knowledge object tagging, and language-guided image editing. We believe neuro-symbolic approaches like VISPROG are an exciting avenue to easily and effectively expand the scope of AI systems to serve the long tail of complex tasks that people may wish to perform.
Current differentiable renderers provide light transport gradients with respect to arbitrary scene parameters. However, the mere existence of these gradients does not guarantee useful update steps in an optimization. ...
详细信息
ISBN:
(纸本)9798350301298
Current differentiable renderers provide light transport gradients with respect to arbitrary scene parameters. However, the mere existence of these gradients does not guarantee useful update steps in an optimization. Instead, inverse rendering might not converge due to inherent plateaus, i.e., regions of zero gradient, in the objective function. We propose to alleviate this by convolving the high-dimensional rendering function, that maps scene parameters to images, with an additional kernel that blurs the parameter space. We describe two Monte Carlo estimators to compute plateau-reduced gradients efficiently, i.e., with low variance, and show that these translate into net-gains in optimization error and runtime performance. Our approach is a straightforward extension to both black-box and differentiable renderers and enables optimization of problems with intricate light transport, such as caustics or global illumination, that existing differentiable renderers do not converge on. Our code is at ***/mfischerucl/prdpt.
Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad rang...
详细信息
ISBN:
(纸本)9798350301298
Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For conditional generation, we propose to inject layout constraints in the form of masking or logit adjustment during inference. We show in the experiments that our LayoutDM successfully generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.(1)
We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computervision is substantial i...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computervision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie(1), a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computervision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts performances obtained when using the time series in long-term forecasting scenarios, ameliorating the WAPE by 8.2% and the MAE by 7.7%. The dataset is available at : https://***/forecasting/visuelle.
Due to the covariate shift, deep neural networks performance always degrades when applied to novel domains. In order to mitigate this problem, domain adaptation techniques require samples from target data during the f...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Due to the covariate shift, deep neural networks performance always degrades when applied to novel domains. In order to mitigate this problem, domain adaptation techniques require samples from target data during the feature extraction training, which is not always applicable in real-world scenarios. Batch Normalization is a known component of computervision models, aiming at reducing the training-time covariate shift. However, facing distribution shift results in an internal state mismatch inside the Batch-Norm layers during the inference time. In favor of alleviating the induced mismatch, this paper proposes a source-free, lightweight and straightforward approach by introducing the "Visual Domain Bridge" concept reducing the BatchNorm's internal mismatch in the cross-domain settings. Compared to the other BatchNorm-based source-free domain adaptation techniques such as AdaBN and Prediction-BN, our method formed a new state-of-the-art cross-domain few-shot fine-tuning method neglecting extra augmentations;while improving the performance in near-domain settings too. The proposed method can integrate with other domain adaptation methods and enhance their performance requiring just a few lines of modification in the BatchNorm's implementation. Implementations are available in https://***/MosyMosy/VDB
Remote photoplethysmography (rPPG) is a contactless method to measure human vital signs by detecting subtle skin color changes through a camera. Although many studies have used region of interest (ROI) selection tools...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Remote photoplethysmography (rPPG) is a contactless method to measure human vital signs by detecting subtle skin color changes through a camera. Although many studies have used region of interest (ROI) selection tools to improve rPPG signal extraction, no study has investigated the influence of the ROI's surface orientation. We propose a novel 'angle map' representation of the face to study the effects of the surface orientation on the extracted rPPG signal. The angle map is generated by mapping each facial pixel to an angle of reflection (angle between the skin surface and the camera) calculated from the surface normal of the facial landmarks and the camera axis. Our results show that surface orientation significantly affects the correlation between the extracted rPPG signal and ground truth blood volume pulse (BVP). Regions with small angles of reflection contained stronger signals, which explains why areas near the cheeks and forehead are often chosen for rPPG signal extraction. Moreover, we applied a thresholding method to the angle map and demonstrated its potential for dynamic ROI selection, thereby optimising the rPPG signal extraction process.
Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion or image classification. Here we propose to use a sequential image representation...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion or image classification. Here we propose to use a sequential image representation, where each prefix of the complete sequence describes the whole image at reduced resolution. Using such Fourier Domain Encodings (FDEs), an auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a set of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction. In summary, we show that Fourier Image Transformer (FIT) can be used to solve relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
How to build a system for robust classification and recognition of facial expressions has been one of the most important research issues for successful interactive computing applications. However, previous datasets an...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
How to build a system for robust classification and recognition of facial expressions has been one of the most important research issues for successful interactive computing applications. However, previous datasets and studies mainly focused on facial expression recognition in a controlled/lab setting, therefore, could hardly be generalized in a more practical and real-life environment. The Affective Behavior Analysis in-the-wild (ABAW) 2022 competition released a dataset consisting of various video clips of facial expressions in-the-wild. In this paper, we propose a method based on the ensemble of multi-head cross attention networks to address the facial expression classification task introduced in the ABAW 2022 competition. We built a uni-task approach for this task, achieving the average F1-score of 34.60 on the validation set and 33.77 on the test set, ranking second place on the final leaderboard.
暂无评论