ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Unsupervised domain adaptation is one of the challenging problems in computer vision. This paper presents a novel approach to unsupervised domain adaptation based on an optimal-transport distance. Our approach aligns target and source domains without requiring meaningful metrics across domains. In addition, the proposal can associate the correct mapping between source and target domains and guarantee a topological constraint between them. The proposed method is evaluated on different datasets across several problems: (i) digit recognition on the MNIST, MNIST-M, and USPS datasets, (ii) object recognition on the Amazon, Webcam, DSLR, and VisDA datasets, and (iii) insect recognition on the IP102 dataset. The experimental results show that our proposed method consistently improves accuracy. Our framework can also be incorporated into any other CNN framework within an end-to-end deep network design for recognition problems to improve its performance.
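The abstract does not say which optimal-transport distance the method uses. Purely as a generic illustration of what such a distance measures, the sketch below computes the 1-Wasserstein distance between two equal-size one-dimensional samples, where the optimal plan matches sorted values. Unlike the method described, this toy case assumes both samples live in the same space; the function name `wasserstein_1d` and the feature values are hypothetical and not taken from the paper.

```python
def wasserstein_1d(source, target):
    """1-Wasserstein distance between two equal-size 1-D samples (illustrative only)."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    if not source:
        raise ValueError("samples must be non-empty")
    # In one dimension the optimal transport plan pairs the i-th smallest
    # source value with the i-th smallest target value.
    pairs = zip(sorted(source), sorted(target))
    return sum(abs(s - t) for s, t in pairs) / len(source)


# Hypothetical scalar features from a source and a target domain.
source_features = [0.0, 1.0, 2.0]
target_features = [2.5, 0.5, 1.5]
print(wasserstein_1d(source_features, target_features))
```

A smaller value means the two samples are cheaper to transport onto each other, which is the sense in which such a distance can serve as an alignment objective.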
ISBN (print): 9781665445092
Despite the success of machine learning applications in science, industry, and society in general, many approaches are known to be non-robust, often relying on spurious correlations to make predictions. Spuriousness occurs when some features correlate with labels but are not causal; relying on such features prevents models from generalizing to unseen environments where such correlations break. In this work, we focus on image classification and propose two data generation processes to reduce spuriousness. Given human annotations of the subset of features responsible (causal) for the labels (e.g., bounding boxes), we modify this causal set to generate a surrogate image that no longer has the same label (i.e., a counterfactual image). We also alter non-causal features to generate images still recognized as the original labels, which helps to learn a model invariant to these features. On several challenging datasets, our data generation processes outperform state-of-the-art methods in accuracy when spurious correlations break, and increase the saliency focus on causal features, providing better explanations.
ISBN (print): 9781665445092
Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images. We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images. We show how to (i) use CNNs to learn a context-rich vocabulary of image constituents, and in turn (ii) utilize transformers to efficiently model their composition within high-resolution images. Our approach is readily applied to conditional synthesis tasks, where both non-spatial information, such as object classes, and spatial information, such as segmentations, can control the generated image. In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers. Project page at https://***/JLlvY.
ISBN (print): 9781665448994
Personal photos of individuals, when shared online, exhibit a myriad of memorable details but also reveal a wide range of private information and potentially entail privacy risks (e.g., online harassment, tracking). To mitigate such risks, it is crucial to study techniques that allow individuals to limit the private information leaked in visual data. We tackle this problem in a novel image obfuscation framework: to maximize entropy on inferences over targeted privacy attributes, while retaining image fidelity. We approach the problem based on an encoder-decoder style architecture, with two key novelties: (a) introducing a discriminator to perform bi-directional translation simultaneously from multiple unpaired domains; (b) predicting an image interpolation which maximizes uncertainty over a target set of attributes. We find our approach generates obfuscated images faithful to the original input images and additionally increases uncertainty by 6.2x (or up to 0.85 bits) over the non-obfuscated counterparts.
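The objective of maximizing entropy over attribute inferences, and the uncertainty figure reported in bits, can be made concrete with Shannon entropy. The sketch below is not the paper's implementation; the function name `entropy_bits` and the example probabilities are assumptions chosen for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log2(p) is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical attribute-classifier outputs for a binary privacy attribute.
confident_prediction = [0.99, 0.01]  # e.g. on the original image
uncertain_prediction = [0.5, 0.5]    # e.g. on an obfuscated image
print(entropy_bits(confident_prediction))
print(entropy_bits(uncertain_prediction))
```

A binary attribute carries at most 1 bit of entropy, reached when the prediction is uniform, so an obfuscation that pushes predictions toward uniform raises the attacker's uncertainty toward that ceiling.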
ISBN (print): 9798350344868; 9798350344851
In many real-world applications of Deep Neural Networks (DNNs) in visual recognition, data augmentation stands out as a premier tool for enhancing model robustness. Stemming from the understanding of the common mechanisms of data augmentation methods, we introduce the mask-based "data augmentation boost" (DaBoost) method, a strategic approach that exploits the control of game interaction strength. In our experiments, DaBoost not only consistently surpasses the state-of-the-art PixMix method but also achieves impressive robustness metrics, with a vanilla WideResNet registering a mere 6.5% mCE and a 2.3% RMS calibration error on CIFAR-10 data. An intriguing observation from our study is the Long-Rope Effect: penalizing high-order interactions inadvertently boosts mid-order interactions, mirroring patterns inherent to human cognitive processes. This interplay hints at potential avenues for further optimizing DNN performance.
ISBN (print): 9781665448994
Social media images are generally transformed by filtering to obtain aesthetically more pleasing appearances. However, CNNs generally fail to interpret an image and its filtered version as the same in the visual analysis of social media images. We introduce Instagram Filter Removal Network (IFRNet) to mitigate the effects of image filters for social media analysis applications. To achieve this, we assume that any filter applied to an image injects additional style information into it, and we treat the problem as reverse style transfer. The visual effects of filtering can be directly removed by adaptively normalizing external style information at each level of the encoder. Experiments demonstrate that IFRNet outperforms all compared methods in quantitative and qualitative comparisons, and can remove the visual effects to a great extent. Additionally, we present the filter classification performance of our proposed model, and analyze the dominant color estimation on the images unfiltered by all compared methods.
ISBN (print): 9781665448994
Combining natural language with vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, and (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://***/cscribano/AYCE_2021.
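The abstract mentions a variation of the Triplet Margin Loss without describing it. For orientation only, the standard form, max(0, d(anchor, positive) - d(anchor, negative) + margin), can be sketched as follows. The Euclidean distance, the margin value, and the toy embeddings are assumptions for illustration, and the paper's variation may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (not the paper's variation)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: a text query, its matching vehicle track,
# and a non-matching track.
text_embedding = [1.0, 0.0]
matching_track = [1.0, 0.0]
other_track = [1.0, 0.5]
print(triplet_margin_loss(text_embedding, matching_track, other_track))
```

The loss is zero once the non-matching embedding is at least `margin` farther from the anchor than the matching one, so training pulls matched text and visual embeddings together and pushes mismatched ones apart.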
ISBN (print): 9781665445092
Fairness in visual recognition is becoming a prominent and critical topic of discussion as recognition systems are deployed at scale in the real world. Models trained from data in which target labels are correlated with protected attributes (e.g., gender, race) are known to learn and exploit those correlations. In this work, we introduce a method for training accurate target classifiers while mitigating biases that stem from these correlations. We use GANs to generate realistic-looking images, and perturb these images in the underlying latent space to generate training data that is balanced for each protected attribute. We augment the original dataset with this generated data, and empirically demonstrate that target classifiers trained on the augmented dataset exhibit a number of both quantitative and qualitative benefits. We conduct a thorough evaluation across multiple target labels and protected attributes in the CelebA dataset, and provide an in-depth analysis and comparison to existing literature in the space. Code can be found at https://***/princetonvisualai/gan-debiasing.
ISBN (print): 9781665482639
The evolution of Knowledge Graphs (KGs) during the last two decades has encouraged developers to create more and more context-related KGs. This advance is extremely important because Artificial Intelligence (AI) applications can access open, domain-specific information in a semantically rich, machine-understandable format. In this paper, we present a KG for the various types of skin cancer, which represents information about the symptoms and dangers of the types of skin cancer that are most common by number of cases. Moreover, we provide a data integration mechanism that can map information from various datasets of skin cancer cases into the skin cancer ontology. We also provide a use case scenario in which doctors can use the KG as a decision support system and in which a computer vision (CV) mechanism simulates a doctor. When the CV mechanism is unable to classify a case, it can consult the KG to retrieve similar cases based on the characteristics of the one at hand.
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Handwritten Text Recognition (HTR) is an open problem at the intersection of computer vision and Natural Language Processing. The main challenges, when dealing with historical manuscripts, are due to the preservation of the paper support, the variability of the handwriting - even of the same author over a wide time-span - and the scarcity of data from ancient, poorly represented languages. With the aim of fostering research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available. For both configurations, we analyze quantitative and qualitative characteristics, also with respect to other line-level HTR benchmarks, and present the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at https://***/go/lam.