ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Digital art restoration has benefited from inpainting models that correct degraded or missing sections of a painting. This work compares three current state-of-the-art models for inpainting of large missing regions. We provide a qualitative and quantitative comparison of the performance of CoModGANs, LaMa, and GLIDE in inpainting blurry and missing sections of images. We use Escher's incomplete painting Print Gallery as our test case, since it presents several of the challenges common in restorative inpainting.
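The abstract does not name the metrics used for the quantitative comparison; as a minimal illustration of one common check, the sketch below computes PSNR restricted to the inpainted region. All images, the mask, and the model outputs are random stand-ins.

```python
import numpy as np

def masked_psnr(reference, restored, mask, peak=1.0):
    """PSNR computed only over the inpainted (masked) pixels."""
    diff = (reference - restored)[mask]
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

# Hypothetical usage: `outputs` maps model names to restored images in [0, 1].
reference = np.random.rand(256, 256, 3)          # stand-in for a ground-truth crop
mask = np.zeros((256, 256, 3), dtype=bool)
mask[96:160, 96:160, :] = True                   # the "missing" region
outputs = {m: np.random.rand(256, 256, 3) for m in ("CoModGANs", "LaMa", "GLIDE")}
for model, restored in outputs.items():
    print(model, round(masked_psnr(reference, restored, mask), 2), "dB")
```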
ISBN (print): 9781665448994
Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of shared representations, while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE on the two image modalities of the MNIST and Google Street View House Numbers (SVHN) datasets, as well as on the image and text modalities of the Oxford-102 Flowers dataset. Our experiments indicate the importance of retaining the private representation, as well as of the private-shared disentanglement, to effectively direct information across multiple analysis-synthesis conduits.
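A minimal sketch of the private/shared split the abstract describes, assuming a simple fully connected encoder with one Gaussian head per latent space; the layer sizes and architecture are illustrative, not the paper's actual DMVAE.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality into a private and a shared Gaussian latent."""
    def __init__(self, in_dim, private_dim, shared_dim, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Separate heads: one latent kept per-modality, one aligned across modalities.
        self.private_head = nn.Linear(hidden, 2 * private_dim)  # mu and log-variance
        self.shared_head = nn.Linear(hidden, 2 * shared_dim)

    def forward(self, x):
        h = self.backbone(x)
        mu_p, logvar_p = self.private_head(h).chunk(2, dim=-1)
        mu_s, logvar_s = self.shared_head(h).chunk(2, dim=-1)
        # Reparameterization trick for both latents.
        z_p = mu_p + torch.randn_like(mu_p) * (0.5 * logvar_p).exp()
        z_s = mu_s + torch.randn_like(mu_s) * (0.5 * logvar_s).exp()
        return z_p, z_s

enc = ModalityEncoder(in_dim=784, private_dim=8, shared_dim=16)
z_private, z_shared = enc(torch.randn(4, 784))  # e.g., a batch of flattened MNIST digits
```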
ISBN (print): 9781665487399
Due to the success of generative flows in modeling data distributions, they have been explored for inverse problems. Given a pre-trained generative flow, previous work proposed to minimize the 2-norm of the latent variables as a regularization term. The intuition was that high-likelihood latent variables would produce the closest restoration. However, high-likelihood latent variables may generate unrealistic samples, as we show in our experiments. We therefore propose a solver that directly produces high-likelihood reconstructions. We hypothesize that our approach could make generative flows a general-purpose solver for inverse problems. Furthermore, we propose 1x1 coupling functions to introduce permutations into a generative flow; their advantage is that the inverse does not need to be calculated during generation. Finally, we evaluate our method on denoising, deblurring, inpainting, and colorization, and observe a compelling improvement over prior work.
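The prior-work baseline that the abstract argues against can be sketched as latent-space optimization with a 2-norm penalty. Everything below is hypothetical: a linear layer stands in for the pretrained flow's latent-to-image map, the degradation is a fixed mask, and the weight `lam` is arbitrary.

```python
import torch

# Stand-ins: `generate` plays the role of a pretrained flow's latent-to-image
# map f(z); `forward_op` is the known degradation (here, inpainting-style masking).
generate = torch.nn.Linear(64, 64)       # placeholder, not a real generative flow
mask = (torch.rand(64) > 0.5).float()
forward_op = lambda x: x * mask
y = forward_op(torch.randn(64))          # observed degraded measurement

z = torch.zeros(64, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
lam = 0.1                                # weight of the latent 2-norm prior
for _ in range(200):
    opt.zero_grad()
    x = generate(z)
    # Prior work's objective: data fidelity plus ||z||_2^2 regularization.
    loss = ((forward_op(x) - y) ** 2).sum() + lam * (z ** 2).sum()
    loss.backward()
    opt.step()
```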
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Before we can obfuscate portions of an image to enhance privacy, we must know which portions are considered sensitive. In this paper, we report results from a study aimed at identifying sensitive content in photos from a human-centered perspective. We collected sensitive photos and/or descriptions of sensitive photos from participants and asked them to identify which elements made each photo sensitive. Using this information, we propose an initial two-level taxonomy of sensitive content categories. This taxonomy may be useful to privacy researchers, online social network designers, policy makers, computer vision researchers, and anyone wishing to identify potentially sensitive content in photos. We conclude by providing insights about how these results may be used to enhance computer vision approaches to protecting image privacy.
ISBN (print): 9781665487399
The accuracy of finger-vein recognition systems is degraded by low and uneven contrast between veins and their surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein structures and train the proposed ResFPN to detect veins across scales. With the enhanced presentations, our experiments indicate a reduction of up to 5% in the average recognition error of a commonly used recognition pipeline on two publicly available datasets. These improvements persist even in the cross-dataset scenario, where the dataset used to train the ResFPN differs from the one used for recognition.
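A rough structural sketch of the described design, assuming a residual block per pyramid level and bilinear upsampling before fusion; the block internals, channel counts, and the aggregation step here are illustrative guesses, not the paper's actual SDBlock or FAM.

```python
import torch
import torch.nn as nn

class SDBlock(nn.Module):
    """Hypothetical stand-in for the Structure Detection block: a residual
    conv unit that also produces a downsampled map for the next level."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)

    def forward(self, x):
        feat = torch.relu(x + self.conv(x))   # residual feature at this scale
        return feat, self.down(feat)          # keep feature, pass coarser map on

class ResFPNSketch(nn.Module):
    """Bottom-up pyramid whose per-scale features are aggregated (FAM-like)
    into one enhancement map; sizes are illustrative, not the paper's."""
    def __init__(self, ch=16, levels=3):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        self.blocks = nn.ModuleList(SDBlock(ch) for _ in range(levels))
        self.fuse = nn.Conv2d(ch * levels, 1, 1)

    def forward(self, x):
        h, feats = self.stem(x), []
        for block in self.blocks:
            feat, h = block(h)
            feats.append(feat)
        size = feats[0].shape[-2:]
        up = [nn.functional.interpolate(f, size=size, mode="bilinear",
                                        align_corners=False) for f in feats]
        return self.fuse(torch.cat(up, dim=1))   # enhanced vein map

out = ResFPNSketch()(torch.randn(1, 1, 128, 128))  # grayscale finger image
```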
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
We present an efficient method for the reconstruction of multispectral information from RGB images, as part of the NTIRE 2022 Spectral Reconstruction Challenge. Given an input image, our method determines a global RGB-to-spectral linear transformation matrix, based on a search through optimal matrices from training images that share low-level features with the input. The resulting spectral signatures are then adjusted by a global scaling factor, determined through a lightweight SqueezeNet-inspired neural network. By combining the efficiency of linear transformation matrices with the data-driven effectiveness of convolutional neural networks, we outperform the winners of the previous editions of the challenge.
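The core of the described pipeline reduces to a per-pixel 3-to-C linear map followed by a global scale. A minimal sketch, with random stand-ins for the paired training data and a fixed scale in place of the SqueezeNet-style predictor:

```python
import numpy as np

def rgb_to_spectral(rgb, M, scale=1.0):
    """Apply a global 3 -> C linear map per pixel, then a global scale.

    rgb: (H, W, 3) image; M: (3, C) transformation matrix. In the paper, M is
    selected among matrices fit on training images sharing low-level features
    with the input, and `scale` comes from a small CNN (both assumed here)."""
    return scale * (rgb @ M)     # (H, W, C) spectral signatures

rgb = np.random.rand(64, 64, 3)
M = np.linalg.lstsq(             # least-squares fit on paired training samples
    np.random.rand(1000, 3),     # training RGB values (stand-ins)
    np.random.rand(1000, 31),    # matching 31-band spectra (stand-ins)
    rcond=None)[0]               # -> (3, 31)
print(rgb_to_spectral(rgb, M, scale=1.02).shape)   # (64, 64, 31)
```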
ISBN (print): 9781665448994
Few-shot learning aims to build classifiers for new classes from a small number of labeled examples and is commonly facilitated by access to examples from a distinct set of 'base classes'. The difference in data distribution between the test set (novel classes) and the base classes used to learn an inductive bias often results in poor generalization to the novel classes. To alleviate problems caused by the distribution shift, previous research has explored the use of unlabeled examples from the novel classes, in addition to labeled examples of the base classes, which is known as the transductive setting. In this work, we show that, surprisingly, off-the-shelf self-supervised learning outperforms transductive few-shot methods by 3.9% in 5-shot accuracy on miniImageNet without using any base class labels. This motivates us to examine more carefully the role of features learned through self-supervision in few-shot learning. Comprehensive experiments are conducted to compare the transferability, robustness, efficiency, and complementarity of supervised and self-supervised features.
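One common way to use frozen (e.g., self-supervised) features in few-shot evaluation, without any transduction, is nearest-centroid classification per episode. The sketch below is illustrative and not necessarily the exact evaluation protocol used in the paper; the features are random stand-ins for a frozen encoder's output.

```python
import numpy as np

def nearest_centroid_episode(support_x, support_y, query_x):
    """Classify each query by its nearest class centroid in feature space."""
    classes = np.unique(support_y)
    centroids = np.stack([support_x[support_y == c].mean(0) for c in classes])
    # L2-normalize so the comparison is cosine-like, a common choice.
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    q = query_x / np.linalg.norm(query_x, axis=1, keepdims=True)
    return classes[np.argmax(q @ centroids.T, axis=1)]

# Hypothetical 5-way 5-shot episode with 512-d features from a frozen encoder.
support_x = np.random.randn(25, 512)
support_y = np.repeat(np.arange(5), 5)
preds = nearest_centroid_episode(support_x, support_y, np.random.randn(75, 512))
```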
ISBN (print): 9781665448994
The goal of this paper is to model the fashion compatibility of an outfit and provide explanations. We first extract features for all attributes of all items via convolutional neural networks, and then train a bidirectional Long Short-Term Memory (Bi-LSTM) model to learn the compatibility of an outfit by treating these attribute features as a sequence. Gradient penalty regularization is exploited to train the inter-factor compatibility network, which computes the loss for the compatibility judgment and provides an explanation generated from the recognized reasons related to that judgment. To train and evaluate the proposed approach, we expanded the EVALUATION3 dataset in terms of the number of items and attributes. Experimental results show that our approach can successfully evaluate compatibility with reasons.
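A minimal sketch of the Bi-LSTM-over-attributes idea, assuming pre-extracted per-item features and a mean-pooled compatibility score; the sizes, pooling, and output head are illustrative, and the gradient-penalty explanation network is omitted.

```python
import torch
import torch.nn as nn

class OutfitCompatibility(nn.Module):
    """Bi-LSTM over a sequence of per-item attribute features."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)   # compatibility of the outfit

    def forward(self, item_feats):              # (batch, n_items, feat_dim)
        out, _ = self.bilstm(item_feats)
        return torch.sigmoid(self.score(out.mean(dim=1)))

# A hypothetical 4-item outfit whose attribute features came from a CNN.
score = OutfitCompatibility()(torch.randn(2, 4, 512))
```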
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Fashion-on-demand is becoming an important concept for the fashion industry. Many attempts have been made to leverage machine learning to generate fashion designs tailored to customers' tastes. However, how to assemble items together (i.e., their compatibility) is crucial for designing high-quality outfits in image synthesis. Here we propose a fashion generation model, named OutfitGAN, which contains two core modules: a Generative Adversarial Network and a Compatibility Network. The generative module generates new, realistic, high-quality fashion items from a specific category, while the compatibility network ensures reasonable compatibility among all items. The experimental results show the superiority of our OutfitGAN.
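A hedged sketch of how the two modules could interact in the generator's objective: an adversarial term plus a compatibility term scored against the rest of the outfit. Linear layers stand in for all three networks and the weighting is arbitrary; none of this is taken from the paper.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 512))   # latent -> generated item feature
D = nn.Sequential(nn.Linear(512, 1))     # real/fake critic (stand-in)
C = nn.Sequential(nn.Linear(512 * 2, 1)) # outfit compatibility scorer (stand-in)

z = torch.randn(8, 100)
partial_outfit = torch.randn(8, 512)     # features of the already-given items
fake_item = G(z)
adv = nn.functional.binary_cross_entropy_with_logits(
    D(fake_item), torch.ones(8, 1))      # generator tries to fool the critic
compat = -C(torch.cat([fake_item, partial_outfit], dim=1)).mean()
loss_G = adv + 0.5 * compat              # 0.5 is an arbitrary trade-off weight
loss_G.backward()
```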
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
In this paper, we present an approach to conditioned and composed image retrieval based on CLIP features. In this extension of content-based image retrieval (CBIR), an image is combined with a text that provides information regarding user intentions, which is relevant for application domains like e-commerce. The proposed method is based on an initial training stage in which a simple combination of visual and textual features is used to fine-tune the CLIP text encoder. In a second training stage, we learn a more complex combiner network that merges visual and textual features. Contrastive learning is used in both stages. The proposed approach obtains state-of-the-art performance for conditioned CBIR on the FashionIQ dataset and for composed CBIR on the more recent CIRR dataset.
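A minimal sketch of the second-stage idea: a combiner merges (stand-in) CLIP image and text features into a query embedding, trained contrastively to match the target image's feature. The architecture, temperature, and feature dimensions are assumptions, not the paper's exact combiner.

```python
import torch
import torch.nn as nn

class Combiner(nn.Module):
    """Merges a reference-image feature and a modifying-text feature
    into one normalized query embedding; sizes are illustrative."""
    def __init__(self, dim=512):
        super().__init__()
        self.mix = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, img_feat, txt_feat):
        q = self.mix(torch.cat([img_feat, txt_feat], dim=-1))
        return nn.functional.normalize(q, dim=-1)

# Contrastive step with random stand-ins for CLIP features (batch of 16).
img, txt = torch.randn(16, 512), torch.randn(16, 512)
target = nn.functional.normalize(torch.randn(16, 512), dim=-1)  # target images
query = Combiner()(img, txt)
logits = query @ target.T / 0.07          # temperature-scaled similarities
loss = nn.functional.cross_entropy(logits, torch.arange(16))  # i-th pair matches
loss.backward()
```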