ISBN (print): 9781665487399
Owing to the success of generative flows in modeling data distributions, they have been explored for inverse problems. Given a pre-trained generative flow, previous work proposed minimizing the 2-norm of the latent variables as a regularization term. The intuition was to favor high-likelihood latent variables that produce the closest restoration. However, high-likelihood latent variables may generate unrealistic samples, as we show in our experiments. We therefore propose a solver that directly produces high-likelihood reconstructions. We hypothesize that our approach could make generative flows a general-purpose solver for inverse problems. Furthermore, we propose 1x1 coupling functions to introduce permutations in a generative flow. They have the advantage that their inverse does not need to be computed during generation. Finally, we evaluate our method on denoising, deblurring, inpainting, and colorization. We observe a compelling improvement of our method over prior works.
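The latent-norm regularization attributed to previous work can be written out as a short worked formulation. This is a sketch under stated assumptions, not the objective from the paper: the symbols $y$ (observation), $A$ (degradation operator), $G$ (pre-trained flow mapping latents to images), and $\lambda$ (regularization weight) are introduced here for illustration, and a standard Gaussian latent prior is assumed.

```latex
% Prior approach (assumed form): data fidelity plus a 2-norm penalty on the latent
\hat{z} = \arg\min_{z} \; \lVert y - A\,G(z) \rVert_2^2 + \lambda \lVert z \rVert_2^2,
\qquad \hat{x} = G(\hat{z})

% With a standard Gaussian prior, a small latent norm means a high latent likelihood
\log p_Z(z) = -\tfrac{1}{2} \lVert z \rVert_2^2 + \text{const}

% The likelihood of the reconstruction x = G(z) also depends on the Jacobian term
\log p_X(x) = \log p_Z(z) + \log \left| \det \frac{\partial z}{\partial x} \right|
```

Under these assumptions, the penalty controls only the first term of $\log p_X(x)$, which is consistent with the observation that a high-likelihood latent does not by itself guarantee a high-likelihood reconstruction.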
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
Fashion-on-demand is becoming an important concept for the fashion industry. Many attempts have been made to leverage machine learning methods to generate fashion designs tailored to customers' tastes. However, how items are assembled together (e.g., their compatibility) is crucial to designing high-quality outfits in synthesized images. Here we propose a fashion generation model, named OutfitGAN, which contains two core modules: a Generative Adversarial Network and a Compatibility Network. The generative module generates new, realistic, high-quality fashion items from a specific category, while the compatibility network ensures reasonable compatibility among all items. The experimental results show the superiority of our OutfitGAN.
ISBN (print): 9798350365474
We introduce a multimodal vision framework for precision livestock farming, harnessing the power of GroundingDINO, HQSAM, and ViTPose models. This integrated suite enables comprehensive behavioral analytics from video data without invasive animal tagging. GroundingDINO generates accurate bounding boxes around livestock, while HQSAM segments individual animals within these boxes. ViTPose estimates key body points, facilitating posture and movement analysis. Demonstrated on a sheep dataset with grazing, running, sitting, standing, and walking activities, our framework extracts invaluable insights: activity and grazing patterns, interaction dynamics, and detailed postural evaluations. Applicable across species and video resolutions, this framework revolutionizes non-invasive livestock monitoring for activity detection, counting, health assessments, and posture analyses. It empowers data-driven farm management, optimizing animal welfare and productivity through AI-powered behavioral understanding.
ISBN (digital): 9781728125060
ISBN (print): 9781728125060
With the emergence of fashion recommendation, many researchers have attempted to recommend fashion items that fit consumers' tastes. However, few have looked into fashion outfits as a whole when making recommendations. In this paper, we propose a neural network that learns one's fashion taste and predicts whether an individual likes a fashion outfit. To improve learning, we also develop a fashion outfit negative sampling scheme to sample fashion outfits that are different enough. With experiments on the collected Polyvore dataset, we find that using complete images of fashion outfits performs well when learning individuals' tastes toward fashion outfits. Our proposed negative sampling scheme also improves the model's performance significantly, compared to random negative sampling.
ISBN (print): 9781665448994
Few-shot learning aims to build classifiers for new classes from a small number of labeled examples and is commonly facilitated by access to examples from a distinct set of 'base classes'. The difference in data distribution between the test set (novel classes) and the base classes used to learn an inductive bias often results in poor generalization on the novel classes. To alleviate problems caused by the distribution shift, previous research has explored the use of unlabeled examples from the novel classes, in addition to labeled examples of the base classes, which is known as the transductive setting. In this work, we show that, surprisingly, off-the-shelf self-supervised learning outperforms transductive few-shot methods by 3.9% for 5-shot accuracy on miniImageNet without using any base class labels. This motivates us to examine more carefully the role of features learned through self-supervision in few-shot learning. Comprehensive experiments are conducted to compare the transferability, robustness, efficiency, and the complementarity of supervised and self-supervised features.
ISBN (print): 9781424439942
We demonstrate that it is possible to automatically find representative example images of a specified object category. These canonical examples are perhaps the kind of images that one would show a child to teach them what, for example, a horse is: images with a large object clearly separated from the background. Given a large collection of images returned by a web search for an object category, our approach proceeds without any user-supplied training data for the category. First, images are ranked according to a category-independent composition model that predicts whether they contain a large, clearly depicted object, and outputs an estimated location of that object. Then local features calculated on the proposed object regions are used to eliminate images not distinctive to the category, and to cluster images by similarity of object appearance. We present results and a user evaluation on a variety of object categories, demonstrating the effectiveness of the approach.
ISBN (print): 9798350365474
Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional mapping between NeRF embeddings and those obtained from corresponding images and text. This mapping unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text.
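The retrieval application can be illustrated with a minimal sketch. It assumes that NeRF embeddings have already been mapped into the same space as the text or image embeddings and that candidates are ranked by cosine similarity; the abstract does not state the metric, so that choice is an assumption. The function names and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery):
    """Return gallery keys sorted by decreasing similarity to the query."""
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


# Hypothetical NeRF embeddings, assumed to be already mapped into the text space.
nerf_gallery = {
    "chair": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.2],
    "lamp": [0.1, 0.0, 1.0],
}

# Hypothetical embedding of a text query.
text_query = [1.0, 0.2, 0.0]

ranking = retrieve(text_query, nerf_gallery)
print(ranking)
```

Zero-shot classification could follow the same pattern with the roles swapped: the gallery would hold one text embedding per class name, the query would be a mapped NeRF embedding, and the top-ranked class would be the prediction.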
ISBN (print): 9781665448994
The goal of this paper is to model the fashion compatibility of an outfit and to provide explanations. We first extract features of all attributes of all items via convolutional neural networks, and then train a bidirectional Long Short-Term Memory (Bi-LSTM) model to learn the compatibility of an outfit by treating these attribute features as a sequence. Gradient penalty regularization is exploited to train an inter-factor compatibility net, which is used to compute the loss for the judgment and to provide its explanation, generated from the recognized reasons related to the judgment. To train and evaluate the proposed approach, we expanded the EVALUATION3 dataset in terms of the number of items and attributes. Experimental results show that our approach can successfully evaluate compatibility and give reasons for its judgment.
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
The semantic segmentation of agricultural aerial images is very important for the recognition and analysis of farmland anomaly patterns, such as drydown, endrow, nutrient deficiency, etc. Methods for general semantic segmentation such as Fully Convolutional Networks can extract rich semantic features, but struggle to exploit long-range information. Recently, vision Transformer architectures have achieved outstanding performance in image segmentation tasks, but transformer-based models have not been fully explored in the field of ***, we propose a novel architecture called Agricultural Aerial Transformer (AAFormer) to solve the semantic segmentation of aerial farmland images. We adopt Mix Transformer (MiT) in the encoder stage to enhance the ability of field anomaly pattern recognition and leverage the Squeeze-and-Excitation (SE) module in the decoder stage to improve the effectiveness of key channels. The boundary maps of farmland are introduced into the decoder. Evaluated on the Agriculture-Vision validation set, the mIoU of our proposed model reaches 45.44%.
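For readers unfamiliar with the reported metric, the following is a minimal sketch of the textbook mean Intersection over Union (mIoU) on flattened label maps. It assumes one label per pixel and skips classes absent from both prediction and ground truth; the benchmark's own evaluation protocol may define the metric differently, and the function name and toy labels are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    # If no class is present at all, report 0.0 rather than dividing by zero.
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.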
Irrigation systems can vary widely in scale, from small-scale subsistence farming to large commercial agriculture (see Fig. 1). The heterogeneity in irrigation practices and systems across different regions adds to th...