Composed query image retrieval is a new problem where the query consists of an image together with a requested modification expressed via a textual sentence. The goal is then to retrieve the images that are generally ...
详细信息
ISBN:
(数字)9781728171685
ISBN:
(纸本)9781728171685
Composed query image retrieval is a new problem where the query consists of an image together with a requested modification expressed via a textual sentence. The goal is then to retrieve the images that are generally similar to the query image, but differ according to the requested modification. Previous methods usually consider the image as a whole. In this paper, we propose a novel method that represents the image using a set of local areas in the image. The relationship between each word in the modification text and each area in the image is then explicitly established, allowing the model to accurately correlate the modification text to parts of the image. We conduct extensive experiments on three benchmark datasets. The results show that our method outperforms other state-of-the-art approaches by a considerable margin.
User identification from hand images only is still a challenging task. In this paper, we propose a new biometric identification system based solely on a skin patch from a multispectral image. The system is utilizing a...
详细信息
ISBN:
(纸本)9781728132938
User identification from hand images only is still a challenging task. In this paper, we propose a new biometric identification system based solely on a skin patch from a multispectral image. The system is utilizing a novel modified 3D CNN architecture which is taking advantage of multispectral data. We demonstrate the application of our system for the example of human identification from multispectral images of hands. To the best of our knowledge, this paper is the first to describe a pose-invariant and robust to overlapping real-time human identification system using hands. Additionally, we provide a framework to optimize the required spectral bands for the given spatial resolution limitations.
During the past few years, different methods for optimizing the camera settings and post-processing techniques to improve the subjective quality of consumer photos have been studied extensively. However, most of the r...
详细信息
ISBN:
(纸本)9781728132938
During the past few years, different methods for optimizing the camera settings and post-processing techniques to improve the subjective quality of consumer photos have been studied extensively. However, most of the research in the priorart has focused on finding the optimal method for an average user. Since there is large deviation in personal opinions and aesthetic standards, the next challenge is to find the settings and post-processing techniques thatfit to the individualusers 'personaltaste. In this study, we aim to predict the personallyperceived image quality by combining classical imagefeature analysis and collaboration filtering approach known from the recommendation systems. The experimental resultsfor the proposed method show promising results. As a practical application, our work can be used for personalizing the camera settings orpost-processingparameterfsor different users and images.
With the development of deep learning, neural networks tend to be deeper and larger to achieve good performance. Trained models are more compute-intensive and memory-intensive, which lead to the big challenges on memo...
详细信息
ISBN:
(纸本)9781665445092
With the development of deep learning, neural networks tend to be deeper and larger to achieve good performance. Trained models are more compute-intensive and memory-intensive, which lead to the big challenges on memory bandwidth, storage, latency, and throughput. In this paper, we propose the neural network compression method named minimally invasive surgery. Different from traditional model compression and knowledge distillation methods, the proposed method refers to the minimally invasive surgery principle. It learns the principal features from a pair of dense and compressed models in a contrastive manner. It also optimizes the neural networks to meet the specific hardware acceleration requirements. Through qualitative, quantitative, and ablation experiments, the proposed method shows a compelling performance, acceleration, and generalization in various tasks.
We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally involved adversarial objectives, ne...
详细信息
ISBN:
(纸本)9781665445092
We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally involved adversarial objectives, network ensembles and style transfer. Instead, we employ standard data augmentation techniques photometric noise, flipping and scaling and ensure consistency of the semantic predictions across these image transformations. We develop this principle in a lightweight self-supervised framework trained on co-evolving pseudo labels without the need for cumbersome extra training rounds. Simple in training from a practitioner's standpoint, our approach is remarkably effective. We achieve significant improvements of the state-of-the-art segmentation accuracy after adaptation, consistent both across different choices of the backbone architecture and adaptation scenarios.
A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase ground model by considering self-similarity maps extracted fro...
详细信息
ISBN:
(纸本)9798350301298
A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase ground model by considering self-similarity maps extracted from the latent representation of the model's image encoder. Our main insights are that these maps resemble localization maps and that by combining such maps, one can obtain useful pseudo-labels for performing self-training. Our results surpass, by a large margin, the state of the art in weakly supervised phrase grounding. A similar gap in performance is obtained for a recently proposed downstream task called WWbL, in which only the image is input, without any text. Our code is available at https://***/talshaharabany/Similarity-Maps-forSelf-Training-Weakly-Supervised- Phrase-Grounding.
Depth-from-focus (DFF) is a technique that infers depth using the focus change of a camera. In this work, we propose a convolutional neural network (CNN) to find the best-focused pixels in a focal stack and infer dept...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
Depth-from-focus (DFF) is a technique that infers depth using the focus change of a camera. In this work, we propose a convolutional neural network (CNN) to find the best-focused pixels in a focal stack and infer depth from the focus estimation. The key innovation of the network is the novel deep differential focus volume (DFV). By computing the first-order derivative with the stacked features over different focal distances, DFV is able to capture both the focus and context information for focus analysis. Besides, we also introduce a probability regression mechanism for focus estimation to handle sparsely sampled focal stacks and provide uncertainty estimation to the final prediction. Comprehensive experiments demonstrate that the proposed model achieves state-of-the-art performance on multiple datasets with good generalizability and fast speed.
We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary RGB videos. While seemingly easy for humans, this problem poses many challenges for computers. Our approach is...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary RGB videos. While seemingly easy for humans, this problem poses many challenges for computers. Our approach is based on a top-down detection system that finds planes that can be articulated. This approach is followed by optimizing for a 3D plane that explains a sequence of detected articulations. We show that this system can be trained on a combination of videos and 3D scan datasets. When tested on a dataset of challenging Internet videos and the Charades dataset, our approach obtains strong performance.
Blind motion deblurring is an important problem that receives enduring attention in last decade. Based on the observation that a good intermediate estimate of latent image for estimating motion-blur kernel is not nece...
详细信息
ISBN:
(纸本)9781728132938
Blind motion deblurring is an important problem that receives enduring attention in last decade. Based on the observation that a good intermediate estimate of latent image for estimating motion-blur kernel is not necessarily the one closest to latent image, edge selection has proven itself a very powerful technique for achieving state-of-the-art performance in blind deblurring. This paper presented an interpretation of edge selection/reweighting in terms of variational Bayes inference, and therefore developed a novel variational expectation maximization (VEM) algorithm with built-in adaptive edge selection for blind deblurring. Together with a restart strategy for avoiding undesired local convergence, the proposed VEM method not only has a solid mathematical foundation but also noticeably outperformed the state-of-the-art methods on benchmark datasets.
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. Yet, especially for heterogeneous models, a key bottleneck remains the tran...
详细信息
ISBN:
(纸本)9798350365474
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained from each client model with the server. One popular method, FedDF, uses distillation to tackle this task with the use of a common, shared dataset on which predictions are exchanged. However, in many contexts such a dataset might be difficult to acquire due to privacy and the clients might not allow for storage of a large shared dataset. To this end, in this paper, we introduce a new method that improves this knowledge distillation method to only rely on a single shared image between clients and server. In particular, we propose a novel adaptive dataset pruning algorithm that selects the most informative crops generated from only a single image. With this, we show that federated learning with distillation under a limited shared dataset budget works better by using a single image compared to multiple individual ones. Finally, we extend our approach to allow for training heterogeneous client architectures by incorporating a non-uniform distillation schedule and client-model mirroring on the server side.
暂无评论