We address the problem of computing a textural loss based on the statistics extracted from the feature activations of a convolutional neural network optimized for object recognition (e.g. VGG-19). The underlying mathe...
详细信息
ISBN:
(纸本)9781665445092
We address the problem of computing a textural loss based on the statistics extracted from the feature activations of a convolutional neural network optimized for object recognition (e.g. VGG-19). The underlying mathematical problem is the measure of the distance between two distributions in feature space. The Gram-matrix loss is the ubiquitous approximation for this problem but it is subject to several shortcomings. Our goal is to promote the Sliced Wasserstein Distance as a replacement for it. It is theoretically proven, practical, simple to implement, and achieves results that are visually superior for texture synthesis by optimization or training generative neural networks.
Capturing and understanding visual signals is one of the core interests of computervision. Much progress has been made w.r.t. many aspects of imaging, but the reconstruction of refractive phenomena, such as turbulenc...
详细信息
ISBN:
(纸本)9781479951178
Capturing and understanding visual signals is one of the core interests of computervision. Much progress has been made w.r.t. many aspects of imaging, but the reconstruction of refractive phenomena, such as turbulence, gas and heat flows, liquids, or transparent solids, has remained a challenging problem. In this paper, we derive an intuitive formulation of light transport in refractive media using light fields and the transport of intensity equation. We show how coded illumination in combination with pairs of recorded images allow for robust computational reconstruction of dynamic two and three-dimensional refractive phenomena.
Gestures are a common form of human communication and important for human computer interfaces (HCI). Recent approaches to gesture recognition use deep learning methods, including multi-channel methods. We show that wh...
详细信息
ISBN:
(纸本)9781538664209
Gestures are a common form of human communication and important for human computer interfaces (HCI). Recent approaches to gesture recognition use deep learning methods, including multi-channel methods. We show that when spatial channels are focused on the hands, gesture recognition improves significantly, particularly when the channels are fused using a sparse network. Using this technique, we improve performance on the ChaLearn IsoGD dataset from a previous best of 67.71% to 82.07%, and on the NVIDIA dataset from 83.8% to 91.28%.
Tire's paper describes a representation for people and animals, called a body plan, which is adapted to segmentation and to recognition in complex environments. The representation is an organized collection of gro...
详细信息
ISBN:
(纸本)0780342364
Tire's paper describes a representation for people and animals, called a body plan, which is adapted to segmentation and to recognition in complex environments. The representation is an organized collection of grouping hints obtained from a combination of constraints on color and texture and constraints on geometric properties such as the structure of individual parts and the relationships between parts. Body plans can be learned from image data, using established statistical learning techniques. The approach is illustrated with two examples of programs that successfully use body plans for recognition: one example involves determining whether a picture contains a scantily clad human, using a body plan built by hand;We other involves determining whether a picture contains a horse, using a body plan learned front image data. In both cases, the system demonstrates excellent performance on large, uncontrolled test sets and very large and diverse control sets.
Several vision problems can be reduced to the problem of fitting a linear surface of low dimension to data, including the problems of structure-from-affine-motion, and of characterizing the intensity images of a Lambe...
详细信息
ISBN:
(纸本)0780342364
Several vision problems can be reduced to the problem of fitting a linear surface of low dimension to data, including the problems of structure-from-affine-motion, and of characterizing the intensity images of a Lambertian scene by constructing the intensity manifold. For these problems, one must deal with a data matrix with some missing elements. In structure-from-motion, missing elements will occur if some point features are not visible in some frames. To construct the intensity manifold missing matrix elements will arise when the surface normals of some scene points do not face the light source in some images. We propose a novel method for fitting a low rank matrix to a matrix with missing elements. We show experimentally that our method produces good results in the presence of noise. These results can be either used directly, or can serve as an excellent starting point for an iterative method.
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classifi...
详细信息
ISBN:
(纸本)9780769549897
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and J...
详细信息
ISBN:
(纸本)9781665445092
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances. Using these combinations of novel losses and using our framework, we obtain state-of-the-art results for early action recognition in UCF101 and JHMDB datasets by obtaining 91.7 % and 83.5 % accuracy respectively for an observation percentage of 20. Similarly, we obtain state-of-the-art results for Epic-Kitchen55 and Breakfast datasets for action anticipation by obtaining 20.35 and 41.8 top-1 accuracy respectively.
This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern ...
详细信息
ISBN:
(纸本)0818672587
This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database. These features compare favorably with other existing texture representations. A simple hybrid neural network algorithm is used to learn the similarity by simple clustering in the texture feature space. With learning similarity, the performance of similar pattern retrieval improves significantly. An important aspect of this work is its application to real image data. Texture feature extraction with similarity learning is used to search through large aerial photographs. Feature clustering enables efficient search of the database as our experimental results indicate.
Illumination estimation is the process of determining the chromaticity of the illumination in an imaged scene in order to remove undesirable color casts through white-balancing. While computational color constancy is ...
详细信息
ISBN:
(纸本)9781467369640
Illumination estimation is the process of determining the chromaticity of the illumination in an imaged scene in order to remove undesirable color casts through white-balancing. While computational color constancy is a well-studied topic in computervision, it remains challenging due to the ill posed nature of the problem. One class of techniques relies on low-level statistical information in the image color distribution and works under various assumptions (e.g. Grey World, White-Patch, etc). These methods have an advantage that they are simple and fast, but often do not perform well. More recent state-of-the-art methods employ learning-based techniques that produce better results, but often rely on complex features and have long evaluation and training times. In this paper, we present a learning-based method based on four simple color features and show how to use this with an ensemble of regression trees to estimate the illumination. We demonstrate that our approach is not only faster than existing learning-based methods in terms of both evaluation and training time, but also gives the best results reported to date on modern color constancy data sets.
In this work, we formulate a new weakly supervised domain generalization approach for visual recognition by using loosely labeled web images/videos as training data. Specifically, we aim to address two challenging iss...
详细信息
ISBN:
(纸本)9781467369640
In this work, we formulate a new weakly supervised domain generalization approach for visual recognition by using loosely labeled web images/videos as training data. Specifically, we aim to address two challenging issues when learning robust classifiers: 1) coping with noise in the labels of training web images/videos in the source domain;and 2) enhancing generalization capability of learnt classifiers to any unseen target domain. To address the first issue, we partition the training samples in each class into multiple clusters. By treating each cluster as a "bag" and the samples in each cluster as "instances", we formulate a multi-instance learning (MIL) problem by selecting a subset of training samples from each training bag and simultaneously learning the optimal classifiers based on the selected samples. To address the second issue, we assume the training web images/videos may come from multiple hidden domains with different data distributions. We then extend our MIL formulation to learn one classifier for each class and each latent domain such that multiple classifiers from each class can be effectively integrated to achieve better generalization capability. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our new approach for visual recognition by learning from web data.
暂无评论