ISBN: (Print) 0780342364
Several vision problems can be reduced to the problem of fitting a low-dimensional linear surface to data, including structure-from-affine-motion and the characterization of the intensity images of a Lambertian scene by constructing the intensity manifold. For these problems, one must deal with a data matrix with some missing elements. In structure-from-motion, missing elements occur if some point features are not visible in some frames. When constructing the intensity manifold, missing matrix elements arise when the surface normals of some scene points do not face the light source in some images. We propose a novel method for fitting a low-rank matrix to a matrix with missing elements. We show experimentally that our method produces good results in the presence of noise. These results can either be used directly or serve as an excellent starting point for an iterative method.
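To make the missing-data setting concrete, here is a minimal sketch of the kind of iterative method the abstract mentions as a possible follow-up step. It is a generic rank-1 alternating least squares fit, not the method proposed in the paper; the function name `fit_rank1`, the use of `None` for missing entries, and the toy data are all assumptions made for illustration.

```python
def fit_rank1(matrix, iterations=200):
    """Fit matrix[i][j] ~ u[i] * v[j] using only observed entries (None = missing).

    Assumes every row and every column has at least one observed entry.
    """
    rows, cols = len(matrix), len(matrix[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iterations):
        # Least-squares update of each u[i] over the observed entries of row i.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            u[i] = sum(matrix[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Least-squares update of each v[j] over the observed entries of column j.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            v[j] = sum(matrix[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return u, v


# Toy data: the outer product of [1, 2, 3] and [2, 1, 4] with two entries hidden.
data = [
    [2.0, 1.0, None],
    [4.0, 2.0, 8.0],
    [None, 3.0, 12.0],
]
u, v = fit_rank1(data)
filled = [[u[i] * v[j] for j in range(3)] for i in range(3)]
```

On this noise-free toy matrix the two hidden entries in `filled` should approach 4 and 6. With noisy data or higher rank, an iteration like this depends on its starting point, which is why a good initial estimate matters.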
ISBN: (Print) 9781538664209
Many problems in image/video processing and computer vision require the computation of a dense k-nearest neighbor field (k-NNF) between two images. For each patch in a query image, the k-NNF determines the positions of the k most similar patches in a database image. With the introduction of the PatchMatch algorithm, Barnes et al. demonstrated that this large search problem can be approximated efficiently by collaborative search methods that exploit the local coherency of image patches. Since its introduction, several variants of the original PatchMatch algorithm have been proposed, some of them reducing the computational time by two orders of magnitude. In this work we study the convergence of PatchMatch and its variants, and derive bounds on their convergence rate. We consider a generic PatchMatch algorithm from which most specific instances found in the literature can be derived as particular cases. We also derive more specific bounds for two of these particular cases: the original PatchMatch and Coherency Sensitive Hashing. The proposed bounds are validated by comparing them with the convergence observed in practice.
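As a point of reference for the definition above, the following sketch computes an exact k-NNF by exhaustive search. This is the expensive baseline that PatchMatch-style methods approximate, not PatchMatch itself. The function names, the sum-of-squared-differences patch distance, and the nested-list image format are assumptions chosen for illustration.

```python
import heapq


def patch(img, y, x, p):
    """Flatten the p x p patch whose top-left corner is at (y, x)."""
    return [img[y + dy][x + dx] for dy in range(p) for dx in range(p)]


def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((s - t) ** 2 for s, t in zip(a, b))


def knnf_brute_force(query, database, p, k):
    """Map each query patch position to its k closest database patches.

    Each value is a list of (distance, (y, x)) pairs sorted by distance.
    """
    dh, dw = len(database) - p + 1, len(database[0]) - p + 1
    db_patches = [((y, x), patch(database, y, x, p))
                  for y in range(dh) for x in range(dw)]
    field = {}
    for y in range(len(query) - p + 1):
        for x in range(len(query[0]) - p + 1):
            q = patch(query, y, x, p)
            scored = ((ssd(q, d), pos) for pos, d in db_patches)
            field[(y, x)] = heapq.nsmallest(k, scored)
    return field


# Toy example: match an image against itself, so each patch's best match is itself.
image = [[y * 5 + x for x in range(5)] for y in range(4)]
field = knnf_brute_force(image, image, p=2, k=3)
```

The cost is proportional to the number of query patches times the number of database patches, which is what makes approximate collaborative search attractive for real images.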
ISBN: (Print) 9780769549897
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
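The logarithmic cost comes from evaluating only the classifiers along one root-to-leaf path. The sketch below illustrates that traversal with a greedy descent over linear node classifiers. The dictionary-based tree layout, the hand-picked weights, and the function names are hypothetical, and the maximum likelihood learning of the parameters described in the abstract is not shown.

```python
def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(node, x):
    """Descend greedily from the root; return (label, classifier evaluations)."""
    evaluations = 0
    while "label" not in node:
        # Score every child of the current node and follow the best one.
        scores = [dot(w, x) for w, _ in node["children"]]
        evaluations += len(scores)
        best = max(range(len(scores)), key=scores.__getitem__)
        node = node["children"][best][1]
    return node["label"], evaluations


def leaf(name):
    return {"label": name}


# Hypothetical two-level tree over four labels with hand-picked weights.
tree = {"children": [
    ([1, 0], {"children": [([0, 1], leaf("A")), ([0, -1], leaf("B"))]}),
    ([-1, 0], {"children": [([0, 1], leaf("C")), ([0, -1], leaf("D"))]}),
]}
label, evaluations = classify(tree, [2, -1])
```

In a balanced binary tree over N labels, this descent evaluates about 2·log2(N) node classifiers instead of N.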
ISBN: (Print) 9781665445092
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances. Using our framework with combinations of these novel losses, we obtain state-of-the-art results for early action recognition on the UCF101 and JHMDB datasets, with 91.7% and 83.5% accuracy respectively at an observation percentage of 20%. Similarly, we obtain state-of-the-art results for action anticipation on the Epic-Kitchen55 and Breakfast datasets, with 20.35 and 41.8 top-1 accuracy respectively.
ISBN: (Print) 0818672587
This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database. These features compare favorably with other existing texture representations. A simple hybrid neural network algorithm is used to learn the similarity by simple clustering in the texture feature space. With similarity learning, the performance of similar pattern retrieval improves significantly. An important aspect of this work is its application to real image data. Texture feature extraction with similarity learning is used to search through large aerial photographs. As our experimental results indicate, feature clustering enables efficient search of the database.
ISBN: (Print) 9781538664209
Convolutional neural networks are built upon the convolution operation, which extracts informative features by fusing spatial and channel-wise information together within local receptive fields. In order to boost the representational power of a network, several recent approaches have shown the benefit of enhancing spatial encoding. In this work, we focus on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by stacking these blocks together we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a ~25% relative improvement over the winning entry of 2016. Code and models are available at https://***/hujie-frank/SENet.
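The channel recalibration described above can be sketched in a few lines. The abstract does not spell out the internals, so this sketch follows the commonly described form of an SE block: global average pooling per channel (squeeze), a small two-layer gating network ending in a sigmoid (excitation), and a per-channel rescaling. Biases are omitted, and the function name, weight shapes, and toy values are assumptions made for illustration.

```python
import math


def se_block(x, w1, w2):
    """Apply a squeeze-and-excitation style gate to a feature map.

    x:  C x H x W feature map as nested lists.
    w1: (C // r) x C weights of the reducing layer (r = reduction ratio).
    w2: C x (C // r) weights of the expanding layer.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * value for value in row] for row in ch]
            for ch, g in zip(x, gates)]


# Toy input: 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [0.0]]  # zero weights give a gate of exactly 0.5 per channel
out = se_block(x, w1, w2)
```

With the zero weights in `w2`, every gate equals 0.5 and the output is half the input. Trained weights would instead produce input-dependent gates that differ per channel.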
ISBN: (Print) 0780342364
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating data from novel content classes and style to a known style or content. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.
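A bilinear model is linear in the style vector when the content vector is held fixed, and linear in the content vector when the style vector is held fixed. The sketch below shows one common way to write such a model, with each output component computed as a weighted sum over all pairs of style and content coefficients. The function name `bilinear_output`, the tensor layout, and the toy weights are assumptions for illustration, and the fitting procedure from the paper is not shown.

```python
def bilinear_output(weights, style, content):
    """Compute y[k] = sum over i, j of style[i] * content[j] * weights[i][j][k]."""
    k_dim = len(weights[0][0])
    y = [0.0] * k_dim
    for i, a in enumerate(style):
        for j, b in enumerate(content):
            for k in range(k_dim):
                y[k] += a * b * weights[i][j][k]
    return y


# Hypothetical interaction weights: 2 style dims x 2 content dims x 2 outputs.
W = [[[1.0, 0.0], [0.0, 1.0]],
     [[2.0, 0.0], [0.0, 2.0]]]
y_single = bilinear_output(W, [1.0, 0.0], [0.0, 1.0])
y_all = bilinear_output(W, [1.0, 1.0], [1.0, 1.0])
y_doubled = bilinear_output(W, [2.0, 2.0], [1.0, 1.0])
```

Doubling the style vector while keeping the content fixed doubles the output, which is the linearity in each factor that gives the model its name.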
ISBN: (Print) 9780769549897
Estimating geographic location from images is a challenging problem that has recently received attention. In contrast to many existing methods that primarily model discriminative information corresponding to different locations, we propose joint learning of the information that images across locations share and the information in which they vary. Starting with generative and discriminative subspaces pertaining to domains, which are obtained by a hierarchical grouping of images from adjacent locations, we present a top-down approach that first models cross-domain information transfer by utilizing the geometry of these subspaces, and then encodes the model results onto individual images to infer their location. We report competitive results for location recognition and clustering on two public datasets, im2GPS and San Francisco, and empirically validate the utility of various design choices involved in the approach.
ISBN: (Print) 9781665445092
This work focuses on object goal visual navigation, which aims at finding the location of an object from a given class, where in each step the agent is provided with an egocentric RGB image of the scene. We propose to learn the agent's policy using a reinforcement learning algorithm. Our key contribution is a novel attention probability model for visual navigation tasks. This attention encodes semantic information about observed objects, as well as spatial information about their location. This combination of the "what" and the "where" allows the agent to navigate toward the sought-after object effectively. The attention model is shown to improve the agent's policy and to achieve state-of-the-art results on commonly used datasets.
ISBN: (Print) 0780342364
The success of an intelligent robotic system depends on the performance of its vision system, which in turn depends to a great extent upon the quality of its calibration. During the execution of a task, the vision system is subject to external influences such as vibrations and thermal expansion, which affect and possibly invalidate the initial calibration. Moreover, parameters of the vision system such as the zoom or the focus may be altered intentionally in order to perform specific vision tasks. This paper describes a technique for automatically maintaining the calibration of stereovision systems over time without reusing any particular calibration apparatus. It uses all available information, i.e. both spatial and temporal data. Uncertainty is systematically manipulated and maintained. Synthetic and real data are used to validate the proposed technique, and the results compare very favourably with those given by classical calibration methods.