Several vision problems can be reduced to the problem of fitting a linear surface of low dimension to data, including the problems of structure-from-affine-motion, and of characterizing the intensity images of a Lambe...
详细信息
ISBN:
(纸本)0780342364
Several vision problems can be reduced to the problem of fitting a linear surface of low dimension to data, including the problems of structure-from-affine-motion, and of characterizing the intensity images of a Lambertian scene by constructing the intensity manifold. For these problems, one must deal with a data matrix with some missing elements. In structure-from-motion, missing elements will occur if some point features are not visible in some frames. To construct the intensity manifold missing matrix elements will arise when the surface normals of some scene points do not face the light source in some images. We propose a novel method for fitting a low rank matrix to a matrix with missing elements. We show experimentally that our method produces good results in the presence of noise. These results can be either used directly, or can serve as an excellent starting point for an iterative method.
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classifi...
详细信息
ISBN:
(纸本)9780769549897
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and J...
详细信息
ISBN:
(纸本)9781665445092
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances. Using these combinations of novel losses and using our framework, we obtain state-of-the-art results for early action recognition in UCF101 and JHMDB datasets by obtaining 91.7 % and 83.5 % accuracy respectively for an observation percentage of 20. Similarly, we obtain state-of-the-art results for Epic-Kitchen55 and Breakfast datasets for action anticipation by obtaining 20.35 and 41.8 top-1 accuracy respectively.
This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern ...
详细信息
ISBN:
(纸本)0818672587
This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database. These features compare favorably with other existing texture representations. A simple hybrid neural network algorithm is used to learn the similarity by simple clustering in the texture feature space. With learning similarity, the performance of similar pattern retrieval improves significantly. An important aspect of this work is its application to real image data. Texture feature extraction with similarity learning is used to search through large aerial photographs. Feature clustering enables efficient search of the database as our experimental results indicate.
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy;rendering conditions from s...
详细信息
ISBN:
(纸本)0780342364
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy;rendering conditions from surface shape in shape-from-shading;face identity and head pose in face recognition;or font and letter class in character recognition. We refer to these two factors generically as ''style'' and ''content''. Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating data from novel content classes and style to a known style or content. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.
Estimating geographic location from images is a challenging problem that is receiving recent attention. In contrast to many existing methods that primarily model discriminative information corresponding to different l...
详细信息
ISBN:
(纸本)9780769549897
Estimating geographic location from images is a challenging problem that is receiving recent attention. In contrast to many existing methods that primarily model discriminative information corresponding to different locations, we propose joint learning of information that images across locations share and vary upon. Starting with generative and discriminative subspaces pertaining to domains, which are obtained by a hierarchical grouping of images from adjacent locations, we present a top-down approach that first models cross-domain information transfer by utilizing the geometry of these subspaces, and then encodes the model results onto individual images to infer their location. We report competitive results for location recognition and clustering on two public datasets, im2GPS and San Francisco, and empirically validate the utility of various design choices involved in the approach.
This work focuses on object goal visual navigation, aiming at finding the location of an object from a given class, where in each step the agent is provided with an egocentric RGB image of the scene. We propose to lea...
详细信息
ISBN:
(纸本)9781665445092
This work focuses on object goal visual navigation, aiming at finding the location of an object from a given class, where in each step the agent is provided with an egocentric RGB image of the scene. We propose to learn the agent's policy using a reinforcement learning algorithm. Our key contribution is a novel attention probability model for visual navigation tasks. This attention encodes semantic information about observed objects, as well as spatial information about their place. This combination of the "what" and the "where" allows the agent to navigate toward the sought-after object effectively. The attention model is shown to improve the agent's policy and to achieve state-of-the-art results on commonly-used datasets.
The success of an intelligent robotic system depends on the performance of its vision-system which in turn depends to a great extend upon the quality of its calibration. During the execution of a task the vision-syste...
详细信息
ISBN:
(纸本)0780342364
The success of an intelligent robotic system depends on the performance of its vision-system which in turn depends to a great extend upon the quality of its calibration. During the execution of a task the vision-system is subject to external influences such as vibrations, thermal expansion etc. which affect and possibly render invalid the initial calibration. Moreover it is possible that the parameters of the vision-system like e.g. the zoom or the focus are altered intentionally in order to perform specific vision-tasks. This paper describes a technique for automatically maintaining calibration of stereovision systems over time without using again any particular calibration apparatus. It uses all available information, i.e. both spatial and temporal data. Uncertainty is systematically manipulated and maintained. Synthetical and real data are used to validate the proposed technique, and the results compare very favourably with those given by classical calibration methods.
Humans are capable of perceiving a scene at a glance, and obtain deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations ...
详细信息
ISBN:
(纸本)9781479951178
Humans are capable of perceiving a scene at a glance, and obtain deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations require Anytime recognition ability, which is rarely considered in computervision research. We present a method for learning dynamic policies to optimize Anytime performance in visual architectures. Our model sequentially orders feature computation and performs subsequent classification. Crucially, decisions are made at test time and depend on observed data and intermediate results. We show the applicability of this system to standard problems in scene and object recognition. On suitable datasets, we can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy;this provides a new view on the time course of human visual perception.
Combination of Multiple Classifiers (CMC) has recently drawn attention as a method of improving classification accuracy. This paper presents a method for combining classifiers that uses estimates of each individual cl...
详细信息
ISBN:
(纸本)0818672587
Combination of Multiple Classifiers (CMC) has recently drawn attention as a method of improving classification accuracy. This paper presents a method for combining classifiers that uses estimates of each individual classifier's local accuracy in small regions of feature space surrounding an unknown test sample. Only the output of the most locally accurate classifier is considered. We address issues of 1) optimization of individual classifiers, and 2) the effect of varying the sensitivity of the individual classifiers on the CMC algorithm. Our algorithm performs better on data from a real problem in mammogram image analysis than do other recently proposed CMC techniques.
暂无评论