ISBN: (print) 9780769549897
Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem has received limited attention within the computer vision community. In this work, we propose a novel approach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset "difficulty," as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.
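As a rough illustration of rejecting an unfamiliar class in attribute space, the sketch below assigns an image to the known class whose attribute signature is nearest to its predicted attribute scores, and returns no label when even the nearest signature is too far away. This is not the method proposed in the abstract: the distance measure, the threshold, the class names, and the three-attribute signatures are all hypothetical choices made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_or_reject(attr_scores, class_signatures, threshold):
    """Return the nearest known class, or None if the sample looks unfamiliar.

    attr_scores      -- predicted attribute scores for one image (hypothetical)
    class_signatures -- dict mapping class name to its attribute signature
    threshold        -- maximum distance accepted as a known class (assumed)
    """
    best_name, best_dist = None, float("inf")
    for name, signature in class_signatures.items():
        dist = euclidean(attr_scores, signature)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

# Hypothetical signatures over three binary attributes.
signatures = {"cardinal": [1.0, 0.0, 1.0], "blue_jay": [0.0, 1.0, 1.0]}

print(classify_or_reject([0.9, 0.1, 0.8], signatures, threshold=0.5))  # cardinal
print(classify_or_reject([0.5, 0.5, 0.0], signatures, threshold=0.5))  # None
```

The sketch also hints at why class separation in attribute space matters: the closer two signatures lie, the less attribute noise is needed to push a sample across the boundary between them or past the rejection threshold.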
ISBN: (print) 9780769549897
From a set of images in a particular domain, labeled with part locations and class, we present a method to automatically learn a large and diverse set of highly discriminative intermediate features that we call Part-based One-vs-One Features (POOFs). Each of these features specializes in discrimination between two particular classes based on the appearance at a particular part. We demonstrate the particular usefulness of these features for fine-grained visual categorization with new state-of-the-art results on bird species identification using the Caltech-UCSD Birds (CUB) dataset and parity with the best existing results in face verification on the Labeled Faces in the Wild (LFW) dataset. Finally, we demonstrate the particular advantage of POOFs when training data is scarce.
ISBN: (print) 9780769549897
3D model-based object recognition has been a noticeable research trend in recent years. Common methods find 2D-to-3D correspondences and make recognition decisions by pose estimation, whose efficiency usually suffers from noisy correspondences caused by the increasing number of target objects. To overcome this scalability bottleneck, we propose an efficient 2D-to-3D correspondence filtering approach, which combines a light-weight neighborhood-based step with a finer-grained pairwise step to remove spurious correspondences based on 2D/3D geometric cues. On a dataset of 300 3D objects, our solution achieves an approximately 10-fold speed improvement over the baseline, with comparable recognition accuracy. A parallel implementation on a quad-core CPU runs at approximately 3 fps for 1280 × 720 images.
ISBN: (print) 9780769549897
Plenoptic cameras are gaining attention for their unique light gathering and post-capture processing capabilities. We describe a decoding, calibration and rectification procedure for lenselet-based plenoptic cameras appropriate for a range of computer vision applications. We derive a novel physically based 4D intrinsic matrix relating each recorded pixel to its corresponding ray in 3D space. We further propose a radial distortion model and a practical objective function based on ray reprojection. Our 15-parameter camera model is of much lower dimensionality than camera array models, and more closely represents the physics of lenselet-based cameras. Results include calibration of a commercially available camera using three calibration grid sizes over five datasets. Typical RMS ray reprojection errors are 0.0628, 0.105 and 0.363 mm for 3.61, 7.22 and 35.1 mm calibration grids, respectively. Rectification examples include calibration targets and real-world imagery.
ISBN: (print) 9780769549897
The automatic extraction of line-networks from images is a well-known computer vision problem. Appearance and shape considerations have been explored in depth in the literature to improve accuracy in the presence of occlusions, shadows, and a wide variety of irrelevant objects. However, most existing works have ignored the structural aspect of the problem. We present an original method which provides structurally coherent solutions. In contrast to pixel-based and object-based methods, our result is a graph in which each node represents either a connection or an ending in the line-network. Based on stochastic geometry, we develop a new family of point processes that sample junction points in the input image using a Monte Carlo mechanism. The quality of a configuration is measured by a probability density which takes into account both image consistency and shape priors. Our experiments on a variety of problems illustrate the potential of our approach in terms of accuracy, flexibility and efficiency.
ISBN: (print) 9780769549897
Current pedestrian tracking approaches ignore important aspects of human behavior. Humans do not move independently; they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.
ISBN: (print) 9780769549897
In the past few years there has been a growing interest in geometric frameworks to learn supervised classification models on Riemannian manifolds [,]. A popular framework, valid over any Riemannian manifold, was proposed in [] for binary classification. When moving from binary to multi-class classification, this paradigm is no longer valid, due to the spread of multiple positive classes on the manifold []. It is then natural to ask whether the multi-class paradigm could be extended to operate on a large class of Riemannian manifolds. We propose a mathematically well-founded classification paradigm that extends the work in [] to multi-class models, taking into account the structure of the space. The idea is to project all the data from the manifold onto an affine tangent space at a particular point. To mitigate the distortion induced by local diffeomorphisms, we introduce for the first time in the computer vision community a well-founded mathematical concept, the so-called rolling map [,]. The novelty in this alternate school of thought is that the manifold is first rolled (without slipping or twisting) as a rigid body, and the given data is then unwrapped onto the affine tangent space, where the classification is performed.
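To make "projecting data onto a tangent space at a particular point" concrete, the sketch below implements the standard logarithm map on the unit sphere, one of the simplest Riemannian manifolds. It is not the rolling map described in the abstract; it shows the kind of single-point projection whose distortion the rolling map is meant to reduce. The choice of the sphere and the function name `sphere_log_map` are assumptions made for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sphere_log_map(p, q):
    """Logarithm map on the unit sphere: tangent vector at p pointing to q.

    Both p and q are assumed to be unit vectors. The returned vector is
    orthogonal to p and its length equals the geodesic distance to q.
    """
    cos_theta = max(-1.0, min(1.0, dot(p, q)))  # clamp against rounding
    theta = math.acos(cos_theta)                # geodesic distance
    if theta < 1e-12:
        return [0.0] * len(p)                   # q coincides with p
    # Component of q orthogonal to p gives the direction in the tangent space.
    u = [q_i - cos_theta * p_i for p_i, q_i in zip(p, q)]
    norm = math.sqrt(dot(u, u))
    if norm < 1e-12:
        raise ValueError("q is antipodal to p; the log map is not unique")
    return [theta * u_i / norm for u_i in u]

north_pole = [0.0, 0.0, 1.0]
equator_point = [1.0, 0.0, 0.0]
v = sphere_log_map(north_pole, equator_point)
print(v)  # tangent vector of length pi/2 along the first axis
```

Distances from the base point are preserved by this map, but distances between two points far from the base point are distorted, which is the limitation that motivates more careful unwrapping schemes.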
ISBN: (print) 9780769549897
Recent progress has shown that learning from hierarchical feature representations leads to improvements in various computer vision tasks. Motivated by the observation that human activity data contains information at various temporal resolutions, we present a hierarchical sequence summarization approach for action recognition that learns multiple layers of discriminative feature representations at different temporal granularities. We build up a hierarchy dynamically and recursively by alternating sequence learning and sequence summarization. For sequence learning we use CRFs with latent variables to learn hidden spatio-temporal dynamics; for sequence summarization we group observations that have similar semantic meaning in the latent space. For each layer we learn an abstract feature representation through non-linear gate functions. This procedure is repeated to obtain a hierarchical sequence summary representation. We develop an efficient learning method to train our model and show that its complexity grows sublinearly with the size of the hierarchy. Experimental results show the effectiveness of our approach, achieving the best published results on the ArmGesture and Canal9 datasets.
ISBN: (print) 9780769549897
This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. Because of this, these approaches are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is neglected by discarding chromaticity from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF Sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that Color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.
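The opponent color space mentioned above is commonly defined as an orthonormal rotation of RGB into two chromatic channels and one intensity channel. The sketch below uses that common definition; the abstract does not state which exact chromatic representations the authors derive from it, so the function is an illustrative assumption and not the Color STIP pipeline.

```python
import math

def rgb_to_opponent(r, g, b):
    """Convert one RGB pixel to the commonly used opponent color space.

    o1 and o2 carry chromatic information (red-green, yellow-blue);
    o3 carries intensity. Inputs may be in any consistent range.
    """
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3

# A gray pixel has no chromatic content: o1 and o2 are zero.
print(rgb_to_opponent(0.5, 0.5, 0.5))

# Adding the same offset to every channel leaves o1 and o2 unchanged.
print(rgb_to_opponent(0.8, 0.2, 0.1))
print(rgb_to_opponent(0.9, 0.3, 0.2))
```

Because a uniform offset on all three channels cancels in o1 and o2, the chromatic channels are less affected by such intensity shifts than a plain intensity image is.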
ISBN: (print) 9781479937097
This paper introduces a Chinese chess recognition algorithm based on computer vision and image processing. To simplify processing and improve efficiency, the images of the chessboard and chess pieces are preprocessed in advance. The preprocessing steps are conversion from color to grayscale, filtering with a mean or median filter, and binarization of the grayscale images. The edges of the chessboard and chess pieces can then be extracted from the binarized images by image segmentation. The center and circular edge of each piece are computed with an advanced Hough transform, which determines the location of each piece on the chessboard and its size. Based on the features of the chess images, the main recognition method analyzes radial pixel statistics of each piece using mathematical morphology. Because these pixel values remain the same and stable at any angle of the piece, the recognition algorithm achieves a good recognition rate in the experimental results. The experiments with the computer vision system for Chinese chess games presented in this paper show that the advanced and modified recognition algorithm is practical and applicable.
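The three preprocessing steps named in the abstract (grayscale conversion, median filtering, binarization) can be sketched on a tiny image stored as nested lists. The luma weights, the 3×3 window with untouched borders, and the fixed threshold of 128 are assumptions chosen for the example; the abstract does not specify these parameters, and the edge extraction and Hough transform stages are not covered here.

```python
from statistics import median

def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to grayscale using common luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def median_filter_3x3(gray):
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [gray[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def binarize(gray, threshold=128):
    """Map each pixel to 1 if it reaches the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]

# A black 3x3 image with one white noise pixel in the center.
black, white = (0, 0, 0), (255, 255, 255)
image = [[black, black, black],
         [black, white, black],
         [black, black, black]]

binary = binarize(median_filter_3x3(to_gray(image)))
print(binary)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The isolated white pixel is removed by the median filter before thresholding, which is the reason for filtering first: noise that survives into the binarized image would otherwise produce spurious edges in the later segmentation step.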