ISBN (print): 9781467388511
Many non-rigid 3D structures are not modelled well through a low-rank subspace assumption. This is problematic when it comes to their reconstruction through Structure from Motion (SfM). We argue in this paper that a more expressive and general assumption can be made around compressible 3D structures. The vision community, however, has hitherto struggled to formulate effective strategies for recovering such structures after projection without the aid of additional priors (e.g. temporal ordering, rigid substructures, etc.). In this paper we present a "prior-less" approach to solve compressible SfM. Specifically, we demonstrate how the problem of SfM - assuming compressible 3D structures - can be theoretically characterized as a block sparse dictionary learning problem. We validate our approach experimentally by demonstrating reconstructions of 3D structures that are intractable using current state-of-the-art low-rank SfM approaches.
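A minimal sketch of the block-sparse coding idea the abstract refers to, assuming a toy setting: a per-frame measurement vector w is greedily explained by a small number of dictionary "blocks" (a block-OMP-style step). The dictionary D, block size, and sparsity level below are illustrative placeholders, not the authors' formulation or solver.

import numpy as np

def block_omp(w, D, block_size, k_blocks):
    """Greedily pick k_blocks dictionary blocks that best explain w."""
    n_blocks = D.shape[1] // block_size
    residual, chosen = w.copy(), []
    for _ in range(k_blocks):
        # score each block by the energy of its correlation with the residual
        scores = [np.linalg.norm(D[:, b * block_size:(b + 1) * block_size].T @ residual)
                  for b in range(n_blocks)]
        for b in chosen:
            scores[b] = -np.inf
        chosen.append(int(np.argmax(scores)))
        cols = np.concatenate([np.arange(b * block_size, (b + 1) * block_size) for b in chosen])
        coeff, *_ = np.linalg.lstsq(D[:, cols], w, rcond=None)
        residual = w - D[:, cols] @ coeff
    code = np.zeros(D.shape[1])
    code[cols] = coeff
    return code

# toy usage: a random dictionary with 8 blocks of size 4; the signal uses 2 blocks
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 32))
w = D[:, 4:8] @ rng.standard_normal(4) + D[:, 20:24] @ rng.standard_normal(4)
print(np.nonzero(block_omp(w, D, block_size=4, k_blocks=2))[0])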
ISBN (print): 9781467388511
This paper presents an algorithm coined BORDER (Bounding Oriented-Rectangle Descriptors for Enclosed Regions) for texture-less object recognition. By fusing a regional object encompassment concept with descriptor-based pipelines, we extend local patches into scalable, object-sized oriented rectangles that encapsulate object information with minimal outliers. We correspondingly introduce a modified line-segment detection technique, termed Linelets, to stabilize keypoint repeatability in homogeneous conditions. In addition, a unique sampling technique facilitates the incorporation of robust angle primitives to produce discriminative rotation-invariant descriptors. BORDER excels particularly in homogeneous conditions, obtaining superior detection rates in the presence of heavy clutter, occlusion, and scale-rotation changes when compared with modern state-of-the-art texture-less object detectors such as BOLD and LINE2D on public texture-less object databases.
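As a loose illustration of fitting an object-sized oriented rectangle to a set of points (e.g. line-segment endpoints), the sketch below uses a simple PCA fit; it is not the BORDER construction, and the point set is invented for the example.

import numpy as np

def oriented_rectangle(points):
    """Fit a PCA-aligned rectangle: returns centre, unit axes and half-extents."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    # principal axes of the point cloud give the rectangle orientation
    _, _, vt = np.linalg.svd(pts - centre, full_matrices=False)
    proj = (pts - centre) @ vt.T
    half_extents = (proj.max(axis=0) - proj.min(axis=0)) / 2.0
    return centre, vt, half_extents

pts = np.array([[0, 0], [4, 1], [8, 2], [1, 3], [5, 4], [9, 5]], dtype=float)
centre, axes, extents = oriented_rectangle(pts)
print("centre:", centre, "half-extents:", extents)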
ISBN (print): 9781467388511
Previous approaches to action recognition with deep features tend to process video frames only within a small temporal region, and do not model long-range dynamic information explicitly. However, such information is important for the accurate recognition of actions, especially for the discrimination of complex activities that share sub-actions, and when dealing with untrimmed videos. Here, we propose a representation, VLAD for Deep Dynamics (VLAD3), that accounts for different levels of video dynamics. It captures short-term dynamics with deep convolutional neural network features, relying on linear dynamic systems (LDS) to model medium-range dynamics. To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video to arrive at the final VLAD3 representation. An extensive evaluation was performed on Olympic Sports, UCF101 and THUMOS15, where the use of the VLAD3 representation leads to state-of-the-art results.
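The sketch below shows the standard VLAD aggregation step over per-frame descriptors, which is the pooling ingredient of VLAD3; the deep features and LDS modelling are omitted, and the codebook, feature dimensions, and normalisation choices are assumptions for illustration.

import numpy as np

def vlad(features, centres):
    """Aggregate residuals of features to their nearest codebook centre."""
    d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    enc = np.zeros_like(centres)
    for k in range(centres.shape[0]):
        members = features[assign == k]
        if len(members):
            enc[k] = (members - centres[k]).sum(axis=0)
    # power + L2 normalisation, as commonly done for VLAD
    enc = np.sign(enc) * np.sqrt(np.abs(enc))
    return enc.ravel() / (np.linalg.norm(enc) + 1e-12)

rng = np.random.default_rng(0)
frame_descriptors = rng.standard_normal((200, 16))   # stand-in for per-frame features
centres = frame_descriptors[rng.choice(200, 8, replace=False)]
print(vlad(frame_descriptors, centres).shape)        # (128,) = 8 centres x 16 dims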
ISBN (print): 9781467388511
Given a food image, can a fine-grained object recognition engine tell "which restaurant, which dish" the food belongs to? Such ultra-fine-grained image recognition is the key to many applications, such as search by images, but it is very challenging because it needs to discern subtle differences between classes while dealing with the scarcity of training data. Fortunately, the ultra-fine granularity naturally brings rich relationships among object classes. This paper proposes a novel approach to exploit these rich relationships through bipartite-graph labels (BGL). We show how to model BGL within an overall convolutional neural network so that the resulting system can be optimized through back-propagation. We also show that inference is computationally efficient thanks to the bipartite structure. To facilitate the study, we construct a new food benchmark dataset consisting of 37,885 food images collected from 6 restaurants and a total of 975 menus. Experimental results on this new food dataset and three other datasets demonstrate that BGL advances previous work in fine-grained object recognition. An online demo is available at http://***/fg_demo/.
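A tiny numerical sketch of the marginalisation view behind bipartite-graph labels: ultra-fine classes connect to coarser labels through a bipartite incidence matrix, so coarse probabilities follow from fine-class probabilities. The class counts and incidence matrix are toy values, not the paper's graph.

import numpy as np

fine_logits = np.array([2.0, 0.5, 1.0, -1.0])   # scores for 4 ultra-fine classes
incidence = np.array([[1, 0],                    # fine class -> coarse label membership
                      [1, 0],
                      [0, 1],
                      [0, 1]], dtype=float)

# softmax over the fine classes, then marginalise along the bipartite graph
fine_prob = np.exp(fine_logits - fine_logits.max())
fine_prob /= fine_prob.sum()
coarse_prob = incidence.T @ fine_prob
print("fine:", fine_prob, "coarse:", coarse_prob)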
ISBN (print): 9781467388511
We address the problem of fine-grained vehicle make & model recognition and verification. Our contribution is to show that extracting additional data from the video stream - besides the vehicle image itself - and feeding it into a deep convolutional neural network boosts recognition performance considerably. This additional information includes the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and the 3D vehicle orientation. Experiments show that adding such information decreases classification error by 26% (accuracy improves from 0.772 to 0.832) and boosts verification average precision by 208% (from 0.378 to 0.785) compared to the baseline pure CNN without any input modifications. Moreover, the pure baseline CNN outperforms the recent state-of-the-art solution by 0.081. We provide an annotated set, "BoxCars", of surveillance vehicle images augmented by various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
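A hedged sketch of the general idea of feeding auxiliary geometric information (orientation, a rasterised shape mask) into a CNN alongside the image; the layer sizes, branch design, and input dimensions are arbitrary assumptions, not the paper's network.

import torch
import torch.nn as nn

class AuxCNN(nn.Module):
    """Image branch plus an auxiliary branch for geometry, fused before the classifier."""
    def __init__(self, n_classes=10, aux_dim=3 + 16 * 16):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())        # -> 16 * 4 * 4 features
        self.aux_branch = nn.Sequential(nn.Linear(aux_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(16 * 4 * 4 + 64, n_classes)

    def forward(self, image, aux):
        fused = torch.cat([self.image_branch(image), self.aux_branch(aux)], dim=1)
        return self.classifier(fused)

model = AuxCNN()
image = torch.randn(2, 3, 64, 64)        # stand-in for "unpacked" vehicle crops
aux = torch.randn(2, 3 + 16 * 16)        # stand-in for orientation + rasterised shape
print(model(image, aux).shape)           # torch.Size([2, 10])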
ISBN (print): 9781467388511
Recently, convolutional neural networks (CNNs) have demonstrated impressive performance in various computer vision tasks. However, high-performance hardware is typically indispensable for applying CNN models due to their high computational complexity, which limits their broader application. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed up computation and reduce the storage and memory overhead of CNN models. Both the filter kernels in convolutional layers and the weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate a 4-6x speed-up and 15-20x compression with merely one percentage point of classification accuracy loss. With our quantized CNN model, even mobile devices can accurately classify images within one second.
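A minimal product-quantization sketch for a fully-connected weight matrix, in the spirit of quantizing weights with small codebooks; the paper's response-error-aware objective is not reproduced, and the codebook size, sub-vector length, and k-means routine are illustrative.

import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """A tiny k-means, enough to build per-group codebooks for the sketch."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = ((x[:, None] - centres[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centres[j] = x[assign == j].mean(0)
    assign = ((x[:, None] - centres[None]) ** 2).sum(-1).argmin(1)
    return centres, assign

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 128))      # weights of one fully-connected layer
sub, k = 16, 32                          # sub-vector length and codebook size
W_hat = np.empty_like(W)
for g in range(W.shape[1] // sub):
    block = W[:, g * sub:(g + 1) * sub]  # 256 sub-vectors of length 16
    centres, assign = kmeans(block, k)
    W_hat[:, g * sub:(g + 1) * sub] = centres[assign]   # replace each sub-vector by its code
print("relative weight error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))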
ISBN (print): 9781467388511
We propose an explicitly discriminative and 'simple' approach to generate invariance to nuisance transformations modeled as unitary. In practice, the approach also handles non-unitary transformations well. Our theoretical results extend the reach of a recent theory of invariance to discriminative and kernelized features based on unitary kernels. As a special case, a single common framework can be used to generate subject-specific pose-invariant features for face recognition and, vice versa, subject-invariant pose-specific features for pose estimation. We show that our main proposed method (DIKF) performs well under very challenging large-scale semi-synthetic face matching and pose estimation protocols with unaligned faces, using no landmarking whatsoever. We additionally benchmark on CMU MPIE and outperform previous work in almost all cases on off-angle face matching, while remaining on par with the previous state of the art on the LFW unsupervised and image-restricted protocols, without any low-level image descriptors other than raw pixels.
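A toy demonstration of how averaging a kernel over a unitary group (planar rotations here) yields invariance to that group acting on its argument; this only illustrates the flavour of unitary-kernel invariance and is not the paper's DIKF construction.

import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def averaged_kernel(x, y, n=360):
    """RBF kernel averaged over rotated copies of x (approximates the group integral)."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return float(np.mean([np.exp(-np.sum((rot(t) @ x - y) ** 2)) for t in thetas]))

x, y = np.array([1.0, 0.3]), np.array([-0.4, 0.8])
k_plain = averaged_kernel(x, y)
k_rotated = averaged_kernel(rot(1.2) @ x, y)   # apply a nuisance rotation to x
print(k_plain, k_rotated)                      # nearly identical: the averaged kernel is invariant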
ISBN (print): 9781467388511
Super-symmetric tensors - a higher-order extension of scatter matrices - are becoming increasingly popular in machine learning and computer vision for modeling data statistics, co-occurrences, or even as visual descriptors. They were recently shown to outperform second-order approaches [18]; however, the size of these tensors is exponential in the data dimensionality, which is a significant concern. In this paper, we study third-order super-symmetric tensor descriptors in the context of dictionary learning and sparse coding. For this purpose, we propose a novel non-linear third-order texture descriptor. Our goal is to approximate these tensors as sparse conic combinations of atoms from a learned dictionary. Apart from the significant benefits to tensor compression that this framework offers, our experiments demonstrate that the sparse coefficients produced by this scheme lead to better aggregation of high-dimensional data and showcase superior performance on two common computer vision tasks compared to the state of the art.
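A short sketch of one common way to build a third-order super-symmetric descriptor, by averaging 3-way outer products of local features; the paper's non-linear texture descriptor and the tensor dictionary learning are not reproduced here, and the feature dimensions are made up.

import numpy as np

rng = np.random.default_rng(0)
local_features = rng.standard_normal((500, 8))   # 500 local features of dimension 8

# third-order moment tensor: T[i, j, k] = mean_n x_n[i] * x_n[j] * x_n[k]
T = np.einsum('ni,nj,nk->ijk', local_features, local_features, local_features) / len(local_features)
print(T.shape)                                   # (8, 8, 8)
print(np.allclose(T, T.transpose(1, 0, 2)))      # super-symmetry under index permutation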
ISBN (print): 9781467388511
Recent face recognition experiments on a major benchmark (LFW [15]) show stunning performance: a number of algorithms achieve near-perfect scores, surpassing human recognition rates. In this paper, we advocate evaluation at the million scale (LFW includes only 13K photos of 5K people). To this end, we have assembled the MegaFace dataset and created the first MegaFace challenge. Our dataset includes one million photos that capture more than 690K different individuals. The challenge evaluates the performance of algorithms with increasing numbers of "distractors" (going from 10 to 1M) in the gallery set. We present both identification and verification performance, evaluate performance with respect to pose and a person's age, and compare as a function of training data size (#photos and #people). We report results of state-of-the-art and baseline algorithms. The MegaFace dataset, baseline code, and evaluation scripts are all publicly released for further experimentation.
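A toy sketch of the distractor protocol: rank-1 identification accuracy is scored while the gallery is padded with increasing numbers of unrelated faces. The embeddings are synthetic noise, purely to show how such a protocol is scored, not MegaFace data or its evaluation code.

import numpy as np

def rank1_accuracy(probes, mates, distractors):
    """Probe i is correct if its own mate outranks every other gallery entry."""
    gallery = np.vstack([mates, distractors])
    sims = probes @ gallery.T                    # cosine similarities (unit-norm vectors)
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(probes))))

rng = np.random.default_rng(0)

def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

mates = unit(rng.standard_normal((100, 64)))
probes = unit(mates + 0.6 * rng.standard_normal((100, 64)))   # noisy same-person shots
for n in (10, 1000, 100000):
    distractors = unit(rng.standard_normal((n, 64)))
    print(n, "distractors ->", rank1_accuracy(probes, mates, distractors))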
ISBN (print): 9781467388511
We propose a method to push the frontiers of unconstrained face recognition in the wild, focusing on the problem of extreme pose variations. As opposed to current techniques which either expect a single model to learn pose invariance through massive amounts of training data, or which normalize images to a single frontal pose, our method explicitly tackles pose variation by using multiple pose specific models and rendered face images. We leverage deep Convolutional Neural Networks (CNNs) to learn discriminative representations we call Pose-Aware Models (PAMs) using 500K images from the CASIA WebFace dataset. We present a comparative evaluation on the new IARPA Janus Benchmark A (IJB-A) and PIPA datasets. On these datasets PAMs achieve remarkably better performance than commercial products and surprisingly also outperform methods that are specifically fine-tuned on the target dataset.
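A hedged sketch of the pose-aware idea of keeping separate models per pose bin and fusing their outputs for a face pair, rather than forcing one pose-invariant model; the pose bins and the random linear "models" below are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
POSE_BINS = {"frontal": (0, 30), "half-profile": (30, 75), "profile": (75, 180)}
models = {name: rng.standard_normal((128, 64)) for name in POSE_BINS}   # stand-in embedders

def embed(face, yaw_deg):
    """Embed a face with every pose-specific model whose bin covers its |yaw|."""
    outs = [W @ face for name, W in models.items()
            if POSE_BINS[name][0] <= abs(yaw_deg) < POSE_BINS[name][1]]
    e = np.mean(outs, axis=0)
    return e / np.linalg.norm(e)

def match_score(face_a, yaw_a, face_b, yaw_b):
    return float(embed(face_a, yaw_a) @ embed(face_b, yaw_b))

face_a, face_b = rng.standard_normal(64), rng.standard_normal(64)
print(match_score(face_a, 5.0, face_b, 80.0))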