ISBN: 9781467388511 (print)
Infinite dimensional covariance descriptors can provide richer and more discriminative information than their low dimensional counterparts. In this paper, we propose a novel image descriptor, the robust approximate infinite dimensional Gaussian (RAID-G). The challenges of RAID-G lie mainly in two aspects: (1) describing an infinite dimensional Gaussian is difficult because of its non-linear Riemannian geometric structure and the infinite dimensional setting, so an effective approximation is necessary; (2) traditional maximum likelihood estimation (MLE) is not robust for high (even infinite) dimensional covariance matrices in the Gaussian setting. To address these challenges, explicit feature mapping (EFM) is first introduced to approximate the infinite dimensional Gaussian induced by an additive kernel function, and then a new regularized MLE method based on the von Neumann divergence is proposed for robust estimation of the covariance matrix. The EFM and the proposed regularized MLE allow a closed form of RAID-G, which is very efficient and effective for high dimensional features. We extend RAID-G by using the outputs of deep convolutional neural networks as the original features, and apply it to material recognition. Our approach is evaluated on five material benchmarks and one fine-grained benchmark. It achieves 84.9% accuracy on FMD and 86.3% accuracy on the UIUC material database, both much higher than the state of the art.
ISBN: 0818672587 (print)
Feature indexing techniques are promising for object recognition since they can quickly reduce the set of possible matches for a set of image features. This work exploits another property of such techniques: they have an inherently parallel structure, and connectionist network formulations are easy to develop. Once indexing has been performed, a voting scheme such as geometric hashing can be used to generate object hypotheses in parallel. We describe a framework for the connectionist implementation of such indexing and recognition techniques. With sufficient processing elements, recognition can be performed in a small number of time steps. The number of processing elements necessary to achieve peak performance and the fan-in/fan-out required for the processing elements are examined. These techniques have been simulated on a conventional architecture with good results.
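The indexing-then-voting idea in this abstract can be illustrated with a minimal sketch. It is not the paper's connectionist implementation and not full geometric hashing (which indexes basis-invariant coordinates); it only assumes that features are small numeric tuples quantized into bins. The names `quantize`, `build_index`, `vote`, and `bin_size` are hypothetical.

```python
from collections import Counter, defaultdict


def quantize(feature, bin_size):
    """Map a numeric feature tuple to a discrete index key (assumed scheme)."""
    return tuple(int(x // bin_size) for x in feature)


def build_index(models, bin_size=1.0):
    """Offline step: map each quantized feature to the models that contain it."""
    index = defaultdict(set)
    for model_id, features in models.items():
        for f in features:
            index[quantize(f, bin_size)].add(model_id)
    return index


def vote(index, image_features, bin_size=1.0):
    """Online step: every image feature looks up its bin and votes for models.

    Each lookup is independent of the others, which is the property that
    makes a parallel implementation natural.
    """
    votes = Counter()
    for f in image_features:
        for model_id in index.get(quantize(f, bin_size), ()):
            votes[model_id] += 1
    return votes


models = {
    "cup": [(0.2, 0.3), (1.5, 2.1)],
    "key": [(5.1, 5.2), (1.6, 2.4)],
}
index = build_index(models)
votes = vote(index, [(0.4, 0.1), (1.9, 2.9), (7.0, 7.0)])
best_model, best_count = votes.most_common(1)[0]
```

In this toy run the first image feature matches only "cup", the second falls in a bin shared by both models, and the third matches nothing, so "cup" collects the most votes.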
This work demonstrates a novel mobile robot architecture that uses an environment-based sensor network to provide third-person perception. The demonstration is motivated by the idea that a mobile robot working in the area can tune in to broadcasts from the video camera network to receive sensor data.
ISBN: 9781424469840 (print)
We propose a face recognition approach based on hashing. The approach yields recognition rates comparable to those of the random ℓ1 approach [18], which is considered the state of the art. Our method is much faster, however: it is up to 150 times faster than [18] on the YaleB dataset. We show that with hashing, the sparse representation can be recovered with high probability because hashing preserves the restricted isometry property. Moreover, we present a theoretical analysis of the recognition rate of the proposed hashing approach. Experiments show a very competitive recognition rate and a significant speedup compared with the state of the art.
ISBN: 9781467312288 (print)
Real-time recognition may be limited by scarce memory and computing resources for performing classification. Although prior research has addressed the problem of training classifiers with limited data and computation, few efforts have tackled the problem of memory constraints on recognition. We explore methods that can guide the allocation of limited storage resources for classifying streaming data so as to maximize discriminatory power. We focus on computing the expected value of information with nearest neighbor classifiers for online face recognition. Experiments on real-world datasets show the effectiveness and power of the approach. The methods provide a principled approach to vision under bounded resources, and have immediate application to enhancing recognition capabilities in consumer devices with limited memory.
ISBN: 0818672587 (print)
A fundamental problem in depth from defocus is the measurement of relative defocus between images. We propose a class of broadband operators that, when used together, provide invariance to scene texture and produce accurate and dense depth maps. Since the operators are broadband, a small number of them is sufficient for depth estimation of scenes with complex textural properties. Experiments are conducted on both synthetic and real scenes to evaluate the performance of the proposed operators. The depth detection gain error is less than 1%, irrespective of texture frequency. Depth accuracy is found to be 0.5% to 1.2% of the distance of the object from the imaging optics.
ISBN: 9781728132938 (print)
Abnormal event detection in video is a challenging vision problem. Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous data during training. Because of the lack of prior information regarding abnormal events, these methods are not fully equipped to differentiate between normal and abnormal events. In this work, we formalize abnormal event detection as a one-versus-rest binary classification problem. Our contribution is two-fold. First, we introduce an unsupervised feature learning framework based on object-centric convolutional auto-encoders to encode both motion and appearance information. Second, we propose a supervised classification approach based on clustering the training samples into normality clusters. A one-versus-rest abnormal event classifier is then employed to separate each normality cluster from the rest. For the purpose of training the classifier, the other clusters act as dummy anomalies. During inference, an object is labeled as abnormal if the highest classification score assigned by the one-versus-rest classifiers is negative. Comprehensive experiments are performed on four benchmarks: Avenue, ShanghaiTech, UCSD and UMN. Our approach provides superior results on all four data sets. On the large-scale ShanghaiTech data set, our method provides an absolute gain of 8.4% in terms of frame-level AUC compared to the state-of-the-art method [].
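The inference rule stated in this abstract can be written down directly. The sketch below assumes the one-versus-rest scores have already been computed, one column per normality cluster; the function name `is_abnormal` and the example scores are hypothetical, and the feature learning, clustering, and classifier training steps are not shown.

```python
import numpy as np


def is_abnormal(scores):
    """Apply the decision rule from the abstract.

    scores: array-like of shape (num_objects, num_clusters), where entry
    (i, k) is the score the k-th one-versus-rest classifier gives object i.
    An object is abnormal when even its best score is negative, i.e. no
    normality cluster claims it.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.max(axis=1) < 0.0


example_scores = [
    [0.7, -1.2, -0.4],   # claimed by the first normality cluster
    [-0.3, -0.9, -0.1],  # rejected by every cluster
]
flags = is_abnormal(example_scores)
```

Here the first object is labeled normal and the second abnormal.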
ISBN: 0818672587 (print)
We represent local spatial structure in a color image using feature matrices that are computed from an image region. Feature matrices contain significantly more information about local image structure than previous representations. Although feature matrices are useful for surface recognition, this representation depends on the spectral properties of the scene illumination. Using a finite dimensional linear model for surface spectral reflectance with the same number of parameters as the number of color bands, we show that illumination changes correspond to linear transformations of the feature matrices and that surface rotations correspond to circular shifts of the matrices. From these relationships we derive an algorithm for illumination and geometry invariant recognition of local surface structure. We demonstrate the algorithm with a series of experiments on images of real objects.
ISBN: 9781665445092 (print)
Effective regularization techniques are highly desired in deep learning for alleviating overfitting and improving generalization. This work proposes a new regularization scheme, based on the understanding that the flat local minima of the empirical risk cause the model to generalize better. This scheme is referred to as adversarial model perturbation (AMP), where instead of directly minimizing the empirical risk, an alternative "AMP loss" is minimized via SGD. Specifically, the AMP loss is obtained from the empirical risk by applying the "worst" norm-bounded perturbation at each point in the parameter space. Compared with most existing regularization schemes, AMP has strong theoretical justification, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk. Extensive experiments on various modern deep architectures establish AMP as a new state of the art among regularization schemes.
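Read literally, the AMP loss at parameters θ is the maximum of the empirical risk over all perturbations Δ with ‖Δ‖ ≤ ε. A minimal sketch of one training step follows. It makes an assumption the abstract does not state: the inner maximization is approximated by a single gradient-ascent step scaled to the boundary of the norm ball. The names `amp_step`, `loss_grad`, `epsilon`, and `lr` are hypothetical.

```python
import numpy as np


def amp_step(theta, loss_grad, epsilon=0.1, lr=0.1):
    """One descent step on an approximate AMP loss.

    theta:     current parameter vector
    loss_grad: function returning the gradient of the empirical risk
    epsilon:   radius of the norm ball for the perturbation
    lr:        learning rate of the outer descent step
    """
    g = loss_grad(theta)
    norm = np.linalg.norm(g)
    # Assumed inner step: move along the gradient to the edge of the ball.
    delta = epsilon * g / norm if norm > 0 else np.zeros_like(theta)
    # Outer step: descend using the gradient at the perturbed point.
    return theta - lr * loss_grad(theta + delta)


# Toy empirical risk 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.array([3.0, 4.0])
new_theta = amp_step(theta, lambda t: t)
```

For the toy quadratic, the perturbed point is (3.06, 4.08), so the update yields (2.694, 3.592).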
ISBN: 9781538604571 (print)
Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute- and memory-intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
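The top-down pathway with lateral connections can be sketched in a few lines of NumPy. The abstract does not give the details, so the sketch assumes a 1x1 convolution for each lateral connection, nearest-neighbour 2x upsampling, and element-wise addition for merging; any smoothing convolution on the merged maps is omitted. The function names and the random-free toy inputs are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def lateral(x, weight):
    """1x1 convolution: mix channels with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, x)


def top_down(features, weights):
    """Build pyramid maps from backbone maps ordered fine to coarse.

    Each backbone map is assumed to have half the spatial size of the
    previous one. All returned maps share the same channel count C_out.
    """
    # Start from the coarsest, most semantic map.
    pyramid = [lateral(features[-1], weights[-1])]
    for feat, w in zip(reversed(features[:-1]), reversed(weights[:-1])):
        # Merge the upsampled coarser map with the lateral projection.
        pyramid.append(lateral(feat, w) + upsample2x(pyramid[-1]))
    return pyramid[::-1]


# Toy backbone outputs with 8, 16 and 32 channels at three scales.
features = [np.ones((8, 16, 16)), np.ones((16, 8, 8)), np.ones((32, 4, 4))]
weights = [np.ones((4, 8)), np.ones((4, 16)), np.ones((4, 32))]
pyramid = top_down(features, weights)
```

Every output map keeps the spatial size of its backbone level but has the same four channels, so a single detection head could be shared across scales.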