ISBN: 9780769549897 (print)
Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. While many methods claim to use dense feature sets, most of them are merely denser than approaches based on sparse interest point detectors. In this paper, we explore high-density sampling for action recognition. We also investigate the impact of random sampling over a dense grid on computational efficiency. We present a real-time action recognition system that integrates a fast random sampling method with local spatio-temporal features extracted from a Local Part Model. A new method based on the histogram intersection kernel is proposed to combine multiple channels of different descriptors. Our technique achieves 93% accuracy on the simple KTH dataset and state-of-the-art results on two very challenging real-world datasets: 83.3% on UCF50 and 47.6% on HMDB51.
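As a rough illustration of the kernel named in the abstract above, the sketch below computes the standard histogram intersection kernel (the sum of bin-wise minima) and combines several descriptor channels by simple averaging. The averaging rule, the function names, and the channel names `hog` and `hof` are assumptions made for illustration; the abstract does not specify how the paper combines channels.

```python
def histogram_intersection(a, b):
    """Histogram intersection kernel: the sum of bin-wise minima."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def combined_kernel(channels_a, channels_b):
    """Average the per-channel intersection kernels.

    Averaging is an assumed combination rule, not the paper's method.
    """
    if channels_a.keys() != channels_b.keys():
        raise ValueError("both videos must provide the same channels")
    values = [histogram_intersection(channels_a[name], channels_b[name])
              for name in channels_a]
    return sum(values) / len(values)


# Hypothetical L1-normalised bag-of-features histograms for two videos.
video_a = {"hog": [0.5, 0.25, 0.25], "hof": [0.0, 1.0]}
video_b = {"hog": [0.25, 0.5, 0.25], "hof": [0.5, 0.5]}
score = combined_kernel(video_a, video_b)
```

For L1-normalised histograms each per-channel value lies in [0, 1], so an average keeps the combined similarity on the same scale regardless of how many channels are used.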
ISBN: 9780769549897 (print)
This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. As a result, they are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is lost when chromaticity is discarded from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF Sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that Color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.
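The opponent color space mentioned above is commonly defined by a fixed linear transform of RGB. The sketch below uses that common definition; the abstract does not say which exact chromatic representations the paper derives from it, so the function name and the choice of this particular normalisation are assumptions.

```python
import math


def rgb_to_opponent(r, g, b):
    """Map an RGB pixel to the commonly used opponent color space.

    o1 and o2 carry chromatic information; o3 carries intensity.
    """
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3


# A gray pixel has no chromatic content, so o1 and o2 are zero.
gray = rgb_to_opponent(0.5, 0.5, 0.5)
# A pure red pixel has non-zero values in both chromatic channels.
red = rgb_to_opponent(1.0, 0.0, 0.0)
```

Because o1 and o2 are differences of color channels, a constant offset added equally to R, G and B leaves them unchanged, which is one reason such channels are less affected by some intensity-only disturbances than a plain intensity image.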
ISBN: 9780769549897 (print)
Computational color constancy is an important topic in computer vision and has attracted the attention of many researchers. Recently, many studies have shown the effect of using high-level visual content cues to improve illumination estimation. However, nearly all existing methods are essentially combinational strategies, in which image content analysis is used only to guide the combination of, or selection from, a variety of individual illumination estimation methods. In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low-level color distribution and high-level scene content simultaneously. To this end, the image's scene content information is integrated with its color distribution to obtain an optimal illumination estimation model. Experimental results on real-world image sets show that our algorithm outperforms several prevailing illumination estimation methods, and even some combinational methods.
ISBN: 9780769549897 (print)
For problems over continuous random variables, MRFs with large cliques pose a challenge in probabilistic inference. Difficulties in performing optimization efficiently have limited the probabilistic models explored in computer vision and other fields. One inference technique that handles large cliques well is Expectation Propagation (EP). EP offers run times independent of clique size, which instead depend only on the rank, or intrinsic dimensionality, of the potentials. This property would be highly advantageous in computer vision. Unfortunately, for the grid-shaped models common in vision, traditional Gaussian EP requires quadratic space and cubic time in the number of pixels. Here, we propose a variation of EP that exploits regularities in natural scene statistics to achieve run times that are linear in both the number of pixels and the clique size. We test these methods on shape from shading, and we demonstrate strong performance not only for Lambertian surfaces, but also for arbitrary surface reflectance and lighting arrangements, which require highly non-Gaussian potentials. Finally, we use large, non-local cliques to exploit cast shadows, which are traditionally ignored in shape from shading.
ISBN: 9780769549897 (print)
In this work, we propose a novel video representation for activity recognition that models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized by the dynamics of their attributes. These segments are modeled by a dictionary of attribute dynamics templates, which are implemented by a recently introduced generative model, the binary dynamic system (BDS). We propose methods for learning a dictionary of BDSs from a training corpus, and for quantizing attribute sequences extracted from videos into these BDS codewords. This procedure produces a representation of the video as a histogram of BDS codewords, which is denoted the bag-of-words for attribute dynamics (BoWAD). An extensive experimental evaluation reveals that this representation outperforms other state-of-the-art approaches to temporal structure modeling for complex activity recognition.
ISBN: 9780769549897 (print)
The representation of local image patches is crucial for the performance and efficiency of many vision tasks. Patch descriptors have been designed to generalize across diverse variations, depending on the application and on the desired compromise between accuracy and efficiency. We present a novel formulation of patch description that serves these needs well. Sparse quantization lies at its heart. This allows for efficient encodings, leading to powerful, novel binary descriptors, and also to the generalization of existing descriptors such as SIFT or BRIEF. We demonstrate the capabilities of our formulation for both keypoint matching and image classification. Our binary descriptors achieve state-of-the-art results on two keypoint matching benchmarks, namely those by Brown [6] and Mikolajczyk [18]. For image classification, we propose new descriptors that perform similarly to SIFT on Caltech101 [10] and PASCAL VOC07 [9].
ISBN: 9780769549897 (print)
Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that second-order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second-order descent methods have two main drawbacks: (1) the function might not be analytically differentiable, and numerical approximations are impractical; (2) the Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at ***/intraface.
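The idea of learning descent directions from training samples, rather than computing a Jacobian or Hessian, can be sketched on a toy scalar problem. The version below is a deliberately simplified illustration and not the paper's implementation: the function `h`, the scalar update `x <- x - c_k * (h(x) - y)` without a bias term, the shared starting point `x0`, and all names are assumptions.

```python
def h(x):
    """Toy nonlinear function (an assumption for illustration)."""
    return x ** 3


def train_sdm(targets, x0, steps):
    """Learn one scalar descent coefficient per step by least squares."""
    ys = [h(t) for t in targets]          # observations y = h(x*)
    xs = [x0] * len(targets)              # every sample starts at x0
    coeffs = []
    for _ in range(steps):
        residuals = [h(x) - y for x, y in zip(xs, ys)]
        gaps = [t - x for t, x in zip(targets, xs)]
        denom = sum(r * r for r in residuals)
        if denom == 0.0:
            break
        # Choose c to minimise sum((gap + c * residual) ** 2) over samples.
        coeff = -sum(g * r for g, r in zip(gaps, residuals)) / denom
        coeffs.append(coeff)
        xs = [x - coeff * r for x, r in zip(xs, residuals)]
    return coeffs


def apply_sdm(y, x0, coeffs):
    """Run the learned updates; no derivative of h is evaluated."""
    x = x0
    for coeff in coeffs:
        x = x - coeff * (h(x) - y)
    return x


def mean_squared_error(targets, x0, coeffs):
    errors = [(apply_sdm(h(t), x0, coeffs) - t) ** 2 for t in targets]
    return sum(errors) / len(errors)


targets = [0.5 + 0.1 * i for i in range(11)]   # true solutions x*
coeffs = train_sdm(targets, x0=1.0, steps=5)
error_before = mean_squared_error(targets, 1.0, [])
error_after = mean_squared_error(targets, 1.0, coeffs)
```

Each coefficient is a least-squares fit, so the training error cannot increase from one step to the next: setting the coefficient to zero would reproduce the previous error.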
ISBN: 9780769549903 (print)
We present a vision-based method for signer diarization, the task of automatically determining "who signed when?" in a video. This task has motivations and applications similar to those of speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more movements than their interlocutors. Experiments on four videos (1.4 hours in total, each featuring two signers) show the applicability of the method. The best diarization error rate (DER) obtained is 0.16.
ISBN: 9780769549903 (print)
Several papers have addressed ellipse detection as a first step for various computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses and on the use of a Hough transform to estimate parameters in a decomposed space. The demo will show the algorithm running on a commercial smartphone.
ISBN: 9780769549897 (print)
Product quantization is an effective vector quantization approach for compactly encoding high-dimensional vectors for fast approximate nearest neighbor (ANN) search. The essence of product quantization is to decompose the original high-dimensional space into the Cartesian product of a finite number of low-dimensional subspaces that are then quantized separately. Optimal space decomposition is important for the performance of ANN search, but remains unaddressed. In this paper, we optimize product quantization by minimizing quantization distortions with respect to the space decomposition and the quantization codebooks. We present two novel methods for optimization: a non-parametric method that alternately solves two smaller sub-problems, and a parametric method that is guaranteed to achieve the optimal solution if the input data follows a Gaussian distribution. We show by experiments that our optimized approach substantially improves the accuracy of product quantization for ANN search.
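The decomposition described above can be sketched as follows: each vector is cut into contiguous sub-vectors, and each subspace gets its own small k-means codebook. This is only baseline product quantization with a fixed, unoptimized decomposition; it does not implement the optimization the paper proposes. All function names and parameters are hypothetical, and the sketch assumes the vector length is divisible by the number of subspaces.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations, rng):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def split(vector, num_subspaces):
    """Cut a vector into equal-length contiguous sub-vectors."""
    size = len(vector) // num_subspaces
    return [vector[i * size:(i + 1) * size] for i in range(num_subspaces)]


def train_pq(vectors, num_subspaces, k, iterations=10, seed=0):
    """Learn one codebook per subspace."""
    rng = random.Random(seed)
    parts = [split(v, num_subspaces) for v in vectors]
    return [kmeans([p[m] for p in parts], k, iterations, rng)
            for m in range(num_subspaces)]


def encode(vector, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vector, len(codebooks))
    return [min(range(len(cb)), key=lambda j: squared_distance(sub, cb[j]))
            for sub, cb in zip(subs, codebooks)]


def decode(code, codebooks):
    """Concatenate the selected centroids to approximate the vector."""
    out = []
    for index, cb in zip(code, codebooks):
        out.extend(cb[index])
    return out


rng = random.Random(1)
data = [[rng.random() for _ in range(4)] for _ in range(50)]
codebooks = train_pq(data, num_subspaces=2, k=4)
code = encode(data[0], codebooks)
approx = decode(code, codebooks)
```

With 2 subspaces and 4 centroids each, every vector is stored as two small indices, yet the codebooks jointly represent 4 × 4 = 16 distinct reconstruction points.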