Alignment of 3D objects from 2D images is one of the most important and well studied problems in computervision. A typical object alignment system consists of a landmark appearance model which is used to obtain an in...
详细信息
ISBN:
(纸本)9780769549897
Alignment of 3D objects from 2D images is one of the most important and well studied problems in computervision. A typical object alignment system consists of a landmark appearance model which is used to obtain an initial shape and a shape model which refines this initial shape by correcting the initialization errors. Since errors in landmark initialization from the appearance model propagate through the shape model, it is critical to have a robust landmark appearance model. While there has been much progress in designing sophisticated and robust shape models, there has been relatively less progress in designing robust landmark detection models. In this paper we present an efficient and robust landmark detection model which is designed specifically to minimize localization errors thereby leading to state-of-the-art object alignment performance. We demonstrate the efficacy and speed of the proposed approach on the challenging task of multi-view car alignment.
Scene text recognition has inspired great interests from the computervision community in recent years. In this paper, we propose a novel scene text recognition method using part-based tree-structured character detect...
详细信息
ISBN:
(纸本)9780769549897
Scene text recognition has inspired great interests from the computervision community in recent years. In this paper, we propose a novel scene text recognition method using part-based tree-structured character detection. Different from conventional multi-scale sliding window character detection strategy, which does not make use of the character-specific structure information, we use part-based tree-structure to model each type of character so as to detect and recognize the characters at the same time. While for word recognition, we build a Conditional Random Field model on the potential character locations to incorporate the detection scores, spatial constraints and linguistic knowledge into one framework. the final word recognition result is obtained by minimizing the cost function defined on the random field. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods significantly both for character detection and word recognition.
Many computervision problems can be formulated as binary quadratic programs (BQPs). Two classic relaxation methods are widely used for solving BQPs, namely, spectral methods and semidefinite programming (SDP), each w...
详细信息
ISBN:
(纸本)9780769549897
Many computervision problems can be formulated as binary quadratic programs (BQPs). Two classic relaxation methods are widely used for solving BQPs, namely, spectral methods and semidefinite programming (SDP), each withtheir own advantages and disadvantages. Spectral relaxation is simple and easy to implement, but its bound is loose. Semidefinite relaxation has a tighter bound, but its computational complexity is high for large scale problems. We present a new SDP formulation for BQPs, with two desirable properties. First, it has a similar relaxation bound to conventional SDP formulations. Second, compared with conventional SDP methods, the new SDP formulation leads to a significantly more efficient and scalable dual optimization approach, which has the same degree of complexity as spectral methods. Extensive experiments on various applications including clustering, image segmentation, co-segmentation and registration demonstrate the usefulness of our SDP formulation for solving large-scale BQPs.
In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. the algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst ...
详细信息
ISBN:
(纸本)9780769549897
In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. the algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves the state-of-the-art performance in all trials on the Garg et al. dataset, and top tier performance on the Middlebury evaluation.
Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. While many methods claimed to use dense fe...
详细信息
ISBN:
(纸本)9780769549897
Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. While many methods claimed to use dense feature sets, most of them are just denser than approaches based on sparse interest point detectors. In this paper, we explore sampling with high density on action recognition. We also investigate the impact of random sampling over dense grid for computational efficiency. We present a real-time action recognition system which integrates fast random sampling method with local spatio-temporal features extracted from a Local Part Model. A new method based on histogram intersection kernel is proposed to combine multiple channels of different descriptors. Our technique shows high accuracy on the simple Kth dataset, and achieves state-of-the-art on two very challenging real-world datasets, namely, 93% on Kth, 83.3% on UCF50 and 47.6% on HMDB51.
Recognizing the category of a visual object remains a challenging computervision problem. In this paper we develop a novel deep learning method that facilitates example-based visual object category recognition. Our d...
详细信息
ISBN:
(纸本)9780769549897
Recognizing the category of a visual object remains a challenging computervision problem. In this paper we develop a novel deep learning method that facilitates example-based visual object category recognition. Our deep learning architecture consists of multiple stacked layers and computes an intermediate representation that can be fed to a nearest-neighbor classifier. this intermediate representation is discriminative and structure-preserving. It is also capable of extracting essential characteristics shared by objects in the same category while filtering out nonessential differences among them. Each layer in our model is a nonlinear mapping, whose parameters are learned through two sequential steps that are designed to achieve the aforementioned properties. the first step computes a discrete mapping called supervised Laplacian Eigenmap. the second step computes a continuous mapping from the discrete version through nonlinear regression. We have extensively tested our method and it achieves state-of-the-art recognition rates on a number of benchmark datasets.
Computational color constancy is a very important topic in computervision and has attracted many researchers' attention. Recently, lots of research has shown the effects of using high level visual content cues fo...
详细信息
ISBN:
(纸本)9780769549897
Computational color constancy is a very important topic in computervision and has attracted many researchers' attention. Recently, lots of research has shown the effects of using high level visual content cues for improving illumination estimation. However, nearly all the existing methods are essentially combinational strategies in which image's content analysis is only used to guide the combination or selection from a variety of individual illumination estimation methods. In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low level color distribution and high level image scene content simultaneously. For the purpose, the image's scene content information is integrated with its color distribution to obtain optimal illumination estimation model. the experimental results on real-world image sets show that our algorithm is superior to some prevailing illumination estimation methods, even better than some combinational methods.
Early work in computervision considered a host of geometric cues for both shape reconstruction [11] and recognition [14]. However, since then, the vision community has focused heavily on shading cues for reconstructi...
详细信息
ISBN:
(纸本)9780769549897
Early work in computervision considered a host of geometric cues for both shape reconstruction [11] and recognition [14]. However, since then, the vision community has focused heavily on shading cues for reconstruction [1], and moved towards data-driven approaches for recognition [6]. In this paper, we reconsider these perhaps overlooked "boundary" cues (such as self occlusions and folds in a surface), as well as many other established constraints for shape reconstruction. In a variety of user studies and quantitative tasks, we evaluate how well these cues inform shape reconstruction (relative to each other) in terms of both shape quality and shape recognition. Our findings suggest many new directions for future research in shape reconstruction, such as automatic boundary cue detection and relaxing assumptions in shape from shading (e.g. orthographic projection, Lambertian surfaces).
In this work, we propose a novel video representation for activity recognitionthat models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized...
详细信息
ISBN:
(纸本)9780769549897
In this work, we propose a novel video representation for activity recognitionthat models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized by the dynamics of their attributes. these segments are modeled by a dictionary of attribute dynamics templates, which are implemented by a recently introduced generative model, the binary dynamic system (BDS). We propose methods for learning a dictionary of BDSs from a training corpus, and for quantizing attribute sequences extracted from videos into these BDS codewords. this procedure produces a representation of the video as a histogram of BDS codewords, which is denoted the bag-of-words for attribute dynamics (BoWAD). An extensive experimental evaluation reveals that this representation outperforms other state-of-the-art approaches in temporal structure modeling for complex activity recognition.
the representation of local image patches is crucial for the good performance and efficiency of many vision tasks. Patch descriptors have been designed to generalize towards diverse variations, depending on the applic...
详细信息
ISBN:
(纸本)9780769549897
the representation of local image patches is crucial for the good performance and efficiency of many vision tasks. Patch descriptors have been designed to generalize towards diverse variations, depending on the application, as well as the desired compromise between accuracy and efficiency. We present a novel formulation of patch description, that serves such issues well. Sparse quantization lies at its heart. this allows for efficient encodings, leading to powerful, novel binary descriptors, yet also to the generalization of existing descriptors like SIFT or BRIEF. We demonstrate the capabilities of our formulation for both keypoint matching and image classification. Our binary descriptors achieve state-of-the-art results for two keypoint matching benchmarks, namely those by Brown [6] and Mikolajczyk [18]. For image classification, we propose new descriptors that perform similar to SIFT on Caltech101 [10] and PASCAL VOC07 [9].
暂无评论