Visual attributes are powerful features for many different applications in computervision such as object detection and scene recognition. Visual attributes present another application that has not been examined as ri...
详细信息
ISBN:
(纸本)9780769549897
Visual attributes are powerful features for many different applications in computervision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal withthis uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
Deformable part models have achieved impressive performance for object detection, even on difficult image datasets. this paper explores the generalization of deformable part models from 2D images to 3D spatiotemporal ...
详细信息
ISBN:
(纸本)9780769549897
Deformable part models have achieved impressive performance for object detection, even on difficult image datasets. this paper explores the generalization of deformable part models from 2D images to 3D spatiotemporal volumes to better study their effectiveness for action detection in video. Actions are treated as spatiotemporal patterns and a deformable part model is generated for each action from a collection of examples. For each action model, the most discriminative 3D subvolumes are automatically selected as parts and the spatiotemporal relations between their locations are learned. By focusing on the most distinctive parts of each action, our models adapt to intra-class variation and show robustness to clutter. Extensive experiments on several video datasets demonstrate the strength of spatiotemporal DPMs for classifying and localizing actions.
Unlike traditional images which do not offer information for different directions of incident light, a light field is defined on ray space, and implicitly encodes scene geometry data in a rich structure which becomes ...
详细信息
ISBN:
(纸本)9780769549897
Unlike traditional images which do not offer information for different directions of incident light, a light field is defined on ray space, and implicitly encodes scene geometry data in a rich structure which becomes visible on its epipolar plane images. In this work, we analyze regularization of light fields in variational frameworks and show that their variational structure is induced by disparity, which is in this context best understood as a vector field on epipolar plane image space. We derive differential constraints on this vector field to enable consistent disparity map regularization. Furthermore, we show how the disparity field is related to the regularization of more general vector-valued functions on the 4D ray space of the light field. this way, we derive an efficient variational framework with convex priors, which can serve as a fundament for a large class of inverse problems on ray space.
We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement. the hair strands are first extracted in each 2D view, and projected onto the 3D visual ...
详细信息
ISBN:
(纸本)9780769549897
We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement. the hair strands are first extracted in each 2D view, and projected onto the 3D visual hull for initialization. the 3D positions of these strands are then refined by optimizing an objective function that takes into account cross-view hair orientation consistency, the visual hull constraint and smoothness constraints defined at the strand, wisp and global levels. Based on the refined strands, the algorithm can reconstruct an approximate hair surface: experiments with synthetic hair models achieve an accuracy of similar to 3mm. We also show real-world examples to demonstrate the capability to capture full-head hair styles as well as hair in motion with as few as 8 cameras.
the graph Laplacian operator, which originated in spectral graph theory, is commonly used for learning applications such as spectral clustering and embedding. In this paper we explore the Laplacian distance, a distanc...
详细信息
ISBN:
(纸本)9780769549897
the graph Laplacian operator, which originated in spectral graph theory, is commonly used for learning applications such as spectral clustering and embedding. In this paper we explore the Laplacian distance, a distance function related to the graph Laplacian, and use it for visual search. We show that previous techniques such as Matching by Tone Mapping (MTM) are particular cases of the Laplacian distance. Generalizing the Laplacian distance results in distance measures which are tolerant to various visual distortions. A novel algorithm based on linear decomposition makes it possible to compute these generalized distances efficiently. the proposed approach is demonstrated for tone mapping invariant, outlier robust and multimodal template matching.
Image segmentation is a fundamental process in computervision applications. this paper presents a novel method to deal withthe issue of image segmentation. Each image is first segmented coarsely, and represented as ...
详细信息
Image segmentation is a fundamental process in computervision applications. this paper presents a novel method to deal withthe issue of image segmentation. Each image is first segmented coarsely, and represented as a graph model. then, a semi-supervised algorithm is utilized to estimate the relevance between labeled nodes and unlabeled nodes to construct a relevance matrix. Finally, a normalized cut criterion is utilized to segment images into meaningful units. the experimental results conducted on Berkeley image databases and MSRC image databases demonstrate the effectiveness of the proposed strategy.
Making a high-dimensional (e.g., 100K-dim) feature for face recognition seems not a good idea because it will bring difficulties on consequent training, computation, and storage. this prevents further exploration of t...
详细信息
ISBN:
(纸本)9780769549897
Making a high-dimensional (e.g., 100K-dim) feature for face recognition seems not a good idea because it will bring difficulties on consequent training, computation, and storage. this prevents further exploration of the use of a high-dimensional feature. In this paper, we study the performance of a high-dimensional feature. We first empirically show that high dimensionality is critical to high performance. A 100K-dim feature, based on a single-type Local Binary pattern (LBP) descriptor, can achieve significant improvements over both its low-dimensional version and the state-of-the-art. We also make the high-dimensional feature practical. With our proposed sparse projection method, named rotated sparse regression, both computation and model storage can be reduced by over 100 times without sacrificing accuracy quality.
We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies...
详细信息
ISBN:
(纸本)9780769549897
We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems-quadratic programming for role assignment followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm in natural videos, which contain multiple target events and nonparticipating agents.
the goal of face hallucination is to generate high-resolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we pro...
详细信息
ISBN:
(纸本)9780769549897
the goal of face hallucination is to generate high-resolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we propose to exploit local image structures for face hallucination. Each face image is represented in terms of facial components, contours and smooth regions. the image structure is maintained via matching gradients in the reconstructed high-resolution output. For facial components, we align input images to generate accurate exemplars and transfer the high-frequency details for preserving structural consistency. For contours, we learn statistical priors to generate salient structures in the high-resolution images. A patch matching method is utilized on the smooth regions where the image gradients are preserved. Experimental results demonstrate that the proposed algorithm generates hallucinated face images with favorable quality and adaptability.
In computervisionthere has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. the hashing functions play roles in both quantizing the vector space and generat...
详细信息
ISBN:
(纸本)9780769549897
In computervisionthere has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. the hashing functions play roles in both quantizing the vector space and generating similarity-preserving codes. Most existing hashing methods use hyper-planes (or kernelized hyper-planes) to quantize and encode. In this paper, we present a hashing method adopting the k-means quantization. We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells. the distance between the cells is approximated by the Hamming distance of the cell indices. We further generalize our algorithm to a product space for learning longer codes. Experiments show our method, named as K-means Hashing (KMH), outperforms various state-of-the-art hashing encoding methods.
暂无评论