ISBN: 9781467369640 (print)
Despite the fact that object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchy represents objects at a different level of granularity. The hierarchical representation prevents the performance loss that is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modeling enables resolving ambiguities that exist in independent modeling of these tasks. We augment the PASCAL3D+ [34] dataset with annotations for these tasks and show that our hierarchical model is effective in joint modeling of object detection, 3D pose estimation, and sub-category recognition.
ISBN: 9781467369640 (print)
This paper explores two new aspects of photos and human emotions. First, we show through psychovisual studies that different people have different emotional reactions to the same image, which is a strong and novel departure from previous work that only records and predicts a single dominant emotion for each image. Our studies also show that the same person may have multiple emotional reactions to one image. Predicting emotions as "distributions" instead of a single dominant emotion is important for many applications. Second, we show not only that we can often change the evoked emotion of an image by adjusting color-tone and texture-related features but also that we can choose in which "emotional direction" this change occurs by selecting a target image. In addition, we present a new database, Emotion6, containing distributions of emotions.
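The difference between a "distribution" and a single dominant emotion can be illustrated with a minimal Python sketch. The vote list, the emotion labels, and the helper names `emotion_distribution` and `dominant_emotion` are hypothetical and chosen only for illustration; the abstract does not describe how the distributions are actually collected or predicted.

```python
from collections import Counter


def emotion_distribution(votes):
    """Normalize per-viewer emotion votes into a probability distribution."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}


def dominant_emotion(distribution):
    """Collapse a distribution to its single most probable emotion."""
    return max(distribution, key=distribution.get)


# Hypothetical reactions of six viewers to the same image.
votes = ["joy", "joy", "surprise", "sadness", "joy", "surprise"]
dist = emotion_distribution(votes)
top = dominant_emotion(dist)
```

Keeping only `top` would discard the fact that half of these hypothetical viewers reported something other than the dominant emotion, which is the information a distribution preserves.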
ISBN: 0780342364 (print)
Current methods for registering image regions perform well for simple transformations or large image regions. In this paper, we present a new method that is better able to handle small image regions as they deform with non-linear transformations. We introduce difference decomposition, a novel approach to solving the registration problem. The method is a generalization of previous methods and can better handle non-linear transforms. Although the methods are general, we focus on projective transformations and introduce piecewise-projective transformations for modeling the motions of non-planar objects. We conclude with examples from our prototype implementation.
ISBN: 9781467369640 (print)
We explore the task of recognizing people's identities in photo albums in an unconstrained setting. To facilitate this, we introduce the new People In Photo Albums (PIPA) dataset, consisting of over 60,000 instances of about 2,000 individuals collected from public Flickr photo albums. With only about half of the person images containing a frontal face, the recognition task is very challenging due to the large variations in pose, clothing, camera viewpoint, image resolution and illumination. We propose the Pose Invariant PErson Recognition (PIPER) method, which accumulates the cues of poselet-level person recognizers trained by deep convolutional networks to discount pose variations, combined with a face recognizer and a global recognizer. Experiments on three different settings confirm that in our unconstrained setup PIPER significantly improves on the performance of DeepFace, which is one of the best face recognizers as measured on the LFW dataset.
ISBN: 9781467369640 (print)
This paper proposes the clique-graph and presents a clique-graph matching method that preserves global and local structures. Specifically, we formulate the objective function of clique-graph matching with respect to two latent variables: the clique information in the original graph and the pairwise clique correspondence constrained by one-to-one matching. Since the objective function is not jointly convex in both latent variables, we decompose it into two consecutive steps for optimization: 1) a clique-to-clique similarity measure that preserves local unary and pairwise correspondences; 2) a graph-to-graph similarity measure that preserves global clique-to-clique correspondence. Extensive experiments on synthetic data and real images show that the proposed method can outperform representative methods, especially when both noise and outliers exist.
ISBN: 0780342364 (print)
We develop a simple and very fast method for object tracking based exclusively on color information in digitized video images. Running on a Silicon Graphics R4600 Indy system with an IndyCam, our algorithm is capable of simultaneously tracking objects at full frame size (640 x 480 pixels) and video frame rate (30 fps). Robustness with respect to occlusion is achieved via an explicit hypothesis-tree model of the occlusion process. We demonstrate the efficacy of our technique in the challenging task of tracking people, especially tracking human heads and hands.
ISBN: 9781467369640 (print)
Crowded scene understanding is a fundamental problem in computer vision. In this study, we develop a multi-task deep model to jointly learn and combine appearance and motion features for crowd understanding. We propose crowd motion channels as the input of the deep model; the channel design is inspired by generic properties of crowd systems. To demonstrate our deep model, we construct a new large-scale WWW Crowd dataset with 10,000 videos from 8,257 crowded scenes, and build an attribute set with 94 attributes on WWW. We further measure user study performance on WWW and compare this with the proposed deep models. Extensive experiments show that our deep models display significant performance improvements in cross-scene attribute recognition compared to strong crowd-related feature-based baselines, and that the deeply learned features exhibit superior performance in multi-task learning.
ISBN: 9781467369640 (print)
Successful methods for visual object recognition typically rely on training datasets containing lots of richly annotated images. Detailed image annotation, e.g. by object bounding boxes, however, is both expensive and often subjective. We describe a weakly supervised convolutional neural network (CNN) for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects. We quantify its object classification and object location prediction performance on the Pascal VOC 2012 (20 object classes) and the much larger Microsoft COCO (80 object classes) datasets. We find that the network (i) outputs accurate image-level labels, (ii) predicts approximate locations (but not extents) of objects, and (iii) performs comparably to its fully-supervised counterparts using object bounding box annotation for training.
ISBN: 0780342364 (print)
In the depth from defocus (DFD) method, two defocused images of a scene are obtained by capturing the scene with different sets of camera parameters. An arbitrary selection of the camera settings can result in observed images whose relative blurring is insufficient to yield a good estimate of the depth. In this paper, we study the effect of the degree of relative blurring on the accuracy of the depth estimate by addressing the DFD problem in a maximum likelihood-based framework. We propose a criterion for optimal selection of camera parameters to obtain an improved estimate of the depth. The optimality criterion is based on the Cramér-Rao bound on the variance of the error in the estimate of blur. Simulations as well as experimental results on real images are presented for validation.
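For reference, the generic scalar form of the Cramér-Rao bound for an unbiased estimator is given below. The symbols are assumptions made here for illustration: $\sigma$ stands for the blur parameter, $\hat{\sigma}$ for its estimate, $g$ for the observed image data, and $I(\sigma)$ for the Fisher information. The abstract does not state the likelihood model or the exact form of the bound used in the paper.

```latex
\operatorname{Var}(\hat{\sigma}) \;\ge\; \frac{1}{I(\sigma)},
\qquad
I(\sigma) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial \sigma}\,\log p(g;\sigma)\right)^{2}\right]
```

Under this reading, camera settings that increase $I(\sigma)$ lower the best achievable variance of the blur estimate, which is the sense in which such a bound can serve as a selection criterion.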
ISBN: 9781665469463 (print)
We propose an iterative method for estimating rigid transformations from point sets using adiabatic quantum computation. Compared to existing quantum approaches, our method relies on an adaptive scheme to solve the problem to high precision, and does not suffer from inconsistent rotation matrices. Experimentally, our method performs robustly on several 2D and 3D datasets, even with a high outlier ratio.
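To make the underlying problem concrete, here is a minimal classical sketch of 2D rigid transformation estimation in Python. It is a closed-form least-squares fit that assumes known point correspondences and no outliers; it is not the adiabatic quantum method of the abstract, and the function names `estimate_rigid_2d` and `apply_rigid_2d` are hypothetical.

```python
import math


def estimate_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    Assumes source[i] corresponds to target[i] (an assumption of this sketch).
    """
    n = len(source)
    if n == 0 or n != len(target):
        raise ValueError("point sets must be non-empty and of equal length")
    # Centroids of both point sets.
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n
    # Accumulate dot and cross products of the centered points.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    translation = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, translation


def apply_rigid_2d(points, theta, translation):
    """Rotate points by theta (radians) and then translate them."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + translation[0], s * x + c * y + translation[1])
            for x, y in points]


# Synthetic check: recover a known rotation and translation.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
true_theta, true_t = math.radians(30.0), (1.0, -2.0)
target = apply_rigid_2d(source, true_theta, true_t)
theta, translation = estimate_rigid_2d(source, target)
```

Because the rotation is parameterized by a single angle, the recovered matrix is always a valid rotation in this sketch; a fit like this degrades quickly once outliers are present.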