ISBN: 9781467369640 (print)
We propose a perceptual grouping framework that organizes image edges into meaningful structures and demonstrate its usefulness on various computer vision tasks. Our grouper formulates edge grouping as a graph partition problem, where a learning-to-rank method is developed to encode probabilities of candidate edge pairs. In particular, RankSVM is employed for the first time to combine multiple Gestalt principles as cues for edge grouping. Afterwards, an edge-grouping-based object proposal measure is introduced that yields proposals comparable to state-of-the-art alternatives. We further show how human-like sketches can be generated from edge groupings and consequently used to deliver state-of-the-art sketch-based image retrieval performance. Last but not least, we tackle the problem of freehand human sketch segmentation by utilizing the proposed grouper to cluster strokes into semantic object parts.
ISBN: 9781467369640 (print)
The grasp type provides crucial information about human action. However, recognizing the grasp type from unconstrained scenes is challenging because of the large variations in appearance, occlusions and geometric distortions. In this paper, we first present a convolutional neural network to classify functional hand grasp types. Experiments on a public static scene hand data set validate good performance of the presented method. Then we present two applications utilizing grasp type classification: (a) inference of human action intention and (b) fine-level manipulation action segmentation. Experiments on both tasks demonstrate the usefulness of grasp type as a cognitive feature for computer vision. This study shows that the grasp type is a powerful symbolic representation for action understanding, and thus opens new avenues for future research.
ISBN: 9781467369640 (print)
Mathematical optimization plays a fundamental role in solving many problems in computer vision (e.g., camera calibration, image alignment, structure from motion). It is generally accepted that second order descent methods are the most robust, fast, and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second order descent methods have two main drawbacks: 1) the function might not be analytically differentiable and numerical approximations are impractical, and 2) the Hessian may be large and not positive definite. Recently, the Supervised Descent Method (SDM), a method that learns the "weighted averaged gradients" in a supervised manner, has been proposed to solve these issues. However, SDM is a local algorithm and it is likely to average conflicting gradient directions. This paper proposes Global SDM (GSDM), an extension of SDM that divides the search space into regions of similar gradient directions. GSDM provides a better and more efficient strategy to minimize non-linear least squares functions in computer vision problems. We illustrate the effectiveness of GSDM in two problems: non-rigid image alignment and extrinsic camera calibration.
ISBN: 9781467369640 (print)
Action recognition and pose estimation from video are closely related tasks for understanding human motion; most methods, however, learn separate models and combine them sequentially. In this paper, we propose a framework to integrate training and testing of the two tasks. A spatial-temporal And-Or graph model is introduced to represent action at three scales. Specifically, the action is decomposed into poses, which are further divided into mid-level ST-parts and then parts. The hierarchical structure of our model captures the geometric and appearance variations of pose at each frame, and lateral connections between ST-parts at adjacent frames capture the action-specific motion information. The model parameters for the three scales are learned discriminatively, and action labels and poses are efficiently inferred by dynamic programming. Experiments demonstrate that our approach achieves state-of-the-art accuracy in action recognition while also improving pose estimation.
ISBN: 9781665445092 (print)
We present an unsupervised learning approach for optical flow estimation by improving the upsampling and learning of pyramid networks. We design a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels. Moreover, we propose a pyramid distillation loss to add supervision for intermediate levels via distilling the finest flow as pseudo labels. By integrating these two components, our method achieves the best performance for unsupervised optical flow learning on multiple leading benchmarks, including MPI-Sintel, KITTI 2012 and KITTI 2015. In particular, we achieve EPE=1.4 on KITTI 2012 and F1=9.38% on KITTI 2015, outperforming the previous state-of-the-art methods by 22.2% and 15.7%, respectively.
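For readers unfamiliar with the EPE figure quoted above, the following is a minimal sketch of average endpoint error: the mean Euclidean distance between predicted and ground-truth flow vectors. The function name `average_epe`, the list-of-tuples flow representation, and the toy values are assumptions made for illustration; they are not taken from the paper, and benchmark evaluation code typically also handles invalid-pixel masks.

```python
import math
from typing import List, Tuple

# A flow field flattened to a list of (u, v) displacement vectors (assumed layout).
Flow = List[Tuple[float, float]]


def average_epe(pred: Flow, gt: Flow) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


# Hypothetical toy data: per-pixel errors are 0, 2 and 5 pixels.
pred = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
gt = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
epe = average_epe(pred, gt)
```

Lower values are better; an EPE of 1.4 means the predicted vectors are off by 1.4 pixels on average.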
ISBN: 0818672587 (print)
We propose a new method for view synthesis from real images using stereo vision. The method does not explicitly model scene geometry, and enables fast and exact generation of synthetic views. We also reevaluate the requirements on stereo algorithms for the application of view synthesis and discuss ways of dealing with partially occluded regions of unknown depth and with completely occluded regions of unknown texture. Our experiments demonstrate that it is possible to efficiently synthesize realistic new views even from inaccurate and incomplete depth information.
ISBN: 0780342364 (print)
Previous work [5], [2] has developed an approach for estimating shape and albedo from multiple images assuming Lambertian reflectance with single light sources. The main contributions of this paper are: (i) to show how the approach can be generalized to include ambient background illumination, (ii) to demonstrate the use of the integrability constraint for solving this problem, and (iii) to present an iterative algorithm that improves the analysis by finding shadows and rejecting them.
ISBN: 0818672587 (print)
A local parallel method is described for computing the stochastic completion field introduced in an earlier report. The local parallel method can be interpreted as a stable finite difference scheme for solving the underlying Fokker-Planck equation identified by Mumford. The new method is more plausible as a neural model since (1) unlike the previous method, it can be computed in a sparse, locally connected network; and (2) the network dynamics are consistent with psychophysical measurements of the time course of illusory contour formation.
ISBN: 9781467369640 (print)
In this paper, we aim to generate generic action proposals in unconstrained videos. Each action proposal corresponds to a temporal series of spatial bounding boxes, i.e., a spatio-temporal video tube, which has good potential to locate one human action. Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes. After picking those spatio-temporal paths with high actionness scores, our action proposal generation is formulated as a maximum set coverage problem, where greedy search is performed to select a set of action proposals that can maximize the overall actionness score. Compared with existing action proposal approaches, our action proposals do not rely on video segmentation and can be generated in nearly real-time. Experimental results on two challenging datasets, MSRII and UCF 101, validate the superior performance of our action proposals as well as competitive results on action detection and search.
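The greedy search for a maximum set coverage problem mentioned above can be illustrated with a generic sketch. This is not the authors' implementation: the element identifiers, the per-element weights standing in for actionness scores, the budget `k`, and the names `greedy_max_coverage`, `tubes` and `scores` are all hypothetical. Each round simply picks the candidate whose not-yet-covered elements add the most weight.

```python
from typing import Dict, Hashable, List, Optional, Set


def greedy_max_coverage(
    candidates: Dict[str, Set[Hashable]],
    weights: Dict[Hashable, float],
    k: int,
) -> List[str]:
    """Pick up to k candidates, each round taking the largest marginal gain."""
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    for _ in range(k):
        best_name: Optional[str] = None
        best_gain = 0.0
        for name, elems in candidates.items():
            if name in chosen:
                continue
            # Marginal gain: weight of the elements this candidate newly covers.
            gain = sum(weights[e] for e in elems - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining candidate adds new weight
            break
        chosen.append(best_name)
        covered |= candidates[best_name]
    return chosen


# Hypothetical toy data: three candidate tubes covering weighted elements.
tubes = {
    "tube_a": {1, 2, 3},
    "tube_b": {3, 4},
    "tube_c": {1, 2},
}
scores = {1: 0.9, 2: 0.8, 3: 0.5, 4: 0.7}
selected = greedy_max_coverage(tubes, scores, k=2)
```

In this toy run, `tube_a` is taken first because it covers the most weight, and `tube_b` second because `tube_c` adds nothing new once `tube_a` is selected. Penalizing overlap this way is what keeps a greedy selection from returning near-duplicate proposals.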
ISBN: 9781467369640 (print)
Although object detection, 3D pose estimation, and sub-category recognition are highly correlated tasks, they are usually addressed independently of each other because of the huge space of parameters. To jointly model all of these tasks, we propose a coarse-to-fine hierarchical representation, where each level of the hierarchy represents objects at a different level of granularity. The hierarchical representation prevents performance loss, which is often caused by the increase in the number of parameters (as we consider more tasks to model), and the joint modeling enables resolving ambiguities that exist in independent modeling of these tasks. We augment the PASCAL3D+ [34] dataset with annotations for these tasks and show that our hierarchical model is effective in joint modeling of object detection, 3D pose estimation, and sub-category recognition.