ISBN: 9781538604571 (print)
Graph matching is a fundamental problem in computer vision and pattern recognition. In general, it can be formulated as an Integer Quadratic Programming (IQP) problem. Since this problem is NP-hard, approximate relaxations are required. This paper proposes a new graph matching method with three main contributions: (1) We propose a new graph matching relaxation model, called Binary Constraint Preserving Graph Matching (BPGM), which aims to incorporate more of the discrete binary mapping constraints into the graph matching relaxation. BPGM is motivated by a new observation that the discrete binary constraints in the IQP matching problem can be represented (or encoded) exactly by an ℓ2-norm constraint. (2) We derive an effective projection algorithm to solve the BPGM model. (3) Using BPGM, we propose a path-following strategy to optimize the IQP matching problem and thus obtain a desired discrete solution at convergence. Promising experimental results show the effectiveness of the proposed method.
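The abstract does not spell out the ℓ2-norm constraint, so the following is only a sketch of one standard fact that is consistent with it, not the BPGM model itself. For an n×n doubly stochastic matrix X (entries in [0, 1], rows and columns summing to 1), the squared ℓ2 norm satisfies sum(X²) ≤ sum(X) = n, with equality exactly when every entry is 0 or 1, that is, when X is a permutation matrix. The helper names below are hypothetical.

```python
import numpy as np

def is_doubly_stochastic(X, tol=1e-9):
    """Check non-negativity and unit row and column sums."""
    return bool((X >= -tol).all()
                and np.allclose(X.sum(axis=0), 1.0, atol=tol)
                and np.allclose(X.sum(axis=1), 1.0, atol=tol))

def squared_l2_norm(X):
    """Sum of squared entries (squared Frobenius norm)."""
    return float(np.sum(X * X))

n = 3
P = np.eye(n)[[2, 0, 1]]       # a permutation matrix: a discrete matching
U = np.full((n, n), 1.0 / n)   # the uniform doubly stochastic matrix
M = 0.5 * P + 0.5 * U          # a relaxed point between the two

for name, X in [("P", P), ("U", U), ("M", M)]:
    print(name, is_doubly_stochastic(X), squared_l2_norm(X))
```

Only the permutation matrix reaches the upper bound n; every relaxed point falls strictly below it, which is why a norm constraint can stand in for the binary constraint on this feasible set.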
ISBN: 9781538604571 (print)
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a non-parametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.
ISBN: 9781728171685 (print)
The effects of radial lens distortion often appear in the wide-angle cameras of surveillance and safeguard systems and may severely degrade the performance of existing face recognition algorithms. Traditional methods for radial lens distortion correction usually rely on line features, which are not suitable for face images. In this paper, we propose a distortion-invariant face recognition system called RDCFace, which uses only the distorted face images to directly alleviate the effects of radial lens distortion. RDCFace is an end-to-end trainable cascade network that learns rectification and alignment parameters to achieve better face recognition performance without requiring supervision of facial landmarks or distortion parameters. We design sequential spatial transformer layers to optimize the correction, alignment, and recognition modules jointly. The feasibility of our method comes from implicitly using the statistics of the layout of face features learned from large-scale face data. Extensive experiments indicate that our method is robust to distortion and achieves significant improvements over state-of-the-art methods on several benchmarks, including LFW, YTF, CFP, and RadialFace, a dataset of real distorted faces.
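The abstract does not say which distortion model RDCFace uses. As a rough illustration of what a rectification parameter can look like, the sketch below assumes the one-parameter division model, in which an undistorted point is the distorted point divided by 1 + k·r², with r the distance to the distortion center. The function name, the parameter k, and the choice of model are assumptions for illustration only.

```python
import numpy as np

def undistort_points(points, k, center=(0.0, 0.0)):
    """Apply the one-parameter division model to an (N, 2) array of points.

    points: distorted coordinates (assumed normalized so that r is about 1
            at the image border)
    k:      distortion coefficient (assumed; k = 0 means no distortion)
    """
    c = np.asarray(center, dtype=float)
    p = np.asarray(points, dtype=float) - c
    r2 = np.sum(p * p, axis=1, keepdims=True)   # squared radius per point
    return p / (1.0 + k * r2) + c

pts = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
print(undistort_points(pts, k=-0.1))
```

Under this model a negative k pushes points away from the center, more strongly the farther out they lie, while the center itself stays fixed.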
ISBN: 9781467388511 (print)
Recognizing fine-grained sub-categories such as birds and dogs is extremely challenging due to the highly localized and subtle differences in some specific parts. Most previous works rely on object/part-level annotations to build part-based representations, which is demanding in practical applications. This paper proposes an automatic fine-grained recognition approach that is free of any object/part annotation at both the training and testing stages. Our method explores a unified framework based on two steps of deep filter response picking. The first picking step finds distinctive filters that respond to specific patterns significantly and consistently, and learns a set of part detectors by iteratively alternating between new positive sample mining and part model retraining. The second picking step pools deep filter responses via a spatially weighted combination of Fisher Vectors. We conditionally pick deep filter responses to encode into the final representation, which takes into account the importance of the filter responses themselves. Integrating all these techniques produces a much more powerful framework, and experiments conducted on CUB-200-2011 and Stanford Dogs demonstrate the superiority of our proposed algorithm over existing methods.
ISBN: 9780769549897 (print)
Active learning has recently attracted a lot of attention in the computer vision field, as preparing a good set of labeled images for vision data analysis is costly and time-consuming. Most existing active learning approaches employed in computer vision adopt most-uncertainty measures as instance selection criteria. Although most-uncertainty query selection strategies are very effective in many circumstances, they fail to take information in the large amount of unlabeled instances into account and are prone to querying outliers. In this paper we present a novel adaptive active learning approach that combines an information density measure and a most-uncertainty measure to select critical instances to label for image classification. Our experiments on two essential tasks of computer vision, object recognition and scene recognition, demonstrate the efficacy of the proposed approach.
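To make the idea of combining the two measures concrete, here is a minimal sketch. The abstract does not give the actual formulas, so everything below is assumed: uncertainty is taken to be the entropy of the predicted class probabilities, density is the mean cosine similarity to the other unlabeled instances, and the two are combined as uncertainty^beta · density^(1 − beta) with a fixed, hand-picked beta rather than an adaptively chosen one.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per instance; higher means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def density(features):
    """Mean cosine similarity of each instance to all other instances."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(features)
    return (sim.sum(axis=1) - 1.0) / (n - 1)   # drop the self-similarity of 1

def select_query(probs, features, beta=0.5):
    """Index of the instance with the best combined score (assumed form)."""
    u = entropy(probs)
    d = np.clip(density(features), 0.0, None)  # treat dissimilar points as density 0
    return int(np.argmax((u ** beta) * (d ** (1.0 - beta))))

# Three clustered instances and one outlier (index 3) that is the most uncertain.
features = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [-1.0, 0.0]])
probs = np.array([[0.6, 0.4], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]])

print(select_query(probs, features, beta=1.0))  # uncertainty only
print(select_query(probs, features, beta=0.5))  # uncertainty and density
```

With beta = 1 the rule reduces to plain uncertainty sampling and picks the outlier; with beta = 0.5 the outlier's low density removes it from contention and a representative, fairly uncertain instance is chosen instead.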
ISBN: 0818672587 (print)
It is often necessary to handle randomness and geometry in computer vision, for instance to match and fuse together noisy geometric features such as points, lines, or 3D frames, or to estimate a geometric transformation from a set of matched features. However, the proper handling of these geometric features is far more difficult than for points, and a number of paradoxes can arise. In this article we analyse three basic problems: (1) what is a uniform random distribution of features, (2) how to define a distance between features, and (3) what is the 'mean feature' of a number of feature measurements, and we propose generic methods to solve them.
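A small example of the kind of paradox involved in problem (3), using 2D orientations as the feature. This is a textbook illustration rather than the article's generic method: averaging angle values directly gives a direction opposite to the measurements when they straddle the 0°/360° wrap-around, whereas averaging the corresponding unit vectors does not.

```python
import math

def naive_mean_deg(angles):
    """Arithmetic mean of the angle values, ignoring wrap-around."""
    return sum(angles) / len(angles)

def circular_mean_deg(angles):
    """Direction of the summed unit vectors, in [0, 360]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0

measurements = [350.0, 10.0]            # two noisy readings of a direction near 0 degrees
print(naive_mean_deg(measurements))     # points the opposite way
print(circular_mean_deg(measurements))  # close to 0 (mod 360)
```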
ISBN: 9781728132938 (print)
In this paper we propose an Enhanced Bayesian Compression (EBC) method to flexibly compress deep networks via reinforcement learning. Unlike existing Bayesian compression methods, which cannot explicitly enforce quantized weights during training, our method learns flexible codebooks in each layer for optimal network quantization. To dynamically adjust the state of the codebooks, we employ an Actor-Critic network that collaborates with the original deep network. Unlike most existing network quantization methods, EBC does not require re-training after quantization. Experimental results show that our method obtains low-bit precision with an acceptable accuracy drop on MNIST, CIFAR, and ImageNet.
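For readers unfamiliar with codebook quantization, the sketch below shows only the basic operation the abstract builds on: replacing each weight with its nearest codebook entry, so that a weight can be stored as a short index. The codebook here is fixed and hand-picked; learning it per layer and adjusting it with an Actor-Critic network, as the abstract describes, is not reproduced. All names are hypothetical.

```python
import math
import numpy as np

def quantize(weights, codebook):
    """Map every weight to its nearest codebook value.

    Returns the quantized weights and the index of the chosen entry.
    """
    idx = np.abs(weights[..., None] - codebook).argmin(axis=-1)
    return codebook[idx], idx

def bits_per_weight(codebook):
    """Bits needed to store one codebook index."""
    return max(1, math.ceil(math.log2(len(codebook))))

weights = np.array([0.12, -0.9, 0.48])
codebook = np.array([-1.0, 0.0, 0.5])   # assumed 3-entry codebook for one layer

quantized, indices = quantize(weights, codebook)
print(quantized, indices, bits_per_weight(codebook))
```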
ISBN: 9781665469463 (print)
We propose an iterative method for estimating rigid transformations from point sets using adiabatic quantum computation. Compared to existing quantum approaches, our method relies on an adaptive scheme to solve the problem to high precision, and does not suffer from inconsistent rotation matrices. Experimentally, our method performs robustly on several 2D and 3D datasets, even with high outlier ratios.
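As a point of reference for the problem being solved, here is the classical, non-quantum least-squares solution for matched point sets (the Kabsch/SVD method). It is not the adiabatic quantum method of the abstract, and it assumes known correspondences and no outliers. The determinant correction is what guarantees a proper rotation matrix (orthogonal with determinant +1) rather than a reflection.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.

    src, dst: (N, d) arrays of matched points, d = 2 or 3.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # d x d cross-covariance
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))      # -1 if the best fit is a reflection
    D = np.diag([1.0] * (src.shape[1] - 1) + [sign])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic 2D check: rotate and shift random points, then recover the motion.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 2))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])
t_true = np.array([1.0, -2.0])
dst = src @ R_true.T + t_true

R_est, t_est = kabsch(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```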
ISBN: 0780342364 (print)
We address the problem of locating a gray-level pattern in a gray-level image. The pattern may have been transformed by an affine transformation and may have undergone some additional changes. We define a difference function based on comparing each pixel of the pattern with a window in the image, and search efficiently for transformations that minimise the difference function. The search is guaranteed: it will always find the transformation minimising the difference function, and will not be fooled by a local minimum; it is also efficient, in that it does not need to examine every transformation in order to achieve this guarantee. This technique can be applied to object location, motion tracking, optical flow, or block-based motion compensation in video image sequence compression (e.g., MPEG).
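The sketch below illustrates the general setup in its simplest form, under assumptions that differ from the paper: the difference function is taken to be the sum of absolute differences (SAD) between the pattern and an image window, the transformations are restricted to integer translations, and the search is plain exhaustive enumeration. Exhaustive search also never gets stuck in a local minimum, but it examines every translation, which is exactly the cost the abstract says the proposed search avoids.

```python
import numpy as np

def sad(pattern, window):
    """Sum of absolute gray-level differences between pattern and window."""
    return int(np.abs(pattern.astype(np.int64) - window.astype(np.int64)).sum())

def best_translation(image, pattern):
    """Exhaustively find the (score, row, col) with the smallest SAD."""
    H, W = image.shape
    h, w = pattern.shape
    best = None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            score = sad(pattern, image[y:y + h, x:x + w])
            if best is None or score < best[0]:
                best = (score, y, x)
    return best

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(12, 12), dtype=np.uint8)
pattern = image[3:7, 5:9].copy()        # cut the pattern out of the image itself

score, row, col = best_translation(image, pattern)
print(score, row, col)
```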
ISBN: 9781467388511 (print)
The deep two-stream architecture [23] exhibited excellent performance on video-based action recognition. The most computationally expensive step in this approach is the calculation of optical flow, which prevents it from running in real time. This paper accelerates the architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra calculation. However, motion vectors lack fine structures and contain noisy and inaccurate motion patterns, leading to an evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vectors are inherently correlated. Transferring the knowledge learned by the optical flow CNN to the motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this: initialization transfer, supervision transfer, and their combination. Experimental results show that our method achieves recognition performance comparable to the state of the art while processing 390.7 frames per second, which is 27 times faster than the original two-stream method.