ISBN: 9781479951178 (print)
We address the problem of false responses that arises when learning and applying discriminative parts to construct the mid-level representation in scene classification. Such false responses are often caused by the complexity of latent image structure when part filters are convolved with input images. This problem makes the mid-level representation, even after pooling, not distinct enough to classify input data into the correct categories. Our solution is to learn important spatial pooling regions along with their appearance. The experiments show that this new framework suppresses false responses and produces improved results on several datasets, including MIT-Indoor, 15-Scene, and UIUC 8-Sport. When combined with global image features, our method achieves state-of-the-art performance on these datasets.
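A minimal Python sketch of why restricting pooling to a spatial region can suppress a false response. It assumes a precomputed part-filter response map and a rectangular pooling region; `max_pool_region` and the rectangle format are hypothetical illustrations, not the learned pooling regions or the learning procedure of the paper.

```python
from typing import List, Optional, Tuple

# (top, left, bottom, right); bottom and right are exclusive bounds.
Region = Tuple[int, int, int, int]

def max_pool_region(response: List[List[float]], region: Optional[Region] = None) -> float:
    """Max-pool a part-filter response map, optionally only inside `region`.

    With region=None the whole map is pooled (conventional global max pooling).
    """
    h, w = len(response), len(response[0])
    top, left, bottom, right = region if region is not None else (0, 0, h, w)
    return max(response[i][j] for i in range(top, bottom) for j in range(left, right))

# Toy 4x4 response map: the true part fires at (1, 1), a false response at (3, 3).
response = [[0.0] * 4 for _ in range(4)]
response[1][1] = 0.9   # genuine part response
response[3][3] = 1.5   # false response caused by unrelated image structure

print(max_pool_region(response))                # global pooling keeps the false 1.5
print(max_pool_region(response, (0, 0, 2, 2)))  # region pooling keeps the true 0.9
```

The hypothetical region (0, 0, 2, 2) stands in for a pooling region that would be learned together with the part's appearance; with global pooling the spurious peak dominates the pooled feature.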
ISBN: 9780769549897 (print)
Most existing pose-robust methods are too computationally complex for practical applications, and their performance in unconstrained environments is rarely evaluated. In this paper, we propose a novel method for pose-robust face recognition aimed at practical applications; it is fast, robust to pose, and works well in unconstrained environments. First, a 3D deformable model is built and a fast 3D model fitting algorithm is proposed to estimate the pose of the face image. Second, a group of Gabor filters is transformed according to the pose and shape of the face image for feature extraction. Finally, PCA is applied to the pose-adaptive Gabor features to remove redundancies, and the cosine metric is used to evaluate similarity. The proposed method has three advantages: (1) pose correction is applied in the filter space rather than the image space, which makes our method less sensitive to the precision of the 3D model; (2) by combining the holistic pose transformation with local Gabor filtering, the final feature is robust to pose and to other negative factors in face recognition; (3) the 3D structure and facial symmetry are successfully used to deal with self-occlusion. Extensive experiments on FERET and PIE show that the proposed method significantly outperforms state-of-the-art methods, and it also works well on LFW.
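The final matching stage (PCA for redundancy removal, then cosine similarity) can be sketched in dependency-free Python as below. This is an illustration under stated assumptions: features are taken to be already-extracted plain vectors standing in for the pose-adaptive Gabor features, PCA is computed by power iteration with deflation only to avoid external libraries, and the names `fit_pca`, `project`, and `cosine_similarity` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def fit_pca(samples: List[Vector], k: int, iters: int = 200, seed: int = 0) -> Tuple[Vector, List[Vector]]:
    """Return the sample mean and up to k leading principal directions (unit vectors)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = 0.0
        for _ in range(iters):
            w = [_dot(row, v) for row in cov]   # power-iteration step: w = C v
            for u in components:                # deflation: stay orthogonal to found directions
                p = _dot(w, u)
                w = [wi - p * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:
                break                           # no variance left in the remaining directions
            v = [wi / norm for wi in w]
        if norm < 1e-12:
            break
        components.append(v)
    return mean, components

def project(x: Vector, mean: Vector, components: List[Vector]) -> Vector:
    """Reduce a feature vector to its coordinates in the PCA subspace."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [_dot(centered, u) for u in components]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two (non-zero) reduced feature vectors."""
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))
```

In a face-matching setting, a gallery and a probe feature would both be passed through `project` with the same fitted mean and components before `cosine_similarity` is evaluated; a real system would normally use a linear algebra library for the PCA step.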
ISBN: 9780769549903 (print)
In this paper, we present and begin analyzing the iCubWorld data-set, an object recognition data-set that we acquired using a Human-Robot Interaction (HRI) scheme and the iCub humanoid robot platform. Our setup allows for rapid acquisition and annotation of data with corresponding ground truth. While the data-set is more constrained in scope (the iCub world is essentially a robotics research lab), we demonstrate how it poses challenges to current recognition systems. The iCubWorld data-set is publicly available (1).
ISBN: 9783642046667 (print)
Traditionally, computer vision and pattern recognition algorithms are evaluated by measuring differences between final interpretations and ground truth. These black-box evaluations ignore intermediate results, making it difficult to use them for error diagnosis and optimization. We propose "opening the box," representing vision algorithms as sequences of decision points where recognition results are selected from a set of alternatives. For this purpose, we present a domain-specific language for pattern recognition tasks, the Recognition Strategy Language (RSL). At run-time, an RSL interpreter records a complete history of the decisions made during recognition as it applies them to a set of interpretations maintained for the algorithm. Decision histories provide a rich new source of information: recognition errors may be traced back to the specific decisions that caused them, and intermediate interpretations may be recovered and displayed. This additional information also permits new evaluation metrics that include false negatives (correct hypotheses that the algorithm generates and later rejects), such as the percentage of ground-truth hypotheses generated (historical recall) and the percentage of generated hypotheses that are correct (historical precision). We illustrate the approach through an analysis of cell detection in two published table recognition algorithms.
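The two historical metrics defined above can be computed from a decision history as in the following Python sketch. It assumes each hypothesis (for example, a detected table cell) is represented by a hashable identifier and that a generated hypothesis is "correct" when it exactly matches a ground-truth identifier; the function name `historical_metrics` and the exact-match rule are hypothetical simplifications, not the RSL implementation.

```python
from typing import Hashable, Iterable, Set

def historical_metrics(generated: Iterable[Hashable],
                       final: Iterable[Hashable],
                       ground_truth: Iterable[Hashable]) -> dict:
    gen: Set[Hashable] = set(generated)   # every hypothesis ever generated, per the decision history
    fin: Set[Hashable] = set(final)       # hypotheses kept in the final interpretation
    gt: Set[Hashable] = set(ground_truth)
    correct_generated = gen & gt
    return {
        # Share of ground-truth hypotheses that were generated at some point.
        "historical_recall": len(correct_generated) / len(gt) if gt else 0.0,
        # Share of generated hypotheses that are correct.
        "historical_precision": len(correct_generated) / len(gen) if gen else 0.0,
        # False negatives in the sense above: correct hypotheses generated, then rejected.
        "rejected_correct": correct_generated - fin,
    }
```

For instance, if four ground-truth cells exist and the algorithm generated three of them plus two spurious cells, historical recall is 3/4 and historical precision is 3/5, even if the final output kept only one correct cell.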
ISBN: 9781509014378 (print)
In this paper, we present two large multi-modal video datasets for RGB and RGB-D gesture recognition: the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD). Both datasets are derived from the ChaLearn Gesture Dataset (CGD), which contains more than 50,000 gestures in total and was built for the "one-shot-learning" competition. To increase the potential of the old dataset, we designed new, well-curated datasets comprising 249 gesture labels and 47,933 gestures whose begin and end frames in the sequences were manually labeled. Using these datasets, we will open two competitions on the CodaLab platform so that researchers can test and compare their methods for "user-independent" gesture recognition. The first challenge is designed for gesture spotting and recognition in continuous sequences of gestures, while the second is designed for gesture classification from segmented data. A baseline method based on the bag-of-visual-words model is also presented.
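The core of a bag-of-visual-words representation, hard assignment of local descriptors to a codebook followed by a normalized histogram, can be sketched in Python as below. The codebook is assumed to have been learned beforehand (for example, by k-means), and the descriptors are generic vectors; the specific features and classifier of the baseline are not described here, and `bovw_histogram` is a hypothetical name.

```python
from typing import List, Sequence

def bovw_histogram(descriptors: List[Sequence[float]],
                   codebook: List[Sequence[float]]) -> List[float]:
    """Quantize each local descriptor to its nearest codeword; return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Hard assignment to the nearest visual word by squared Euclidean distance.
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((x - c) ** 2 for x, c in zip(d, codebook[k])))
        counts[nearest] += 1
    total = sum(counts)
    # Normalizing makes videos with different numbers of descriptors comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

Each gesture video then becomes one fixed-length histogram that any standard classifier can consume.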
ISBN: 9781479951178 (print)
Ubiquitous image blur raises a practically important question: what are effective features for differentiating between blurred and unblurred image regions? We address it by studying a few blur feature representations in the image gradient, the Fourier domain, and data-driven local filters. Unlike previous methods, which are often based on restoration mechanisms, our features are constructed to enhance discriminative power and are adaptive to various blur scales in images. To facilitate evaluation, we build a new blur perception dataset containing thousands of images with labeled ground truth. Our results are applied to several applications, including blur region segmentation, deblurring, and blur magnification.
ISBN: 9781479951178 (print)
In this paper, we propose a robust method for visual tracking based on mean shift, sparse coding, and spatial pyramids. First, we extend the original mean shift approach to handle orientation space and scale space, and name this new method mean transform. The mean transform method simultaneously and effectively estimates the motion of the object window of interest, including its location, orientation, and scale. Second, we introduce a pixel-wise dense patch sampling technique and a region-wise trivial template design scheme, which enable our approach to run very accurately and efficiently. In addition, instead of using only a holistic or only a local representation, we apply spatial pyramids to combine the two representations, which lets our approach handle partial occlusion robustly. Experimental results show that our approach outperforms state-of-the-art methods on many benchmark sequences.
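For reference, the original mean shift procedure that mean transform extends can be sketched as follows: a location estimate repeatedly moves to the kernel-weighted mean of nearby samples until it stops moving. This Python sketch covers location only (no orientation or scale), uses a Gaussian kernel, and treats the per-sample weights (in a tracker, typically target-likelihood scores of pixels) and the bandwidth as hypothetical inputs; it is not the mean transform algorithm itself.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def mean_shift(points: List[Point], weights: List[float], start: Point,
               bandwidth: float = 1.0, max_iter: int = 500, tol: float = 1e-9) -> Point:
    """Move `start` uphill on a weighted Gaussian kernel density until it stops moving."""
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            # Gaussian kernel value at the current estimate, scaled by the sample weight.
            k = w * math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * bandwidth ** 2))
            num_x += k * px
            num_y += k * py
            den += k
        if den == 0.0:
            break  # no support near the current estimate; stay put
        nx, ny = num_x / den, num_y / den   # kernel-weighted mean = next estimate
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y
```

Starting near a cluster of heavily weighted samples, the estimate converges to the local density mode, which is how the tracker relocates the object window from frame to frame.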
ISBN: 9781479951178 (print)
In this paper, we propose an efficient method for surface-from-gradients (SfG) reconstruction. Our method is formulated within the framework of discrete geometry processing. Unlike existing SfG approaches, we transform the continuous reconstruction problem into a discrete space and solve it efficiently via a sequence of least-squares optimization steps. Our discrete formulation brings three advantages: 1) the reconstruction preserves sharp features, 2) sparse or incomplete sets of gradients can be handled well, and 3) the computational domain can have irregular boundaries. Our formulation is direct and easy to implement, and comparisons with state-of-the-art methods show its effectiveness.
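To make the least-squares view of SfG concrete, the sketch below integrates forward-difference gradients on a regular grid by minimizing the squared mismatch between height differences and measured gradients, solving the normal equations with Gauss-Seidel sweeps. This is a standard Poisson-style baseline chosen for illustration, not the discrete-geometry formulation described above; it does not preserve sharp features or handle irregular domains. The name `integrate_gradients` and the pinning of z[0][0] to 0 are assumptions.

```python
from typing import List

Grid = List[List[float]]

def integrate_gradients(p: Grid, q: Grid, iters: int = 5000) -> Grid:
    """Least-squares surface from forward-difference gradients via Gauss-Seidel.

    p[i][j] ~ z[i][j+1] - z[i][j]   (h rows, w-1 columns)
    q[i][j] ~ z[i+1][j] - z[i][j]   (h-1 rows, w columns)
    z[0][0] is pinned to 0 to remove the free additive constant.
    """
    h, w = len(p), len(p[0]) + 1
    z = [[0.0] * w for _ in range(h)]
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                if i == 0 and j == 0:
                    continue
                acc, n = 0.0, 0
                # Each incident edge proposes the value of z[i][j] implied by its neighbour;
                # the least-squares normal equation sets z[i][j] to their average.
                if j > 0:
                    acc += z[i][j - 1] + p[i][j - 1]
                    n += 1
                if j < w - 1:
                    acc += z[i][j + 1] - p[i][j]
                    n += 1
                if i > 0:
                    acc += z[i - 1][j] + q[i - 1][j]
                    n += 1
                if i < h - 1:
                    acc += z[i + 1][j] - q[i][j]
                    n += 1
                z[i][j] = acc / n
    return z
```

With integrable gradients, such as those of the plane z = 2x + 3y, the iteration recovers the surface exactly up to the pinned constant; with noisy gradients it returns the least-squares compromise.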
ISBN: 9780769549897 (print)
In this work, we return to the underlying mathematical definition of a manifold and directly characterise learning a manifold as finding an atlas, or a set of overlapping charts, that accurately describe local structure. We formulate the problem of learning the manifold as an optimisation that simultaneously refines the continuous parameters defining the charts and the discrete assignment of points to charts. In contrast to existing methods, this direct formulation of a manifold does not require "unwrapping" the manifold into a lower-dimensional space and allows us to learn closed manifolds of interest to vision, such as those corresponding to gait cycles or camera pose. We report state-of-the-art results for manifold-based nearest-neighbour classification on vision datasets, and show how the same techniques can be applied to the 3D reconstruction of human motion from a single image.
ISBN: 9780769549897 (print)
Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree to which individuals act as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on the collective manifold, this paper proposes a descriptor of collectiveness and an efficient way to compute it for a crowd and its constituent individuals. A Collective Merging algorithm is then proposed to detect collective motions among random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on a system of self-driven particles. We then compare the collectiveness descriptor with human perception of collective motion and show high consistency. Our experiments on detecting collective motions and measuring collectiveness in videos of pedestrian crowds and bacterial colonies demonstrate a wide range of applications of the collectiveness descriptor (1).
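The idea of "integrating path similarities" can be illustrated by summing, over path lengths l = 1, 2, ..., the weighted similarity of all length-l paths, with a decay factor z per step. The Python sketch below assumes a given non-negative similarity matrix W between neighbouring individuals (for example, velocity correlations) and truncates the series at a hypothetical `max_len`; the exact construction of W, normalisation, and choice of z in the paper are not given in the abstract, so this is only a sketch of the principle, and both function names are hypothetical.

```python
from typing import List

def individual_collectiveness(W: List[List[float]], z: float, max_len: int = 200) -> List[float]:
    """phi_i = sum_j sum_{l=1..max_len} z**l * (W**l)[i][j], a truncated path-similarity sum.

    Assumes 0 < z < 1 / spectral_radius(W) so that the series converges.
    """
    n = len(W)
    v = [1.0] * n        # W^0 e, where e is the all-ones vector
    phi = [0.0] * n
    zl = 1.0
    for _ in range(max_len):
        v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]  # W^l e (row sums of W^l)
        zl *= z                                                        # z^l
        phi = [f + zl * vi for f, vi in zip(phi, v)]
    return phi

def crowd_collectiveness(W: List[List[float]], z: float, max_len: int = 200) -> float:
    """Average of the individual collectiveness values over the crowd."""
    phi = individual_collectiveness(W, z, max_len)
    return sum(phi) / len(phi)
```

Individuals linked by many strong, long paths score higher, so a tightly coupled group scores above a loosely chained one with the same members, and unconnected individuals score zero.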