ISBN: 9781424439928 (print)
Face verification has many potential applications, including filtering and ranking image/video search results on celebrities. Since these images/videos are taken in uncontrolled environments, the problem is very challenging due to dramatic lighting and pose variations, low resolutions, compression artifacts, etc. In addition, the number of training images available for each celebrity may be limited, so learning an individual classifier for each person may cause overfitting. In this paper, we propose two ideas to meet these challenges. First, we propose to use individual bins, instead of whole histograms, of Local Binary Patterns (LBP) as features for learning, which yields significant performance improvements and reduced computation in our experiments. Second, we present a novel Multi-Task Learning (MTL) framework, called Boosted MTL, for face verification with limited training data. It jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The effectiveness of Boosted MTL and LBP bin features is verified on a large number of celebrity images/videos from the web.
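The idea of treating each LBP histogram bin as its own feature can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the basic 256-code 3x3 LBP operator, a hypothetical 2x2 grid of cells, and a single-bin decision stump as the weak learner a booster would select from; the paper's exact LBP variant, grid and boosting setup are not given here.

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP (an assumption): one 8-bit code per interior pixel."""
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    h, w = img.shape
    # Neighbour offsets, clockwise from top-left; each sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_bin_features(img, grid=(2, 2)):
    """Concatenate normalised per-cell 256-bin histograms; every bin is one scalar feature."""
    codes = lbp_codes(img)
    feats = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(rows, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(float)
            feats.append(hist / max(cell.size, 1))
    return np.concatenate(feats)

def stump_predict(features, bin_index, threshold, polarity=1):
    """Weak learner that looks at a single LBP bin rather than a whole histogram."""
    return np.where(polarity * features[..., bin_index] > polarity * threshold, 1, -1)
```

Because each weak learner reads only one bin, a boosted classifier built from such stumps needs to compute only the bins it actually selected, which is consistent with the computation reduction the abstract reports.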
ISBN: 9781424439928 (print)
A state-of-the-art approach to measuring the similarity of two images is to model each image by a continuous distribution, generally a Gaussian mixture model (GMM), and to compute a probabilistic similarity between the GMMs. One limitation of traditional measures such as the Kullback-Leibler (KL) divergence and the Probability Product Kernel (PPK) is that they measure a global match of distributions. This paper introduces a novel image representation. We propose to approximate an image, modeled by a GMM, as a convex combination of K reference image GMMs, and then to describe the image as the K-dimensional vector of mixture weights. The computed weights encode a similarity that favors local matches (i.e., matches of individual Gaussians) and is therefore fundamentally different from the KL divergence or the PPK. Although computing the mixture weights is a convex optimization problem, its direct optimization is difficult. We propose two approximate optimization algorithms: the first is based on traditional sampling methods, the second on a variational bound approximation of the true objective function. We apply this novel representation to the image categorization problem and compare its performance to traditional kernel-based methods. On the PASCAL VOC 2007 dataset, we demonstrate a consistent increase in classification accuracy.
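One way to read the sampling-based approximation is sketched below: draw samples from the image GMM, then choose weights on the simplex that maximize the average log-likelihood of those samples under the convex combination of the K fixed reference GMMs, using the standard EM update for mixture weights with fixed components. This is an illustrative sketch, not necessarily the paper's algorithm; the 1D Gaussians and all function names are assumptions made to keep it short.

```python
import numpy as np

def gmm_pdf(x, weights, means, stds):
    """Density of a 1D Gaussian mixture at points x (1D GMMs are a simplifying assumption)."""
    x = np.asarray(x, dtype=float)[:, None]
    w, mu, sd = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    comp = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)
    return comp @ w

def sample_gmm(rng, n, weights, means, stds):
    """Draw n samples from a 1D GMM: pick a component, then sample from it."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[idx], np.asarray(stds)[idx])

def mixture_weights(samples, reference_gmms, iters=200):
    """Weights w on the simplex maximising mean_i log sum_k w_k p_k(x_i),
    with the K reference densities p_k held fixed (EM update for the weights)."""
    lik = np.stack([gmm_pdf(samples, *g) for g in reference_gmms], axis=1)  # [n, K]
    K = lik.shape[1]
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        resp = lik * w
        resp /= resp.sum(axis=1, keepdims=True)  # posterior of each reference per sample
        w = resp.mean(axis=0)                    # stays on the simplex by construction
    return w
```

The objective is concave in w, so this fixed-point iteration does not get trapped in poor local optima; the resulting K-dimensional w is the image descriptor the abstract describes.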
In this work, we describe a cell segmentation method designed for experimental data obtained from 3D+t microscopy volumes. The proposed segmentation technique takes advantage of the pattern of appearances exhibited by the objects (cells) across different focal planes, which results from the objects' translucent properties and their interaction with light. This information allows us to discriminate between cells and artifacts (dust and others) of equivalent size and shape that are present in the biological preparation. Using a simple correlation criterion, the method matches a 3D video template (extracted from a sample of cells) against the motile cells contained in the biological sample, obtaining a high rate of true positives while discarding artifacts. Our analysis focuses on sea urchin spermatozoa, but the method is applicable to many other microscopic structures with the same optical properties.
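The correlation criterion can be illustrated with zero-mean normalized cross-correlation between a template stack and candidate stacks of the same shape (for example focal planes x height x width). This is a sketch under assumptions: the normalization, the stack layout and the 0.7 threshold are hypothetical, not values from the paper. The point it shows is that correlating over the whole stack compares the appearance profile across focal planes, not just the in-plane shape.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equally shaped arrays, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify_candidates(template, candidates, threshold=0.7):
    """Label each candidate stack (same shape as the template) as a cell (True)
    or an artifact (False); the threshold value is a hypothetical choice."""
    return [ncc(template, c) >= threshold for c in candidates]
```

An artifact with the same in-plane size and shape but a different intensity profile through focus yields a low correlation score and is rejected.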
ISBN: 9781424439928 (print)
Computer-aided diagnosis (CAD) problems of detecting potentially diseased structures in medical images typically have the following challenging characteristics: extremely unbalanced data between the negative and positive classes; stringent real-time requirements on execution; and multiple positive candidates, generated for the same malignant structure, that are highly correlated and spatially close to each other. To address all these problems, we propose a novel learning formulation that combines cascade classification and multiple instance learning (MIL) in a unified min-max framework, leading to a joint optimization problem that can be converted to a tractable quadratically constrained quadratic program and efficiently solved by block-coordinate optimization algorithms. We apply the proposed approach to the CAD problems of detecting pulmonary embolism and colon cancer in computed tomography images. Experimental results show that our approach significantly reduces the computational cost while yielding detection accuracy comparable to the current state-of-the-art MIL or cascaded classifiers. Although not specifically designed for balanced MIL problems, the proposed method achieves superior performance on balanced MIL benchmark data such as MUSK and image data sets.
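The inference side of combining a cascade with MIL can be sketched as follows: instances (candidates) in a bag are rejected stage by stage, so later stages see only survivors, and the bag is labeled positive if at least one instance passes every stage. This sketch assumes linear stages of the form (w, b) and does not implement the paper's joint min-max training; it only shows why early rejection saves computation at test time.

```python
import numpy as np

def cascade_bag_predict(bag, stages):
    """Evaluate a linear cascade on a bag of instances (rows of `bag`).

    Each stage is a hypothetical (w, b) pair; an instance is rejected as soon as
    w.x + b < 0. Under the MIL assumption the bag is positive if at least one
    instance passes every stage. Returns (is_positive, stage_evaluations).
    """
    bag = np.asarray(bag, dtype=float)
    alive = np.ones(len(bag), dtype=bool)
    evaluations = 0
    for w, b in stages:
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break  # every instance already rejected: skip the remaining stages
        evaluations += idx.size
        scores = bag[idx] @ w + b
        alive[idx[scores < 0]] = False
    return bool(alive.any()), evaluations
```

Because most candidates are negatives that fail the first cheap stage, the number of stage evaluations grows far more slowly than candidates x stages.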
ISBN: 9781424439928 (print)
The matching and retrieval of 2D shapes is an important challenge in computer vision. A large number of shape similarity approaches have been developed, with the main focus being the comparison or matching of pairs of shapes. In these approaches, other shapes do not influence the similarity measure of a given pair of shapes. In the proposed approach, other shapes do influence the similarity measure of each pair of shapes, and we show that this influence is beneficial even in the unsupervised setting (without any prior knowledge of shape classes). The influence of other shapes is propagated as a diffusion process on a graph formed by a given set of shapes. However, the classical diffusion process does not perform well in shape space for two reasons: it is unstable in the presence of noise, and the underlying local geometry is sparse. We introduce a locally constrained diffusion process that is more stable in the presence of noise, and we densify the shape space by adding synthetic points that we call 'ghost points'. We present experimental results that demonstrate very significant improvements over state-of-the-art shape matching algorithms. On the MPEG-7 data set, we obtained a bull's-eye retrieval score of 93.32%, which is the highest score ever reported in the literature.
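A locally constrained diffusion can be sketched as follows, assuming a Gaussian affinity over a pairwise shape-distance matrix, a row-normalized transition matrix P as the initial similarity, and the update A <- P_K A P_K^T, where P_K keeps only each shape's k nearest neighbors. The bandwidth heuristic, k and the iteration count are hypothetical choices, and the 'ghost point' densification is not shown.

```python
import numpy as np

def locally_constrained_diffusion(dist, k=5, sigma=None, iters=20):
    """Diffuse pairwise shape similarities over a kNN-restricted graph.

    dist: [N, N] symmetric pairwise shape distances.
    Returns an [N, N] diffused affinity matrix (higher = more similar).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if sigma is None:
        sigma = np.median(dist[dist > 0])  # hypothetical bandwidth heuristic
    W = np.exp(-(dist ** 2) / (sigma ** 2))
    # Initial similarity: row-normalised transition probabilities over all shapes.
    A = W / W.sum(axis=1, keepdims=True)
    # Local transition matrix: keep only each shape's k nearest neighbours (incl. itself).
    nn = np.argsort(dist, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    PK = np.zeros_like(W)
    PK[rows, nn] = W[rows, nn]
    PK /= PK.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # Each entry becomes a weighted average of similarities between the two neighbourhoods.
        A = PK @ A @ PK.T
    return A
```

Restricting each step to local neighborhoods is what keeps a single noisy long-range distance from leaking similarity across classes.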
ISBN: 9781424439928 (print)
The goal of this work is to automatically learn a large number of British Sign Language (BSL) signs from TV broadcasts. We achieve this by using the supervisory information available from subtitles broadcast simultaneously with the signing. This supervision is both weak and noisy: it is weak due to the correspondence problem, since the temporal distance between sign and subtitle is unknown and signing does not follow the text order; it is noisy because subtitles can be signed in different ways, and because the occurrence of a subtitle word does not imply the presence of the corresponding sign. The contributions are: (i) we propose a distance function to match signing sequences that includes the trajectory of both hands, the hand shape and orientation, and properly models the case of hands touching; (ii) we show that by optimizing a scoring function based on multiple instance learning, we are able to extract the sign of interest from hours of signing footage, despite the very weak and noisy supervision. The method is automatic given the English target word of the sign to be learnt. Results are presented for 210 words, including nouns, verbs and adjectives.
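The multiple-instance idea can be illustrated with a simplified scoring function. Positive bags are sets of candidate signing windows around subtitles that contain the target word, and negative bags are windows from footage whose subtitles do not. Each candidate from a positive bag is scored by how many positive bags, and how few negative bags, contain a close match. This is a hypothetical stand-in, not the paper's scoring function: plain Euclidean distance on fixed-length window descriptors replaces the paper's hand-trajectory/shape distance, and the threshold tau is an assumption.

```python
import numpy as np

def bag_hit(candidate, bag, tau):
    """True if any window in `bag` (array [m, d]) lies within distance tau of candidate."""
    return bool(np.min(np.linalg.norm(bag - candidate, axis=1)) <= tau)

def best_sign_window(pos_bags, neg_bags, tau):
    """Return the candidate window (taken from positive bags) that matches in as many
    positive bags and as few negative bags as possible, with its score."""
    best, best_score = None, -np.inf
    for bag in pos_bags:
        for cand in bag:
            pos = np.mean([bag_hit(cand, b, tau) for b in pos_bags])
            neg = np.mean([bag_hit(cand, b, tau) for b in neg_bags]) if neg_bags else 0.0
            score = pos - neg
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

Because each positive bag only needs to contain the sign somewhere, the unknown alignment between subtitle and sign never has to be resolved explicitly.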
ISBN: 9781424439928 (print)
Higher-order spatial features, such as doublets or triplets, have been used to incorporate spatial information into the bag-of-local-features model. Due to computational limits, researchers have only used features up to the 3rd order, i.e., triplets, since the number of features increases exponentially with the order. We propose an algorithm for identifying high-order spatial features efficiently. The algorithm directly evaluates the inner product of the feature vectors of the two images to be compared, identifying all high-order features automatically. The algorithm hence serves as a kernel for any kernel-based learning algorithm. It is based on the idea that if a high-order spatial feature co-occurs in both images, the occurrence of the feature in one image is a translation of the occurrence of the same feature in the other image. This enables us to compute the kernel in time that is linear in the number of local features in an image (the same as the bag-of-local-features approach), regardless of the order. Therefore, unlike previous work, our algorithm imposes no upper bound on the order. Experimental results on the object categorization task show that high-order features can be calculated efficiently and provide a significant improvement in object categorization performance.
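The translation argument can be sketched as follows. Every pair of same-word local features from the two images votes for one translation offset. If n matches share an offset, any k of them form an order-k spatial feature present in both images, giving C(n, k) shared order-k features. This is an illustrative sketch assuming integer (quantized) positions; it enumerates same-word pairs, so its cost depends on how often words repeat, and it may differ from the paper's exact linear-time procedure.

```python
from collections import Counter, defaultdict
from math import comb

def high_order_kernel(feats_a, feats_b, order):
    """Count order-`order` spatial features shared by two images.

    feats_*: lists of (word, x, y) with integer (quantised) positions.
    A shared feature is a group of same-word matches that all share one translation.
    """
    by_word = defaultdict(list)
    for word, x, y in feats_b:
        by_word[word].append((x, y))
    offsets = Counter()  # translation (dx, dy) -> number of same-word matches
    for word, x, y in feats_a:
        for xb, yb in by_word.get(word, ()):
            offsets[(xb - x, yb - y)] += 1
    return sum(comb(n, order) for n in offsets.values())
```

For order 1 the sum of offset counts equals the number of same-word pairs, i.e. the usual bag-of-words inner product, so the high-order kernels generalize the bag-of-local-features model rather than replace it.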
ISBN: 9781424439928 (print)
We describe a method for retrieving shots containing a particular 2D human pose from unconstrained movie and TV videos. The method first localizes the spatial layout of the head, torso and limbs in individual frames using pictorial structures, and associates these through a shot by tracking. A feature vector describing the pose is then constructed from the pictorial structure. Shots can be retrieved either by querying on a single frame with the desired pose, or through a pose classifier trained from a set of pose examples. Our main contribution is an effective system for retrieving people based on their pose; in particular, we propose and investigate several pose descriptors that are independent of person, clothing, background and lighting. As a second contribution, we improve the performance over existing methods for localizing the upper-body layout in unconstrained video. We compare the spatial-layout pose retrieval to a baseline method in which poses are retrieved using a HOG descriptor. Performance is assessed on five episodes of the TV series 'Buffy the Vampire Slayer', and pose retrieval is also demonstrated on three Hollywood movies.
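Query-by-frame retrieval can be sketched as ranking shots by their best-matching frame. The sketch assumes each frame already has a fixed-length pose descriptor derived from the pictorial-structure fit; cosine similarity and max-over-frames aggregation are assumptions made for illustration, not the paper's specific descriptors or matching rule.

```python
import numpy as np

def rank_shots(query, shots):
    """Rank shots by their best-matching frame.

    query: [d] pose descriptor of the query frame.
    shots: dict shot_id -> [num_frames, d] per-frame pose descriptors.
    Returns shot ids sorted from most to least similar (cosine similarity).
    """
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    scores = {}
    for shot_id, frames in shots.items():
        f = np.asarray(frames, dtype=float)
        f = f / np.linalg.norm(f, axis=1, keepdims=True)
        scores[shot_id] = float(np.max(f @ q))  # a shot matches if any frame matches
    return sorted(scores, key=scores.get, reverse=True)
```

Taking the maximum over frames lets a shot be retrieved even when the queried pose appears only briefly within it.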
ISBN: 9781424439928 (print)
We present an image-based Simultaneous Localization and Mapping (SLAM) framework with online, appearance-only loop closing. We adopt a layered approach, with metric maps over small areas at the local level and a global, graph-based abstract topological framework to build consistent maps over large distances. Rao-Blackwellised particle filtering and sparse bundle adjustment are efficiently coupled with a stereo-vision-based odometry module to construct conditionally independent 'submaps' using SIFT features. By extracting keyframes from these submaps, a multi-resolution dictionary of distinct features is built online to learn a generative model of appearance and perform loop closure. Creating such a dictionary also enables the system to distinguish between similar regions during loop closure without requiring the offline training described in other approaches. Furthermore, instead of occupancy or grid maps, we build 3D reconstructions of the world, a model we plan to use as input to a scene interpretation module that provides navigational cues to the visually impaired. We demonstrate the robustness of our SLAM system with indoor and outdoor experiments for full 6-degrees-of-freedom motion using only a hand-held stereo camera, running at 1 Hz on a standard PC.
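Online, appearance-only loop detection over keyframes can be sketched with an incrementally growing bag-of-visual-words index. Note that this swaps in a simpler technique than the paper's: tf-idf cosine similarity instead of a learned generative appearance model. The quantization of SIFT descriptors into word ids, the threshold and the number of recent keyframes skipped are all assumptions.

```python
import math
from collections import Counter

class AppearanceLoopDetector:
    """Online bag-of-words loop-closure check over keyframes (tf-idf cosine, a swapped-in technique)."""

    def __init__(self, threshold=0.6, skip_recent=2):
        self.threshold = threshold      # hypothetical similarity threshold
        self.skip_recent = skip_recent  # ignore the newest keyframes (trivially similar)
        self.keyframes = []             # one Counter(word id -> count) per keyframe
        self.doc_freq = Counter()       # number of keyframes containing each word

    def _idf(self, word):
        n = len(self.keyframes)
        return math.log((1 + n) / (1 + self.doc_freq[word])) + 1.0  # smoothed, always > 0

    def _tfidf(self, counts):
        vec = {w: c * self._idf(w) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {w: v / norm for w, v in vec.items()}

    def add_keyframe(self, words):
        """words: visual-word ids of one keyframe. Returns the index of a matched
        earlier keyframe (loop closure) or None, then stores the keyframe."""
        counts = Counter(words)
        query = self._tfidf(counts)
        best, best_sim = None, self.threshold
        candidates = self.keyframes[: max(0, len(self.keyframes) - self.skip_recent)]
        for i, past in enumerate(candidates):
            ref = self._tfidf(past)
            sim = sum(v * ref.get(w, 0.0) for w, v in query.items())
            if sim >= best_sim:
                best, best_sim = i, sim
        # The dictionary grows online: new word ids simply start appearing in doc_freq.
        self.keyframes.append(counts)
        self.doc_freq.update(counts.keys())
        return best
```

Words that occur in many keyframes get low idf weight, which is one simple way to keep repetitive structure from triggering false loop closures without offline training.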
ISBN: 9781424439928 (print)
Detection and tracking of moving vehicles in airborne videos is a challenging problem. Many approaches have been proposed to improve motion segmentation on a frame-by-frame and pixel-by-pixel basis; however, little attention has been paid to analyzing the long-term motion pattern, which is a distinctive property of moving vehicles in airborne videos. In this paper, we provide a straightforward geometric interpretation of a general motion pattern in the 4D space (x, y, v_x, v_y). We propose to use the Tensor Voting computational framework to detect and segment such motion patterns in 4D space. Specifically, in airborne videos, we analyze the essential difference between the motion patterns caused by parallax and those caused by independently moving objects, which leads to a practical method for segmenting the motion patterns (flows) created by moving vehicles in stabilized airborne videos. The flows are then used to facilitate the detection and tracking of each individual object in the flow. Conceptually, this approach is similar to "track-before-detect" techniques, which incorporate temporal information into the process as early as possible. As shown in the experiments, many difficult cases in airborne videos, such as parallax, noisy background modeling and long-term occlusions, can be addressed by our approach.
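A first pass of tensor voting in the 4D (x, y, v_x, v_y) space can be sketched with unoriented "ball" votes: each token receives, from every other token, a Gaussian-weighted tensor I - v v^T / |v|^2, where v is the offset between the two tokens. For tokens lying on a 1D flow, the accumulated tensor has a small eigenvalue along the flow direction, so the curve saliency is the gap between the two smallest eigenvalues and the corresponding eigenvector estimates the tangent. This is a simplified sketch: the decay function, sigma and the use of ball votes only are assumptions, and the later stick-voting and parallax analysis are not shown.

```python
import numpy as np

def ball_vote_tensors(tokens, sigma=1.0):
    """First-pass ball voting among 4D tokens (x, y, v_x, v_y).

    Token j receives from every other token i the tensor
    exp(-|v|^2 / sigma^2) * (I - v v^T / |v|^2), with v = p_j - p_i.
    Returns an [n, 4, 4] array of accumulated tensors.
    """
    p = np.asarray(tokens, dtype=float)
    n, d = p.shape
    T = np.zeros((n, d, d))
    eye = np.eye(d)
    for j in range(n):
        for i in range(n):
            v = p[j] - p[i]
            r2 = v @ v
            if i == j or r2 == 0:
                continue
            T[j] += np.exp(-r2 / sigma ** 2) * (eye - np.outer(v, v) / r2)
    return T

def curve_saliency(T):
    """Curve (1D flow) saliency in 4D is lambda_3 - lambda_4; the eigenvector of
    the smallest eigenvalue estimates the flow tangent."""
    vals, vecs = np.linalg.eigh(T)  # eigenvalues in ascending order
    return vals[1] - vals[0], vecs[:, 0]
```

Tokens from a coherent vehicle flow reinforce each other along a common direction, while isolated noisy tokens accumulate little saliency.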