ISBN: 0818672587 (print)
We represent local spatial structure in a color image using feature matrices that are computed from an image region. Feature matrices contain significantly more information about local image structure than previous representations. Although feature matrices are useful for surface recognition, this representation depends on the spectral properties of the scene illumination. Using a finite-dimensional linear model for surface spectral reflectance with the same number of parameters as the number of color bands, we show that illumination changes correspond to linear transformations of the feature matrices and that surface rotations correspond to circular shifts of the matrices. From these relationships we derive an algorithm for illumination- and geometry-invariant recognition of local surface structure. We demonstrate the algorithm with a series of experiments on images of real objects.
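The abstract above states that surface rotations correspond to circular shifts of the feature matrices. As a minimal sketch of what matching under that relationship could look like, the Python below searches over all circular column shifts for the one that best aligns two matrices. The choice of columns as the shifted axis, the squared-error score, and the names `circular_shift` and `best_shift` are assumptions made for illustration. The illumination-dependent linear transformation is not modeled, and this is not the authors' algorithm.

```python
def circular_shift(matrix, k):
    """Shift every row of `matrix` to the right by k positions, wrapping around."""
    n = len(matrix[0])
    k %= n
    return [row[-k:] + row[:-k] if k else list(row) for row in matrix]


def squared_error(a, b):
    """Sum of squared element-wise differences between two equal-size matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_shift(reference, observed):
    """Return (shift, residual) for the circular shift of `reference`
    that is closest to `observed` in squared error."""
    n = len(reference[0])
    scores = [squared_error(circular_shift(reference, k), observed) for k in range(n)]
    k = min(range(n), key=scores.__getitem__)
    return k, scores[k]


# Hypothetical 2x4 feature matrix and a copy shifted by three columns.
reference = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0]]
observed = circular_shift(reference, 3)
shift, residual = best_shift(reference, observed)
```

Taking the minimum residual over all shifts gives a score that does not change when the observed matrix is circularly shifted, which is the property a rotation-invariant comparison needs.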
ISBN: 9781424469840 (print)
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Unlike other work that learns separate systems for pose estimation and action recognition and then combines them in an ad hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.
ISBN: 0818672587 (print)
This paper presents a prediction-and-verification segmentation scheme using attention images from multiple fixations. A major advantage of this scheme is that it can handle a large number of different deformable objects presented in complex backgrounds. The scheme is also relatively efficient, since the segmentation is guided by past knowledge through a prediction-and-verification scheme. The system has been tested on segmenting hands in sequences of intensity images, where each sequence represents a hand sign. The experimental results showed a 95% correct segmentation rate with a 3% false rejection rate.
ISBN: 9781467369640 (print)
Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128 bytes per face. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result [15] by 30% on both datasets.
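The abstract above describes training on triplets of matching and non-matching face patches so that Euclidean distance reflects face similarity, but it does not spell out the objective. The following Python is a minimal sketch of a hinge-style triplet loss on squared Euclidean distances, a common way to express that idea. The margin value, the toy 2-D embeddings, and the function names are assumptions for illustration, and the online triplet mining step is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that is zero once the negative is farther from the anchor
    than the positive by at least `margin` (illustrative value)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: `positive` is close to `anchor`, `negative` is far away.
anchor = l2_normalize([1.0, 0.0])
positive = l2_normalize([0.9, 0.1])
negative = l2_normalize([0.0, 1.0])

easy = triplet_loss(anchor, positive, negative)   # already well separated
hard = triplet_loss(anchor, negative, positive)   # roles swapped: violates the margin
```

A triplet that already satisfies the margin contributes zero loss, which is why selecting informative triplets matters for training.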
ISBN: 9781467312288 (print)
This paper presents a novel, discriminative, multi-class classifier based on Sequential Pattern Trees. It is efficient to learn compared to other Sequential Pattern methods, and scalable for use with large classifier banks. For these reasons it is well suited to Sign Language recognition. Using deterministic robust features based on hand trajectories, sign-level classifiers are built from sub-units. Results are presented both on a large-lexicon single-signer data set and a multi-signer Kinect (TM) data set. In both cases it is shown to outperform the non-discriminative Markov model approach and to be equivalent to previous, more costly, Sequential Pattern (SP) techniques.
ISBN: 0818672587 (print)
We present a surface radiance model for diffuse lighting that incorporates shadows, interreflections, and surface orientation. We show that, for smooth surfaces, the model is an excellent approximation of the radiosity equation. We present a new data structure and algorithm that uses this model to compute shape-from-shading under diffuse lighting. The algorithm was tested on both synthetic and real images, and performs more accurately than the only previous algorithm for this problem. Various causes of error are discussed, including approximation errors in image modelling, poor local constraints at the image boundary, and ill-conditioning of the problem itself.
ISBN: 9781538664209 (print)
We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our models are available on our project website.
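The abstract above says that features of the first image and warped features of the second image are used to construct a cost volume. As a rough sketch of that step, the Python below builds a cost volume over a 1-D row of feature vectors by correlating each position in the first feature map with nearby positions in the second. The 1-D simplification, the search range `max_disp`, the zero cost for out-of-range positions, and the function name are assumptions made for illustration. The warping and the CNN that consumes the volume are not shown.

```python
def cost_volume_1d(feat1, feat2_warped, max_disp):
    """For each position x and displacement d in [-max_disp, max_disp],
    store the mean dot product of feat1[x] and feat2_warped[x + d].
    Positions that fall outside the feature map get a cost of 0.0."""
    n = len(feat1)
    channels = len(feat1[0])
    volume = []
    for x in range(n):
        costs = []
        for d in range(-max_disp, max_disp + 1):
            j = x + d
            if 0 <= j < n:
                corr = sum(a * b for a, b in zip(feat1[x], feat2_warped[j])) / channels
            else:
                corr = 0.0
            costs.append(corr)
        volume.append(costs)
    return volume


# Toy features: the second map is the first shifted right by one position.
# With a zero initial flow estimate, the warped features equal the raw ones.
feat1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
feat2 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

volume = cost_volume_1d(feat1, feat2, max_disp=1)
# Displacement with the highest correlation at each position.
best = [max(range(len(c)), key=c.__getitem__) - 1 for c in volume]
```

In this toy case the strongest correlation for the first three positions is at displacement +1, matching the shift used to build `feat2`. The last position has all-zero features and is uninformative.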
ISBN: 9781467369640 (print)
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
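The abstract above reports results as BLEU-1 scores. For readers unfamiliar with the metric, the Python below is a simplified sketch of BLEU-1 for one candidate sentence against a single reference: clipped unigram precision multiplied by a brevity penalty. Whitespace tokenization, the single reference, the example sentences, and the function name `bleu1` are assumptions for illustration. Published evaluations typically use multiple references and corpus-level statistics, and report the score multiplied by 100.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 against one reference, as a value in [0, 1]."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    cand_counts = Counter(cand)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(cand)
    # The brevity penalty discourages candidates shorter than the reference.
    if len(cand) > len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1.0 - len(ref) / len(cand))
    return penalty * precision


# Hypothetical captions for illustration only.
score = bleu1("a dog runs on the grass", "a dog is running on the grass")
perfect = bleu1("a dog runs", "a dog runs")
```

Here five of the six candidate words appear in the reference, and the candidate is one word shorter than the reference, so both the precision and the brevity penalty fall below 1.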
ISBN: 9781467312288 (print)
While activity recognition is a current focus of research, the challenging problem of fine-grained activity recognition is largely overlooked. We thus propose a novel database of 65 cooking activities, continuously recorded in a realistic setting. Activities are distinguished by fine-grained body motions that have low inter-class variability and high intra-class variability due to diverse subjects and ingredients. We benchmark two approaches on our dataset, one based on articulated pose tracks and the second using holistic video features. While the holistic approach outperforms the pose-based approach, our evaluation suggests that fine-grained activities are more difficult to detect and the body model can help in those cases. By providing high-resolution videos as well as an intermediate pose representation, we hope to foster research in fine-grained activity recognition.
ISBN: 0780342364 (print)
The purpose of this study is not only to recognize facial expressions associated with human emotion but also to estimate their degree. Our method is based on the idea that facial expression recognition can be achieved by extracting the variation from an expressionless face, treating the face area as a whole pattern. To extract subtle changes in the face, such as the degree of an expression, it is necessary to eliminate the individuality appearing in the facial image. Using an elastic net model, a variation of facial expression is represented as motion vectors of the deformed net from a facial edge image. Then, applying the K-L expansion, the change of facial expression represented as the motion vectors of nodes is mapped into a low-dimensional eigenspace, and estimation is achieved by projecting input images onto the Emotion Space. In this paper we have constructed three kinds of expression models (happiness, anger, and surprise), and the experimental results are evaluated.