ISBN: 9781467369640 (print)
Video event recognition still faces great challenges due to large intra-class variation and low image resolution, in particular for surveillance videos. To mitigate these challenges and improve event recognition performance, context information from the feature level, the semantic level, and the prior level is utilized. Unlike most existing context approaches, which utilize context at only one of the three levels through shallow models such as support vector machines or probabilistic models such as Bayesian networks (BN) and Markov random fields (MRF), we propose a deep hierarchical context model that simultaneously learns and integrates context at all three levels and holistically utilizes the integrated contexts for event recognition. We first introduce two types of context features describing the event neighborhood, and then use the proposed deep model to learn the middle-level representations and combine the bottom feature-level, middle semantic-level, and top prior-level contexts for event recognition. Experiments on state-of-the-art surveillance video event benchmarks, including the VIRAT 1.0 Ground Dataset, the VIRAT 2.0 Ground Dataset, and the UT-Interaction Dataset, demonstrate that the proposed model is effective in utilizing context information for event recognition. It outperforms existing context approaches that also utilize multi-level contexts on these benchmarks.
ISBN: 0780342364 (print)
The purpose of this study is not only to recognize facial expressions associated with human emotion but also to estimate their degree. Our method is based on the idea that facial expression recognition can be achieved by extracting the variation from an expressionless face while treating the face area as a whole pattern. To extract subtle changes in the face, such as the degree of an expression, it is necessary to eliminate the individuality appearing in the facial image. Using an elastic net model, a variation of facial expression is represented as motion vectors of the deformed net from a facial edge image. Then, by applying the K-L expansion, the change of facial expression represented as the motion vectors of the nodes is mapped into a low-dimensional eigenspace, and estimation is achieved by projecting input images onto the Emotion Space. In this paper we have constructed three kinds of expression models (happiness, anger, and surprise), and experimental results are evaluated.
ISBN: 9781467369640 (print)
Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets, PASCAL-50S and ABSTRACT-50S, that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as part of the MS COCO evaluation server to enable systematic evaluation and benchmarking.
ISBN: 0818672587 (print)
To reduce false alarms and improve the target detection performance of an automatic target detection and recognition system operating in a cluttered environment, it is important to develop models not only of man-made targets but also of natural background clutter. Because of the high complexity of natural clutter, such a clutter model can only be reliably built by learning from real examples. If available, contextual information that characterizes each training example can be used to further improve the learned clutter model. In this paper, we present such a clutter-model-aided target detection system. Emphasis is placed on two topics: (1) learning the background clutter model from sensory data through a self-organizing process, and (2) reinforcing the learned clutter model using contextual information.
ISBN: 9781467369640 (print)
This paper presents a novel locally linear KNN model with the goal of not only developing efficient representation and classification methods, but also establishing a relation between them so as to approximate some classification rules, e.g. the Bayes decision rule. To that end, the proposed model first represents the test sample as a linear combination of all the training samples and derives a new representation by learning the coefficients under reconstruction, locality, and sparsity constraints. The theoretical analysis shows that the new representation has the grouping effect of the nearest neighbors, which is able to approximate the "ideal representation". The locally linear KNN model based classifier (LLKNNC), which is connected to the Bayes decision rule for minimum error from the viewpoint of kernel density estimation, is then proposed for classification. In addition, the locally linear nearest mean classifier (LLNMC), whose relation to the LLKNNC mirrors that of the nearest mean classifier to the KNN classifier, is derived. Furthermore, to provide reliable kernel density estimation, the shifted power transformation and the coefficient cut-off method are applied to improve the performance of the proposed method. The effectiveness of the proposed model is evaluated on several visual recognition tasks such as face recognition, scene recognition, object recognition, and action recognition. The experimental results show that the proposed model is effective and outperforms some other representative popular methods.
ISBN: 0780342364 (print)
We describe an approach to the classification of 3-D objects using a multi-scale representation. This approach starts with a smoothing algorithm for representing objects at different scales. Smoothing is applied in curvature space directly, thus avoiding the usual shrinkage problems and allowing for efficient implementations. A 3-D similarity measure that integrates the representations of the objects at multiple scales is introduced. Given a library of models, objects that are similar based on this multi-scale measure are grouped together into classes. The objects that are in the same class are combined into a single prototype object. Finally, the prototypes are used for hierarchical recognition by first comparing the scene representation to the prototypes and then matching it only to the objects in the most likely class rather than to the entire library of models. Beyond its application to object recognition, this approach provides an attractive implementation of the intuitive notions of scale and approximate similarity for 3-D shapes.
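The two-stage lookup described in this abstract (compare the scene to class prototypes first, then match only within the most likely class) can be sketched as follows. This is a minimal illustration, not the authors' method: the plain feature vectors, the Euclidean `distance`, the averaged `mean_vector` prototype, and the toy `library` are all assumptions standing in for the multi-scale curvature representation and similarity measure, which the abstract does not specify.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean, used here as a stand-in for combining a class into a prototype."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def build_prototypes(classes):
    """Map {class_name: {model_name: descriptor}} to {class_name: prototype descriptor}."""
    return {name: mean_vector(list(models.values())) for name, models in classes.items()}

def recognize(scene, classes, prototypes):
    """Stage 1: pick the closest prototype. Stage 2: match only inside that class."""
    best_class = min(prototypes, key=lambda c: distance(scene, prototypes[c]))
    members = classes[best_class]
    best_model = min(members, key=lambda m: distance(scene, members[m]))
    return best_class, best_model

# Hypothetical toy library: two classes with two models each.
library = {
    "round": {"sphere": [1.0, 1.0, 1.0], "ellipsoid": [1.0, 0.8, 0.6]},
    "boxy": {"cube": [0.0, 0.0, 0.1], "slab": [0.1, 0.0, 0.3]},
}
prototypes = build_prototypes(library)
result = recognize([0.9, 0.8, 0.7], library, prototypes)
print(result)
```

With this toy data the scene is compared against two prototypes and then two class members, rather than all four models; the saving grows with the size of the library.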
ISBN: 9781479951178 (print)
This paper introduces a novel image representation capturing feature dependencies through the mining of meaningful combinations of visual features. This representation leads to a compact and discriminative encoding of images that can be used for image classification, object detection, or object recognition. The method relies on (i) multiple random projections of the input space followed by local binarization of projected histograms encoded as sets of items, and (ii) the representation of images as Histograms of Pattern Sets (HoPS). The approach is validated on four publicly available datasets (Daimler Pedestrian, Oxford Flowers, KTH Texture, and PASCAL VOC2007), allowing comparisons with many recent approaches. The proposed image representation reaches state-of-the-art performance on each of these datasets.
ISBN: 0818672587 (print)
This paper addresses the problem of recognizing objects in large image databases. The method is based on local characteristics which are invariant to similarity transformations in the image. These characteristics are computed at automatically detected keypoints using the greyvalue signal. The method therefore works on images such as paintings, for which geometry-based recognition fails. Due to the locality of the method, images can be recognized given only part of an image and in the presence of occlusions. Applying a voting algorithm and semi-local constraints makes the method robust to noise, scene clutter, and small perspective deformations. Experiments show efficient recognition for different types of images. The approach has been validated on an image database containing 1020 images, some of them very similar in structure, texture, or shape.
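The voting step mentioned in this abstract can be illustrated with a small sketch. It assumes, hypothetically, that each keypoint descriptor is a short numeric tuple, that matching is nearest-neighbour search under Euclidean distance with a threshold `max_dist`, and that each matched query keypoint casts one vote for its database image. The semi-local constraints are not modelled, and none of the names below come from the paper.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_image(descriptor, database, max_dist):
    """Return the id of the image owning the closest descriptor, or None if none is within max_dist."""
    best_id, best_dist = None, max_dist
    for image_id, descriptors in database.items():
        for candidate in descriptors:
            dist = euclidean(descriptor, candidate)
            if dist < best_dist:
                best_id, best_dist = image_id, dist
    return best_id

def recognize_by_voting(query_descriptors, database, max_dist=0.5):
    """Each query keypoint votes for at most one image; the image with the most votes wins."""
    votes = Counter()
    for descriptor in query_descriptors:
        image_id = nearest_image(descriptor, database, max_dist)
        if image_id is not None:
            votes[image_id] += 1
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, votes

# Hypothetical toy database: image id -> descriptors at its keypoints.
database = {
    "painting_a": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "painting_b": [(5.0, 5.0), (6.0, 5.0)],
}
# A partial view of painting_a plus one clutter keypoint that matches nothing.
query = [(0.1, 0.0), (1.0, 0.9), (9.0, 9.0)]
winner, votes = recognize_by_voting(query, database)
print(winner, dict(votes))
```

Because the decision depends only on the keypoints that do find a match, a partial or occluded view can still accumulate enough votes for the correct image, while unmatched clutter keypoints simply cast no vote.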
ISBN: 0818672587 (print)
During a fixed-axis camera rotation, every image point moves on a conic section. If the point is a vanishing point, the conic section is invariant to possible translations of the observer. Given the rotation axis and the inter-frame correspondence of a set of parallel lines, we are able to compute the intrinsic parameters without knowledge of the rotation angles. We propagate the error covariances and remove the bias in the computation of the conic. We experimentally study the sensitivity of the calibration to the amount of rotation and compare our performance to that of a recent active calibration technique.
ISBN: 9781467369640 (print)
The mainstream approach to structured prediction problems in computer vision is to learn an energy function such that the solution minimizes that function. At prediction time, this approach must solve an often-challenging optimization problem. Search-based methods provide an alternative that has the potential to achieve higher performance. These methods learn to control a search procedure that constructs and evaluates candidate solutions. The recently developed HC-Search method has been shown to achieve state-of-the-art results in natural language processing, but mixed success when applied to vision problems. This paper studies whether HC-Search can achieve similarly competitive performance on basic vision tasks such as object detection, scene labeling, and monocular depth estimation, where the leading paradigm is energy minimization. To this end, we introduce a search operator suited to the vision domain that improves a candidate solution by probabilistically sampling likely object configurations in the scene from the hierarchical Berkeley segmentation. We complement this search operator by applying the DAGGER algorithm to robustly train the search heuristic so it learns from its previous mistakes. Our evaluation shows that these improvements reduce the branching factor and search depth, and thus give a significant performance boost. Our state-of-the-art results on scene labeling and depth estimation suggest that HC-Search provides a suitable tool for learning and inference in vision.