ISBN: 9781424439942 (print)
Detecting suspicious events from video surveillance cameras has recently become an important task. Many trajectory-based descriptors have been developed, for example to detect people running or moving in the opposite direction. However, these trajectory-based descriptors do not work well in crowded environments such as airports and rail stations, because they assume perfect motion/object segmentation. In this paper, we present an event detection method using a dynamic texture descriptor, which is an extension of local binary patterns (LBP). The image sequences are divided into regions, and a flow is formed based on the similarity of the dynamic texture descriptors of those regions. We used a real dataset for our experiments, and the results are promising.
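A minimal sketch of the plain spatial LBP building block, assuming 8-bit grayscale regions given as nested lists: each region is summarised by a normalised histogram of 8-neighbour LBP codes, and two regions are compared by histogram intersection. The function names and the choice of histogram intersection as the similarity are illustrative assumptions; the paper's dynamic texture extension and its flow construction are not reproduced here.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): bit k is set when neighbour k >= centre."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over the interior pixels of a region."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalised histograms (hypothetical choice)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Identical regions score 1.0 and regions with disjoint code distributions score 0.0, so a threshold on this score is one simple way to decide which neighbouring regions are "similar".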
ISBN: 9781665448994 (print)
In this paper we present a unified formulation for a large class of relative pose problems with radial distortion and varying calibration. For minimal cases, we show that the number of parameters can be reduced to between one and three. The relative pose can then be expressed using varying calibration constraints on the fundamental matrix, whose entries are polynomial in the parameters. We then apply standard techniques based on the action matrix and Sturm sequences to construct our solvers. This yields efficient solvers for a large class of relative pose problems with radial distortion within a common framework. We evaluate a number of these solvers as minimal solvers in RANSAC for robust two-view inlier and epipolar geometry estimation.
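Once elimination leaves a univariate polynomial, a Sturm sequence counts its distinct real roots in an interval, which a solver can combine with bisection to isolate each root. The sketch below shows only the counting step, under assumptions: coefficients are floats listed from the highest degree down, the polynomial is square-free, neither interval endpoint is a root, and the helper names and the `1e-12` tolerance are illustrative. It is not the authors' solver.

```python
EPS = 1e-12  # illustrative tolerance for treating a coefficient or value as zero


def _trim(p):
    """Drop leading (near-)zero coefficients; an empty list is the zero polynomial."""
    i = 0
    while i < len(p) and abs(p[i]) < EPS:
        i += 1
    return list(p[i:])


def poly_eval(p, x):
    """Horner evaluation; coefficients from the highest degree down."""
    result = 0.0
    for c in p:
        result = result * x + c
    return result


def poly_deriv(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def poly_rem(a, b):
    """Remainder of polynomial long division a / b (b must be nonzero)."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading term has been cancelled
    return _trim(a)


def sturm_sequence(p):
    """p0 = p, p1 = p', p_{k+1} = -rem(p_{k-1}, p_k) until the remainder vanishes."""
    seq = [_trim(p), _trim(poly_deriv(p))]
    while seq[-1]:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            break
        seq.append([-c for c in r])
    return [s for s in seq if s]


def sign_changes(seq, x):
    values = [poly_eval(s, x) for s in seq]
    signs = [v for v in values if abs(v) > EPS]  # zero values are skipped
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)


def count_real_roots(p, a, b):
    """Distinct real roots of a square-free p in (a, b); a and b must not be roots."""
    seq = sturm_sequence(p)
    return sign_changes(seq, a) - sign_changes(seq, b)
```

For example, `count_real_roots([1, -6, 11, -6], 0, 4)` counts the three roots of (x-1)(x-2)(x-3) in (0, 4); halving the interval and recounting is the basis of bisection-based root isolation.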
ISBN: 9781665448994 (print)
In this paper we propose ObjectGraphs, a novel bottom-up video event recognition approach that utilizes a rich frame representation and the relations between objects within each frame. After an object detector (OD) is applied to the frames, graphs are used to model the object relations, and a graph convolutional network (GCN) performs reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from the graph's adjacency matrix at frame level are used to identify the objects that were considered most (or least) salient for event recognition and that contributed the most (or least) to the final recognition decision, thus providing an explanation for that decision. The experimental results show that the proposed method achieves state-of-the-art performance on the publicly available FCVID and YLI-MED datasets(1).
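The abstract does not give the exact WiD formula, so the following is a sketch under an assumed convention: `adj[i][j]` is the (nonnegative) weight of the edge from object `i` to object `j`, and an object's weighted in-degree is the sum of the weights of its incoming edges, i.e. a column sum of the adjacency matrix. Ranking the detected objects by this value gives a most-to-least salient ordering; the function names are hypothetical.

```python
def weighted_in_degrees(adj):
    """WiD of node j = sum_i adj[i][j] (assumed: adj[i][j] weights the edge i -> j)."""
    n = len(adj)
    return [sum(adj[i][j] for i in range(n)) for j in range(n)]


def rank_objects(labels, adj):
    """(label, WiD) pairs ordered from most to least salient."""
    wids = weighted_in_degrees(adj)
    order = sorted(range(len(labels)), key=lambda j: wids[j], reverse=True)
    return [(labels[j], wids[j]) for j in order]
```

With made-up inputs, `rank_objects(["person", "bike", "tree"], [[0, 0.8, 0.1], [0.6, 0, 0.1], [0.3, 0.2, 0]])` orders the objects as bike, person, tree.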
ISBN: 9780769549903 (print)
While most approaches to symmetry detection in machine vision try to explain the gray values or colors of the pixels, Gestalt algebra has no room for such measurement data. The entities (i.e., Gestalten) are defined only with respect to each other. They form a generic hierarchy and live in a continuous domain without any pixel raster. There is also no constraint forcing them to fill an image completely or prohibiting overlap. Yet, when the algebra is used as a tool for symmetry recognition, it must somehow be connected to the given data. In this paper this is done only at the primitive level, using the well-known SIFT feature detector. From a set of such SIFT-based Gestalten, a combinatorial set of higher-order symmetric Gestalten is obtained by constructing all possible terms using the operations of the algebra. The Gestalt domain contains a quality or assessment dimension. Taking the best Gestalten with respect to this attribute and clustering them yields the output of our competition entry.
ISBN: 9781665487399 (print)
Most popular metric learning losses have no direct relation to the evaluation metrics that are subsequently used to assess their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work proving that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximate, differentiable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large-scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves performance comparable to more complex, domain-specific, state-of-the-art methods for vehicle re-identification.
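The abstract does not spell out the relaxation, so as one common choice (an assumption, not necessarily the authors' loss) the step function in the pairwise definition of AUC can be replaced by a sigmoid with a temperature. The sketch assumes the scores are similarities between a query and its positive and negative samples; the names and the default temperature are hypothetical, and in training the same expression would be written in an autodiff framework so that gradients reach the embedding network.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def exact_auc(pos, neg):
    """AUC = fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos) * len(neg))


def smooth_auc_loss(pos, neg, temperature=0.1):
    """1 - relaxed AUC, with the step 1[p > n] replaced by sigmoid((p - n) / T)."""
    total = sum(_sigmoid((p - n) / temperature) for p in pos for n in neg)
    return 1.0 - total / (len(pos) * len(neg))
```

Lower temperatures approximate the step function more closely but give vanishing gradients for pairs that are already far apart, so the temperature trades approximation quality against trainability.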
ISBN: 9781509014378 (print)
Perceiving distance from two camera images, a task called stereo vision, is fundamental for many applications in robotics and automation. However, algorithms that compute this information with high accuracy have high computational complexity. One such algorithm, Semi-Global Matching (SGM), performs well on many stereo vision benchmarks while keeping its computational complexity manageable. Nevertheless, CPU and GPU implementations of this algorithm often fail to process camera images in real time, especially in power-constrained embedded environments. This work presents a novel architecture for computing disparities with SGM. The proposed architecture is highly scalable and applicable to low-power embedded as well as high-performance, multi-camera, high-resolution applications.
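For reference, the standard SGM path-cost recurrence along a single direction can be sketched on one scanline in plain Python. `cost[x][d]` is an assumed precomputed matching cost, and the penalties `p1` (disparity change of 1) and `p2` (larger jumps) take illustrative values; a full SGM implementation sums such path costs over several directions before the winner-takes-all step. This sketch says nothing about the proposed hardware architecture.

```python
def aggregate_left_to_right(cost, p1, p2):
    """SGM path cost along the left-to-right direction of one scanline.

    cost[x][d] is the matching cost of pixel x at disparity d.
    """
    num_d = len(cost[0])
    agg = [list(cost[0])]  # the first pixel on the path has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_d):
            candidates = [prev[d], prev_min + p2]  # same disparity, or a large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # disparity change of 1
            if d < num_d - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(agg):
    """Disparity with the lowest aggregated cost for each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in agg]
```

With `cost = [[0, 4, 4], [3, 4, 2], [0, 4, 4]]`, `p1=1`, and `p2=3`, the noisy middle pixel, whose raw minimum is at disparity 2, is smoothed to disparity 0 by the penalty on disparity changes.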
ISBN: 9781665448994 (print)
Dynamic vision sensor event cameras produce a variable-data-rate stream of brightness-change events. Event production at the pixel level is controlled by the threshold, bandwidth, and refractory period bias current settings. These biases must be adjusted to match application requirements, and the optimal settings depend on many factors. As a first step towards automatic control of the biases, this paper proposes fixed-step feedback controllers that use measurements of event rate and noise. The controllers keep the event rate within an acceptable range using threshold and refractory period control, and regulate noise using bandwidth control. Experiments demonstrate the validity of the model and of the feedback control.
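A minimal sketch of one fixed-step update, under loudly stated assumptions: biases are integer codes in a hypothetical range of 0 to 255; raising the threshold or lengthening the refractory period lowers the event rate, and lowering the bandwidth lowers the noise rate; each measurement has a dead band within which its biases are left unchanged. The names, ranges, and step size are hypothetical and do not correspond to any particular camera API.

```python
from dataclasses import dataclass

BIAS_MIN, BIAS_MAX = 0, 255  # hypothetical bias code range


def _clamp(v):
    return max(BIAS_MIN, min(BIAS_MAX, v))


@dataclass(frozen=True)
class Biases:
    threshold: int   # assumed: higher -> fewer events
    refractory: int  # assumed: longer refractory period -> fewer events
    bandwidth: int   # assumed: lower bandwidth -> less noise


def control_step(b, event_rate, noise_rate, rate_range, noise_range, step=1):
    """One fixed-step update; each measurement has a dead band (low, high)."""
    rate_low, rate_high = rate_range
    noise_low, noise_high = noise_range
    t, r, bw = b.threshold, b.refractory, b.bandwidth
    # Event rate is regulated with threshold and refractory period.
    if event_rate > rate_high:
        t, r = t + step, r + step
    elif event_rate < rate_low:
        t, r = t - step, r - step
    # Noise is regulated with bandwidth.
    if noise_rate > noise_high:
        bw -= step
    elif noise_rate < noise_low:
        bw += step
    return Biases(_clamp(t), _clamp(r), _clamp(bw))
```

Calling `control_step` once per measurement window moves each bias by at most one step, which keeps the loop simple and stable at the cost of slow convergence after large scene changes.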
ISBN: 9780769549903 (print)
"Big Data" analysis is an emerging topic in computer vision and pattern recognition. As one example of a big-data problem, we study semantic age labels and facial aging pattern analysis on a large database. In aging analysis, one of the great challenges is the lack of a large number of face images with ground-truth age labels. Unlike many other example-based recognition problems, where human annotations can be used as ground-truth labels for both training and testing, it is quite difficult for human annotators to label the exact ages of the faces in images. An alternative is to exploit images with unlabeled ages to enhance age estimation performance. However, it is unclear whether face images with unlabeled ages can be used for age estimation at all and, if so, how to use such unlabeled data. In this paper, we study these two problems comprehensively under two paradigms for aging pattern analysis: semi-supervised learning and unsupervised learning. We emphasize the importance of using ground-truth age labels and a large database in order to derive a meaningful measure in the context of big data. Our study can have an impact on the collection of aging patterns, which is very expensive and time-consuming in practice.
ISBN: 9781479943098 (print)
Egocentric vision provides a unique perspective of the visual world that is inherently human-centric. Since egocentric cameras are mounted on the user (typically on the user's head), they are naturally primed to gather visual information from our everyday interactions, and can even act on that information in real time (e.g., for a vision aid). We believe that this human-centric characteristic of egocentric vision can have a large impact on the way we approach central computer vision tasks such as visual detection, recognition, prediction, and socio-behavioral analysis. By taking advantage of the first-person point-of-view paradigm, researchers have recently made advances in areas such as personalized video summarization, understanding concepts of social saliency, activity analysis with inside-out cameras (a camera to capture eye gaze and an outward-looking camera), recognizing human interactions, and modeling focus of attention. However, in many ways people are only beginning to understand the full potential (and limitations) of the first-person paradigm. In the 3rd Workshop on Egocentric (First-Person) Vision, we bring together researchers to discuss emerging topics such as: personalization of visual analysis; socio-behavioral modeling; understanding group dynamics and interactions; egocentric video as big data; first-person vision for robotics; and Egographical User Interfaces (EUIs).
ISBN: 9780769549903 (print)
Manifold learning has been used effectively in computer vision applications for dimensionality reduction, improving classification performance and reducing computational load. Grassmann manifolds are well suited to computer vision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations, using a least-squares loss with L1-norm minimization for optimal classification. We further introduce a new descriptor, which we term Motion Depth Surface (MDS), and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
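The abstract gives too little detail to sketch GSR or the MDS descriptor, but the MHI baseline it compares against has a simple classic update: pixels where motion is detected are set to a duration `tau`, and all other pixels decay by one step toward zero. In this sketch the motion mask is assumed to come from thresholded frame differencing, frames are nested lists of intensities, and `tau` and `diff_threshold` are illustrative parameters.

```python
def update_mhi(mhi, prev_frame, frame, tau, diff_threshold):
    """One Motion History Image step.

    A pixel counts as moving (assumption: simple frame differencing) when its
    intensity changes by more than diff_threshold; moving pixels are set to tau,
    the rest decay by 1 toward 0.
    """
    return [
        [tau if abs(f - p) > diff_threshold else max(0, h - 1)
         for h, p, f in zip(h_row, p_row, f_row)]
        for h_row, p_row, f_row in zip(mhi, prev_frame, frame)
    ]
```

Applied over a sequence, recent motion appears bright and older motion fades, so a single MHI encodes where and how recently movement occurred.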