ISBN: 9780769549903 (print)
We propose hinge-loss Markov random fields (HL-MRFs), a powerful class of continuous-valued graphical models, for high-level computer vision tasks. HL-MRFs are characterized by log-concave density functions, and are able to perform efficient, exact inference. Their templated hinge-loss potential functions naturally encode soft-valued logical rules. Using the declarative modeling language probabilistic soft logic, one can easily define HL-MRFs via familiar constructs from first-order logic. We apply HL-MRFs to the task of activity detection, using principles of collective classification. Our model is simple, intuitive and interpretable. We evaluate our model on two datasets and show that it achieves significant lift over the low-level detectors.
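As a rough illustration of how a hinge-loss potential can encode a soft logical rule, the sketch below assumes the Lukasiewicz relaxation commonly used in probabilistic soft logic. The function names, the weight, and the example truth values are hypothetical and are not taken from the paper.

```python
def rule_distance(body, head):
    """Distance to satisfaction of the soft rule (b1 AND b2 AND ...) -> head.

    All truth values lie in [0, 1]. Under the Lukasiewicz relaxation the
    conjunction of the body is max(0, sum(body) - (len(body) - 1)), and the
    rule is violated by however much the body's truth exceeds the head's.
    """
    body_truth = max(0.0, sum(body) - (len(body) - 1))
    return max(0.0, body_truth - head)


def hinge_potential(body, head, weight=1.0, squared=False):
    """Weighted hinge-loss potential for one ground rule (illustrative)."""
    d = rule_distance(body, head)
    return weight * (d ** 2 if squared else d)
```

A fully satisfied rule contributes zero, for example `rule_distance([1.0, 1.0], 1.0)`, while a true body with a false head contributes the full weight.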
ISBN: 9798350365474 (print)
Multi-camera tracking (MCT) plays a crucial role in various computer vision applications. However, accurate tracking of individuals across multiple cameras faces challenges, particularly with identity switches. In this paper, we present an efficient online MCT system that tackles these challenges through online processing. Our system leverages memory-efficient accumulated appearance features to provide stable representations of individuals across cameras and time. By incorporating trajectory validation using hierarchical agglomerative clustering (HAC) in overlapping regions, ID transfers are identified and rectified. Evaluation on the 2024 AI City Challenge Track 1 dataset [39] demonstrates the competitive performance of our system, achieving accurate tracking in both overlapping and non-overlapping camera networks. With a 40.3% HOTA score [29], our system ranked 9th in the challenge. The integration of trajectory validation enhances performance by 8% over the baseline, and the accumulated appearance features further contribute to a 17% improvement.
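The abstract does not say how appearance features are accumulated. One memory-efficient possibility, shown here purely as an assumption, is an incremental mean that stores a single vector and a count per identity instead of every past feature. The class name `AccumulatedFeature` is hypothetical.

```python
class AccumulatedFeature:
    """Running mean of appearance vectors for one tracked identity.

    Assumed scheme for illustration only; the paper may accumulate
    features differently.
    """

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, feature):
        # Incremental mean: new_mean = mean + (x - mean) / n
        self.count += 1
        self.mean = [m + (f - m) / self.count
                     for m, f in zip(self.mean, feature)]
        return self.mean
```

Memory use stays constant per identity no matter how many frames have been observed.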
ISBN: 9781538607336 (print)
Performance profiling in sports allows the evaluation of opponents' tactics and the development of counter-tactics to gain a competitive advantage. The work presented develops a comprehensive methodology to automate tactical profiling in elite badminton. The proposed approach uses computer vision techniques to automate data gathering from video footage. The image processing algorithm is validated using video footage of the highest-level tournaments, including the Olympic Games. The average accuracy of player position detection is 96.03% and 97.09% on the two halves of a badminton court. Next, frequent trajectories of badminton players are extracted and classified according to their tactical relevance. The classification performs at 97.79% accuracy, 97.81% precision, 97.44% recall, and 97.62% F-score. The combination of automated player position detection, frequent trajectory extraction, and the subsequent classification can be used to automatically generate player tactical profiles.
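The reported F-score can be checked against the reported precision and recall, assuming it is the usual F1 measure (the harmonic mean of the two). The helper name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f_score(97.81, 97.44), 2))  # 97.62
```

The result agrees with the 97.62% F-score stated in the abstract.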
ISBN: 9781665448994 (print)
Dynamic vision sensor event cameras produce a variable data rate stream of brightness change events. Event production at the pixel level is controlled by threshold, bandwidth, and refractory period bias current parameter settings. Biases must be adjusted to match application requirements and the optimal settings depend on many factors. As a first step towards automatic control of biases, this paper proposes fixed-step feedback controllers that use measurements of event rate and noise. The controllers regulate the event rate within an acceptable range using threshold and refractory period control, and regulate noise using bandwidth control. Experiments demonstrate model validity and feedback control.
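A fixed-step feedback controller of the kind described can be sketched as follows. This is not the paper's controller: the function name, the step size, the limits, and the normalized threshold scale are all hypothetical, and the sketch assumes that raising the threshold lowers the event rate.

```python
def fixed_step_update(threshold, event_rate, rate_low, rate_high,
                      step=0.05, t_min=0.1, t_max=1.0):
    """Move the threshold by one fixed step toward the acceptable rate band.

    Assumption: a higher threshold produces fewer events. Inside the band
    [rate_low, rate_high] the threshold is left unchanged.
    """
    if event_rate > rate_high:
        threshold += step   # too many events: make pixels less sensitive
    elif event_rate < rate_low:
        threshold -= step   # too few events: make pixels more sensitive
    # Keep the setting inside its allowed range.
    return min(t_max, max(t_min, threshold))
```

Calling the function once per measurement interval nudges the bias until the measured rate falls inside the band, at which point it stops changing.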
ISBN: 9781424439942 (print)
Detecting suspicious events from video surveillance cameras has recently become an important task. Many trajectory-based descriptors have been developed, for example to detect people running or moving in the opposite direction. However, these trajectory-based descriptors do not work well in crowded environments such as airports and rail stations, because they assume perfect motion/object segmentation. In this paper, we present an event detection method using a dynamic texture descriptor. The dynamic texture descriptor is an extension of local binary patterns. The image sequences are divided into regions. A flow is formed based on the similarity of the dynamic texture descriptors on the regions. We used a real dataset for experiments. The results are promising.
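For readers unfamiliar with local binary patterns, the sketch below computes the basic 8-neighbor LBP code of a single 3x3 patch. It covers only the static, single-frame operator, not the dynamic texture extension used in the paper, and the neighbor ordering (clockwise from the top-left) is one convention among several.

```python
def lbp_code(patch):
    """Basic local binary pattern code for the center of a 3x3 patch.

    Each neighbor that is >= the center pixel sets one bit. Neighbors are
    visited clockwise starting at the top-left (an arbitrary convention).
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code
```

A histogram of these codes over a region is the usual texture descriptor built from the operator.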
ISBN: 9781728125060 (print)
Computer vision techniques that operate on hyper- and multispectral imagery benefit from the additional amount of spectral information relative to those that exploit traditional RGB or monochromatic visual data. However, the increased volume of data to be processed brings about additional memory, storage and computational requirements. In order to address such limitations, a wide range of techniques for dimensionality reduction have been introduced by previous work. In this paper, we propose a framework for spectral band selection that is highly data- and computationally efficient. The method leverages a convolutional siamese network learned by optimizing a contrastive loss, and performs band selection based on the low-dimensional data embeddings produced by the network. We empirically demonstrate the efficacy of the method on an object detection task from aerial multispectral imagery. The results show that, in spite of the method's frugality, it produces very competitive band selection results against the evaluated competing techniques.
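One common form of the contrastive loss used to train siamese networks is sketched below for a single pair of embeddings. The abstract does not give the exact formulation, so the margin value, the absence of scaling constants, and the function name are assumptions.

```python
import math


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one embedding pair (one common formulation).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than the margin.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance (Python 3.8+)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Minimizing this quantity pulls similar inputs together in the embedding space and pushes dissimilar inputs at least `margin` apart.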
ISBN: 9781479943098 (print)
Automotive systems provide a unique opportunity for mobile vision technologies to improve road safety by understanding and monitoring the driver. In this work, we propose a real-time framework for early detection of driver maneuvers. The implications of this study would allow for better behavior prediction, and therefore the development of more efficient advanced driver assistance and warning systems. Cues are extracted from an array of sensors observing the driver (head, hand, and foot), the environment (lane and surrounding vehicles), and the ego-vehicle state (speed, steering angle, etc.). Evaluation is performed on a real-world dataset with overtaking maneuvers, showing promising results. In order to gain better insight into the processes that characterize driver behavior, temporally discriminative cues are studied and visualized.
ISBN: 9780769549903 (print)
Manifold learning has been effectively used in computer vision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited for computer vision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss L1-norm minimization for optimal classification. We further introduce a new descriptor that we term Motion Depth Surface (MDS) and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
ISBN: 9781467367592 (print)
We address the task of articulated pose estimation from video sequences. We consider an interactive setting where the initial pose is annotated in the first frame. Our system synthesizes a large number of hypothetical scenes with different poses and camera positions by applying geometric deformations to the first frame. We use these synthetic images to generate a custom labeled training set for the video in question. This training data is then used to learn a regressor (for future frames) that predicts joint locations from image data. Notably, our training set is so accurate that nearest-neighbor (NN) matching on low-resolution pixel features works well. As such, we name our underlying representation "tiny synthetic videos". We present quantitative results on the Friends benchmark dataset that suggest our simple approach matches or exceeds the state of the art.
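Nearest-neighbor matching on low-resolution pixel features can be sketched in a few lines. The function name `nearest_pose`, the flat-list feature layout, and the squared Euclidean distance are assumptions for illustration; the paper's actual features and distance may differ.

```python
def nearest_pose(query, train_features, train_poses):
    """Return the pose label of the training image closest to the query.

    `query` and each entry of `train_features` are flat lists of pixel
    values from a downsampled frame; closeness is squared Euclidean
    distance (an assumed choice).
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(train_features)),
               key=lambda i: sqdist(query, train_features[i]))
    return train_poses[best]
```

With a synthetic training set generated from the first frame, each stored feature vector would carry the joint locations used to render it, so the lookup directly yields a pose estimate.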
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
Shape completion is the problem of completing partial input shapes such as partial scans. This problem finds important applications in computer vision and robotics due to issues such as occlusion or sparsity in real-world data. However, most of the existing research related to shape completion has been focused on completing shapes by learning a one-to-one mapping which limits the diversity and creativity of the produced results. We propose a novel multimodal shape completion technique that is effectively able to learn a one-to-many mapping and generates diverse complete shapes. Our approach is based on the conditional Implicit Maximum Likelihood Estimation (IMLE) technique wherein we condition our inputs on partial 3D point clouds. We extensively evaluate our approach by comparing it to various baselines both quantitatively and qualitatively. We show that our method is superior to alternatives in terms of completeness and diversity of shapes.