ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
ISBN: 9781665448994 (print)
Neural networks are used in many real-world applications, but they often have problems estimating their own confidence. This is particularly problematic for computer vision applications aimed at making high-stakes decisions involving humans and their lives. In this paper we conduct a meta-analysis of the literature, showing that most if not all computer vision applications do not use proper epistemic uncertainty quantification, which means that these models ignore their own limitations. We describe the consequences of using models without proper uncertainty quantification, and encourage the community to adopt versions of their models that have properly calibrated epistemic uncertainty, in order to enable out-of-distribution detection. We close the paper with a summary of challenges in estimating uncertainty for computer vision applications, along with recommendations.
ISBN: 9781665448994 (print)
Traditional empirical risk minimization (ERM) for semantic segmentation can disproportionately advantage or disadvantage certain target classes in favor of an (unfair but) improved overall performance. Inspired by the recently introduced tilted ERM (TERM), we propose tilted cross-entropy (TCE) loss and adapt it to the semantic segmentation setting to minimize performance disparity among target classes and promote fairness. Through quantitative and qualitative performance analyses, we demonstrate that the proposed Stochastic TCE for semantic segmentation can offer improved overall fairness by efficiently minimizing the performance disparity among the target classes of Cityscapes.
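To make the tilting idea concrete, the following is a minimal sketch of the general TERM objective, (1/t) log((1/N) Σ exp(t · loss_i)), applied to a list of per-class losses. It is not the authors' TCE or Stochastic TCE implementation, whose details the abstract does not give. The function name `tilted_loss` and the sample loss values are hypothetical, and applying the tilt across classes (rather than pixels or samples) is an assumption made for illustration.

```python
import math


def tilted_loss(losses, t):
    """Tilted aggregate of a list of losses: (1/t) * log(mean(exp(t * loss_i))).

    t = 0 is treated as the ordinary mean (the ERM limit).
    Large positive t moves the result toward the worst (largest) loss.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if t == 0:
        return sum(losses) / len(losses)
    scaled = [t * loss for loss in losses]
    # Log-sum-exp shift for numerical stability.
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / t


# Hypothetical per-class cross-entropy values; the third class is poorly served.
class_losses = [0.2, 0.3, 2.0]
plain_mean = tilted_loss(class_losses, 0)
tilted = tilted_loss(class_losses, 5.0)
```

With a positive tilt the aggregate sits between the mean and the maximum of the losses, so minimizing it puts more weight on the worst-performing class than plain averaging does.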
ISBN: 9780769549903 (print)
Several papers have addressed ellipse detection as a first step for computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses, and on the use of the Hough transform to estimate parameters in a decomposed space. The demo will show the algorithm working on a commercial smartphone.
ISBN: 9781665448994 (print)
The NTIRE 2021 workshop features a Multi-modal Aerial View Object Classification Challenge. Its focus is on multi-sensor imagery classification in order to improve the performance of automatic target recognition (ATR) systems. In this paper we describe our entry in this challenge, a method focused on efficiency and low computational time while maintaining a high level of accuracy. The method is a convolutional neural network with 11 convolutions, 1 max pooling layer and 3 residual blocks, with a total of 373,130 parameters. The method ranks 3rd in Track 2 (SAR+EO) of the challenge.
ISBN: 9780769549903 (print)
Lane feature extraction is one of the key computational steps in lane analysis systems. In this paper, we propose a lane feature extraction method that enables different configurations of embedded solutions, addressing both accuracy and the constraints of embedded systems. The proposed lane feature extraction process is evaluated in detail using real-world lane data, to explore its effectiveness for embedded realization and its adaptability to varying contextual information such as lane types and environmental conditions.
ISBN: 9781665448994 (print)
We present an approach to supervised action recognition in the dark and report our results on the ARID dataset [60]. Most previous works only evaluate performance on large, well-illuminated datasets like Kinetics and HMDB51. We demonstrate that our work is able to achieve a very low error rate while being trained on a much smaller dataset of dark videos. We also explore a variety of training and inference strategies, including domain transfer methodologies, and propose a simple but useful frame selection strategy. Our empirical results demonstrate that we beat previously published baseline models by 11%.
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
Building footprints (BFP) provide useful visual context for users of digital maps when navigating in space. This paper proposes a method for extracting and symbolizing building footprints from satellite imagery using a convolutional neural network (CNN). The CNN architecture outputs rotated rectangles, providing a symbolized approximation that works well for small buildings. Experiments are conducted on the four cities in the DeepGlobe Challenge dataset (Las Vegas, Paris, Shanghai, Khartoum). Our method performs best on suburbs consisting of individual houses. These experiments show that either large buildings or buildings without clear delineation produce weaker results in terms of precision and recall.
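As an illustration of what a rotated-rectangle output can look like, the sketch below converts one rectangle into its four corner points. The abstract does not state how the network parameterizes its rectangles, so the (center, width, height, angle in degrees) form and the function name `rotated_rect_corners` are assumptions made purely for this example.

```python
import math


def rotated_rect_corners(cx, cy, width, height, angle_deg):
    """Return the four corners of a rectangle rotated about its center.

    The angle is measured counter-clockwise in degrees in an x-right, y-up
    frame (in image coordinates with y pointing down it appears clockwise).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets of the axis-aligned rectangle, relative to its center.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Rotate each offset, then translate by the center.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


# Hypothetical footprint: a 4 x 2 rectangle centered at the origin.
axis_aligned = rotated_rect_corners(0.0, 0.0, 4.0, 2.0, 0.0)
quarter_turn = rotated_rect_corners(0.0, 0.0, 4.0, 2.0, 90.0)
```

Five numbers per building keep the symbolized footprint compact, which fits the abstract's observation that the approximation suits small, individually delineated houses better than large or irregular buildings.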
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
Material recognition is studied in both computer vision and vision science. In this paper, we investigated how humans observe material images and found that eye fixation information improves the performance of material image classification models. We first collected eye-tracking data from human observers and used it to fine-tune a generative adversarial network for saliency prediction (SalGAN). We then fused the predicted saliency map with material images and fed them to CNN models for material classification. The experimental results show that the classification accuracy is higher than that obtained using the original images. This indicates that human visual cues could benefit computational models as priors.
ISBN: 9780769549903 (print)
Action recognition is one of the major challenges of computer vision. Several approaches have been proposed using different descriptors and multi-class models. In this paper, we focus on binary ranking models and treat action recognition as a ranking problem. A binary ranking model is trained for each action and used to recognize the test videos for that action. Binary ranking models are constructed using dense SIFT (DSIFT) descriptors and histogram of oriented gradients / histogram of optical flows (HOG/HOF) descriptors. We show that, using ranking models, it is possible to obtain higher recognition accuracies than a baseline based on multi-class models on two very recent and challenging benchmark datasets: the Human Motion Database (HMDB) and the Action Similarity Labeling (ASLAN) dataset.