ISBN: 9781665487399 (print)
The aim of this paper is to demonstrate that a state-of-the-art feature matcher (LoFTR) can be made more robust to rotations by simply replacing the backbone CNN with a steerable CNN that is equivariant to translations and image rotations. It is experimentally shown that this boost is obtained without reducing performance on ordinary illumination and viewpoint matching sequences.
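As a rough illustration of what equivariance to rotations means, the sketch below checks the property for a single filter and 90-degree rotations only. It is not a steerable CNN and not the LoFTR backbone; the helper names `rot90` and `correlate_valid` and the two example kernels are assumptions made for this example.

```python
def rot90(grid):
    """Rotate a 2D list counterclockwise by 90 degrees."""
    return [list(row) for row in zip(*grid)][::-1]

def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding ("valid" mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A kernel that looks the same after a 90-degree rotation (hypothetical example).
symmetric_kernel = [[0, 1, 0],
                    [1, 4, 1],
                    [0, 1, 0]]
# A kernel that changes under rotation (hypothetical example).
asymmetric_kernel = [[1, 0, 0],
                     [0, 0, 0],
                     [0, 0, 0]]

image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

def is_equivariant(kernel):
    """True if filtering a rotated image equals rotating the filtered image."""
    return correlate_valid(rot90(image), kernel) == rot90(correlate_valid(image, kernel))

print(is_equivariant(symmetric_kernel))
print(is_equivariant(asymmetric_kernel))
```

An ordinary learned filter behaves like the second kernel, so its response changes when the input rotates. A steerable CNN constrains its filters so that the relation holds by construction, and for more rotation angles than this toy check covers.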
ISBN: 9781479943098 (print)
This work analyzes the problem of homography estimation for robust target matching in the context of real-time mobile vision. We present a device-friendly implementation of the Gaussian Elimination algorithm and show that our optimized approach can significantly improve the homography estimation step in a hypothesize-and-verify scheme. Experiments are performed on image sequences in which both speed and accuracy are evaluated and compared with conventional homography estimation schemes.
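The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea: estimating a homography from four point correspondences by fixing the bottom-right entry to 1 and solving the resulting 8x8 linear system with Gaussian elimination. The function names, the partial pivoting and the singularity threshold are assumptions for illustration, not the authors' device-optimized code.

```python
def solve_gauss(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed threshold
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def homography_from_4_points(src, dst):
    """Return a 3x3 homography (h33 = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_gauss(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(hm, point):
    """Map a 2D point through the homography, including the projective divide."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (3, 2), (0, 1)]
H = homography_from_4_points(src, dst)
print([apply_homography(H, p) for p in src])
```

In a hypothesize-and-verify scheme, a minimal solver of this kind would run once per sampled hypothesis, which is why the cost of the linear solve matters on mobile hardware.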
ISBN: 9780769549903 (print)
Several papers have addressed ellipse detection as a first step for computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses and on the use of the Hough transform to estimate parameters in a decomposed space. The demo will show it working on a commercial smartphone.
ISBN: 9781728193601 (print)
Human-object interaction (HOI) detection is a core task in computer vision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a tuple leads to a long-tailed visual recognition challenge since many combinations are rarely represented. The performance of the proposed models is limited, especially for the tail categories, but little has been done to understand the reason. To that end, in this paper we propose to diagnose rarity in HOI detection. We propose a three-step strategy, namely Detection, Identification and Recognition, in which we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that the detection and identification steps are affected by interaction signals such as occlusion and relative location, which in turn limits recognition accuracy.
ISBN: 9781424439942 (print)
Surveillance systems involving hundreds of cameras have become very popular. Because camera positions and orientations vary, object appearance changes dramatically between scenes, and traditional appearance-based object classification methods tend to fail in these situations. We approach the problem by designing an adaptive object classification framework that automatically adjusts to different scenes. First, a baseline object classifier is applied to a specific scene, generating training samples with extracted scene-specific features (such as object position). On that basis, a bilateral weighted LDA is trained under the guidance of sample confidence. Moreover, we propose a Bayesian-classifier-based method to detect and remove outliers, to cope with the occasional generalization failure that results from using high-confidence but incorrectly classified training samples. To validate these ideas, we implement the framework in an intelligent surveillance system. Experimental results demonstrate the effectiveness of this adaptive object classification framework.
ISBN: 9781538661000 (digital), 9781538661000 (print)
Material recognition is studied in both computer vision and vision science. In this paper, we investigated how humans observe material images and found that eye fixation information improves the performance of material image classification models. We first collected eye-tracking data from human observers and used it to fine-tune a generative adversarial network for saliency prediction (SalGAN). We then fused the predicted saliency map with material images and fed them to CNN models for material classification. The experimental results show that classification accuracy is higher than when using the original images alone. This indicates that human visual cues can benefit computational models as priors.
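The abstract does not say how the saliency map and the image are fused. As a hedged sketch, two common options are shown below on plain nested lists: weighting each pixel by its saliency, and appending saliency as an extra channel. Both function names and the choice of these two schemes are assumptions, not the authors' method.

```python
def fuse_multiply(image, saliency):
    """Weight every RGB pixel by its saliency value, assumed to lie in [0, 1]."""
    return [[[channel * s for channel in pixel]
             for pixel, s in zip(img_row, sal_row)]
            for img_row, sal_row in zip(image, saliency)]

def fuse_concat(image, saliency):
    """Append the saliency value as a fourth channel next to R, G, B."""
    return [[list(pixel) + [s] for pixel, s in zip(img_row, sal_row)]
            for img_row, sal_row in zip(image, saliency)]

# A 1x2 RGB image and a matching 1x2 saliency map (toy values).
image = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
saliency = [[0.5, 1.0]]

print(fuse_multiply(image, saliency))
print(fuse_concat(image, saliency))
```

The multiplicative variant keeps the input shape, so a pretrained three-channel CNN can be reused unchanged, while the concatenation variant requires a first layer that accepts four channels.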
ISBN: 9781538607336 (print)
In this paper we discuss and analyze possible futures for technologies in the field of computer vision (CV). Using a method we call speculative analysis, we take a broad look at research trends in the field to categorize risks, analyze which ones are most threatening and likely, and ultimately summarize conclusions for how the field may attempt to stem future harms caused by CV technologies. We develop narrative case studies to provoke dialogue and deeply explore the risk scenarios we found to be most probable and severe. We arrive at the position that there is serious potential for CV to cause discriminatory harm and exacerbate cybersecurity issues.
ISBN: 9781665448994 (print)
Recent research has shown that faces can be obfuscated in large-scale datasets with a minimal performance impact on image classification and downstream tasks like object recognition. In this paper, we investigate the role of face obfuscation in video classification datasets and quantify a more significant reduction in performance caused by face blurring. To reduce such performance effects, we propose a generalized distillation approach in which a privacy-preserving action recognition network is trained with privileged information given by face identities. We show, through experiments performed on Kinetics-400, that the proposed approach can fully close the performance gap caused by face anonymization.
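The abstract does not state the training objective. A minimal sketch of the loss commonly used in generalized distillation is given below, assuming a teacher that has seen the privileged information and a student that sees only the privacy-preserving input. The function names, the `imitation` weight and the `temperature` value are illustrative assumptions, not values from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      imitation=0.5, temperature=2.0):
    """Blend the hard-label loss with imitation of the teacher's soft targets."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    student = softmax(student_logits)
    # The teacher's logits are softened so that small class scores still carry signal.
    soft_targets = softmax(teacher_logits, temperature)
    hard_term = cross_entropy(one_hot, student)
    soft_term = cross_entropy(soft_targets, student)
    return (1.0 - imitation) * hard_term + imitation * soft_term

# Toy example: the student logits come from a blurred clip, the teacher logits
# from a model that had access to privileged information (hypothetical values).
print(distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], label=0))
```

Setting `imitation` to 0 recovers ordinary supervised training, so the weight controls how strongly the student is pulled toward the teacher's predictions.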
ISBN: 9781424439942 (print)
We investigate the problem of recognizing words from video, fingerspelled using the British Sign Language (BSL) fingerspelling alphabet. This is a challenging task since the BSL alphabet involves both hands occluding each other and contains signs which are ambiguous from the observer's viewpoint. The main contributions of our work include: (i) recognition based on hand shape alone, not requiring motion cues; (ii) robust visual features for hand shape recognition; (iii) scalability to large lexicon recognition with no re-training. We report results on a dataset of 1,000 low-quality webcam videos of 100 words. The proposed method achieves a word recognition accuracy of 98.9%.
ISBN: 9781665448994 (print)
We present an approach to perform supervised action recognition in the dark. In this work, we present our results on the ARID dataset [60]. Most previous works only evaluate performance on large, well-illuminated datasets like Kinetics and HMDB51. We demonstrate that our work is able to achieve a very low error rate while being trained on a much smaller dataset of dark videos. We also explore a variety of training and inference strategies, including domain transfer methodologies, and propose a simple but useful frame selection strategy. Our empirical results demonstrate that we beat previously published baseline models by 11%.