The identification and estimation of exercise poses has been a field of ongoing research in computer vision. Many deep learning architectures have demonstrated impressive performance, and as a result, much progress ha...
ISBN: 9781665445092 (print)
In this paper, we introduce Coarse-Fine Networks, a two-stream architecture that benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process inputs at one (or a few) fixed temporal resolutions without any dynamic frame selection. However, we argue that processing multiple temporal resolutions of the input, and doing so dynamically by learning to estimate the importance of each frame, can largely improve video representations, especially in the domain of temporal activity localization. To this end, we propose (1) 'Grid Pool', a learned temporal downsampling layer to extract coarse features, and (2) 'Multi-stage Fusion', a spatio-temporal attention mechanism to fuse a fine-grained context with the coarse features. We show that our method outperforms the state of the art for action detection on public datasets including Charades, with a significantly reduced compute and memory footprint.
The inclusion of people with specifics in society has become a big challenge. Vision is extremely relevant in the relationship with objects, space, and others, and its absence can negatively affect the quality of li...
Vehicle detection, recognition, and distance estimation play a crucial role in computer vision, particularly in applications such as intelligent traffic systems. However, existing methods need help in achieving accura...
ISBN: 9781665409155 (print)
In many computer vision classification tasks, class priors at test time often differ from priors on the training set. In the case of such prior shift, classifiers must be adapted correspondingly to maintain close to optimal performance. This paper analyzes methods for adaptation of probabilistic classifiers to new priors and for estimating new priors on an unlabeled test set. We propose a novel method to address a known issue of prior estimation methods based on confusion matrices, where inconsistent estimates of decision probabilities and confusion matrices lead to negative values in the estimated priors. Experiments on fine-grained image classification datasets provide insight into the best practice of prior shift estimation and classifier adaptation, and show that the proposed method achieves state-of-the-art results in prior adaptation. Applying the best practice to two tasks with naturally imbalanced priors, learning from web-crawled images and plant species classification, increased the recognition accuracy by 1.1% and 3.4%, respectively.
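The classifier adaptation step mentioned in this abstract can be illustrated with the standard re-weighting rule for prior shift, p_test(y|x) ∝ p_train(y|x) · p_test(y) / p_train(y). The following is a minimal sketch under stated assumptions: the function name `adapt_to_new_priors` and the example numbers are hypothetical, the test-time priors are taken as already known, and the paper's own prior estimation method is not reproduced here.

```python
def adapt_to_new_priors(posteriors, train_priors, test_priors):
    """Re-weight classifier posteriors p(y|x) for changed class priors.

    Each class probability is scaled by the ratio of its test-time prior to
    its training prior, then the vector is renormalized to sum to 1.
    All names here are illustrative, not taken from the paper.
    """
    weighted = [
        p * new / old
        for p, new, old in zip(posteriors, test_priors, train_priors)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical two-class example: the classifier was trained on balanced
# data, but class 1 is assumed to be four times as common at test time.
adapted = adapt_to_new_priors(
    posteriors=[0.6, 0.4],
    train_priors=[0.5, 0.5],
    test_priors=[0.2, 0.8],
)
print(adapted)
```

In this example the decision flips from class 0 to class 1 once the assumed test-time priors are taken into account, which is the effect that adaptation to prior shift is meant to capture.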
ISBN: 9781665445092 (print)
Few-shot object detection (FSOD) aims to learn detectors that can be generalized to novel classes with only a few instances. Unlike previous attempts that exploit meta-learning techniques to facilitate FSOD, this work tackles the problem from the perspective of sample expansion. To this end, we propose a simple yet effective Transformation Invariant Principle (TIP) that can be flexibly applied to various meta-learning models for boosting the detection performance on novel class objects. Specifically, by introducing consistency regularization on predictions from various transformed images, we augment vanilla FSOD models with the generalization ability to objects perturbed by various transformations, such as occlusion and noise. Importantly, our approach can extend supervised FSOD models to naturally cope with unlabeled data, thus addressing a more practical and challenging semi-supervised FSOD problem. Extensive experiments on PASCAL VOC and MSCOCO datasets demonstrate the effectiveness of our TIP under both FSOD settings.
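Consistency regularization of the kind this abstract describes is commonly written as a penalty on the disagreement between predictions for an image and for its transformed views. The sketch below is only an illustration under assumptions: it uses a KL divergence between class-probability vectors, and the names `kl_divergence` and `consistency_loss` are hypothetical. The abstract does not state which divergence or which prediction heads TIP actually uses.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats.

    eps guards against log(0); it is an implementation convenience.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(pred_original, preds_transformed):
    """Mean KL divergence between the prediction on the original image and
    the predictions on each transformed view (assumed loss form)."""
    return sum(
        kl_divergence(pred_original, q) for q in preds_transformed
    ) / len(preds_transformed)


# Hypothetical class probabilities for one object proposal.
original = [0.7, 0.2, 0.1]
views = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]  # e.g. occluded and noisy views
print(consistency_loss(original, views))
```

Because such a loss needs no ground-truth labels, it can also be evaluated on unlabeled images, which is consistent with the semi-supervised setting the abstract mentions.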
ISBN: 9781665448994 (print)
Engineering sketches form the 2D basis of parametric Computer-Aided Design (CAD), the foremost modeling paradigm for manufactured objects. In this paper, we tackle the problem of learning-based engineering sketch generation as a first step towards synthesis and composition of parametric CAD models. We propose two generative models, CurveGen and TurtleGen, for engineering sketch generation. Both models generate curve primitives without the need for a sketch constraint solver and explicitly consider topology for downstream use with constraints and 3D CAD modeling operations. We find in our perceptual evaluation using human subjects that both CurveGen and TurtleGen produce more realistic engineering sketches when compared with the current state of the art for engineering sketch generation.
ISBN: 9781665492997 (digital); 9781665492997 (print)
Since their prevalence in World War II, radar-based systems have provided a strategic advantage for military applications. Since then, radars have merged into everyday commercial products ranging from automotive sensors for adaptive cruise control to home security systems. Due to their various use cases for pattern recognition, classification, and computer vision tasks, many radar systems incorporate machine learning models. This paper aims to implement a real-time tracking system composed of a low-cost transceiver and a computer vision model. To determine the optimal setup, the study will compare implementations that include two low-cost transceivers and two different weights from the YOLOv3 algorithm. The comparison will determine the optimal constraints for the tracking system by measuring system latency and classification confidence.
ISBN: 9781665445092 (print)
Unprocessed RAW data is a highly valuable image format for image editing and computer vision. However, since the file size of RAW data is huge, most users can only get access to processed and compressed sRGB images. To bridge this gap, we design an Invertible Image Signal Processing (InvISP) pipeline, which not only enables rendering visually appealing sRGB images but also allows recovering nearly perfect RAW data. Due to our framework's inherent reversibility, we can reconstruct realistic RAW data instead of synthesizing RAW data from sRGB images without any memory overhead. We also integrate a differentiable JPEG compression simulator that empowers our framework to reconstruct RAW data from JPEG images. Extensive quantitative and qualitative experiments on two DSLR cameras demonstrate that our method obtains much higher quality in both rendered sRGB images and reconstructed RAW data than alternative methods.
Recognizing different cattle breeds accurately is important for farmers to prevent accidental crossbreeding and to maintain the purity of the breeds. The goal of this research is to simplify cattle breed classificatio...