ISBN: (Print) 0818672587
Combination of Multiple Classifiers (CMC) has recently drawn attention as a method of improving classification accuracy. This paper presents a method for combining classifiers that uses estimates of each individual classifier's local accuracy in small regions of feature space surrounding an unknown test sample. Only the output of the most locally accurate classifier is considered. We address issues of 1) optimization of individual classifiers, and 2) the effect of varying the sensitivity of the individual classifiers on the CMC algorithm. Our algorithm performs better on data from a real problem in mammogram image analysis than do other recently proposed CMC techniques.
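To make the selection step concrete, here is a minimal sketch of dynamic classifier selection by local accuracy: for a test sample, each base classifier is scored on the sample's nearest neighbors in a held-out validation set, and only the most locally accurate classifier's output is used. The base classifiers, the neighborhood size, and the validation split are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# toy data split into train / validation (for local accuracy) / test
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# two hypothetical base classifiers
classifiers = [LogisticRegression(max_iter=1000).fit(X_train, y_train),
               DecisionTreeClassifier(random_state=0).fit(X_train, y_train)]

knn = NearestNeighbors(n_neighbors=15).fit(X_val)  # defines the "small region" around a test sample

def predict_locally_best(x):
    """Use only the output of the classifier that is most accurate near x."""
    _, idx = knn.kneighbors(x.reshape(1, -1))
    local_X, local_y = X_val[idx[0]], y_val[idx[0]]
    local_acc = [clf.score(local_X, local_y) for clf in classifiers]
    return classifiers[int(np.argmax(local_acc))].predict(x.reshape(1, -1))[0]

preds = np.array([predict_locally_best(x) for x in X_test])
print("combined accuracy:", (preds == y_test).mean())
```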
ISBN: (Print) 9781479951178
Humans are capable of perceiving a scene at a glance, and obtain deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations require Anytime recognition ability, which is rarely considered in computer vision research. We present a method for learning dynamic policies to optimize Anytime performance in visual architectures. Our model sequentially orders feature computation and performs subsequent classification. Crucially, decisions are made at test time and depend on observed data and intermediate results. We show the applicability of this system to standard problems in scene and object recognition. On suitable datasets, we can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy; this provides a new view on the time course of human visual perception.
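As a rough illustration of Anytime evaluation under a computational budget, the sketch below replaces the learned dynamic policy with a fixed greedy ordering of features by importance per unit cost, mean-imputing features that have not yet been computed; the per-feature costs and the dataset are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
costs = np.random.default_rng(0).uniform(1.0, 5.0, X.shape[1])  # hypothetical per-feature costs
order = np.argsort(-clf.feature_importances_ / costs)            # cheap, informative features first
means = X_tr.mean(axis=0)

def anytime_predict(x, budget):
    """Return a prediction using only the features affordable within `budget`."""
    filled, spent = means.copy(), 0.0
    for j in order:
        if spent + costs[j] > budget:
            break
        filled[j], spent = x[j], spent + costs[j]
    return clf.predict(filled.reshape(1, -1))[0]

for budget in (10.0, 40.0, costs.sum()):
    acc = np.mean([anytime_predict(x, budget) == t for x, t in zip(X_te, y_te)])
    print(f"budget={budget:6.1f}  accuracy={acc:.3f}")
```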
ISBN: (Print) 9781424469840
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
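Only the voting and accumulation step is sketched below: each patch casts a weighted vote for a spatio-temporal action center, and the accumulator peak is read off as the detection. The random trees that would map real patches to votes are replaced here by synthetic votes, so every number is a placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

H, W, T = 120, 160, 60                     # frame height, width, number of frames
accumulator = np.zeros((T, H, W))          # spatio-temporal Hough space

rng = np.random.default_rng(0)
true_center = np.array([30, 60, 80])       # hypothetical (t, y, x) action center

for _ in range(500):
    # a Hough-tree leaf would emit an offset vote and a class weight for a patch;
    # here we simulate noisy votes aimed at the true center
    vote = true_center + rng.normal(0, 2, size=3)
    weight = rng.uniform(0.2, 1.0)
    t, y, x = np.clip(np.round(vote).astype(int), 0, [T - 1, H - 1, W - 1])
    accumulator[t, y, x] += weight

# smooth the accumulator and take its maximum as the detected action center
peak = np.unravel_index(np.argmax(gaussian_filter(accumulator, sigma=2)), accumulator.shape)
print("detected (t, y, x):", peak, " true:", tuple(true_center))
```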
ISBN: (Print) 9781424469840
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an ad-hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.
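A minimal sketch of the latent-variable scoring this describes: the score of an action is the maximum over candidate poses of a joint (image, pose, action) score, so the best pose is inferred as a by-product of recognizing the action. The features, candidate poses, and weights below are synthetic placeholders standing in for the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_pose_candidates, feat_dim = 5, 20, 32

W = rng.normal(size=(n_actions, feat_dim))   # one linear scoring model per action (assumed)

def joint_features(image_feat, pose):
    """Hypothetical joint feature of image appearance and a pose hypothesis."""
    return image_feat * np.cos(pose) + np.roll(image_feat, 1) * np.sin(pose)

def score_action(image_feat, action, poses):
    # latent-variable inference: maximize the joint score over candidate poses
    scores = [W[action] @ joint_features(image_feat, p) for p in poses]
    best = int(np.argmax(scores))
    return scores[best], poses[best]

image_feat = rng.normal(size=feat_dim)
poses = rng.uniform(0, np.pi, size=n_pose_candidates)
per_action = [score_action(image_feat, a, poses) for a in range(n_actions)]
pred = int(np.argmax([s for s, _ in per_action]))
print("predicted action:", pred, " inferred pose:", per_action[pred][1])
```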
ISBN: (Print) 9781728132938
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye view perspective, the highly complex backgrounds, and the varied appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and the objects, which leads to a common misalignment between the final object classification confidence and the localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of the RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. The RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
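To illustrate the kind of spatial transformation involved, here is a sketch of decoding a rotated RoI from a horizontal RoI plus predicted offsets (dx, dy, dw, dh, dtheta). The exact parameterization used by RoI Transformer may differ; this decoding and the numbers below are assumptions.

```python
import numpy as np

def decode_rotated_roi(hroi, deltas):
    """hroi: (x1, y1, x2, y2) horizontal RoI.
    deltas: (dx, dy, dw, dh, dtheta) predicted transformation parameters.
    Returns (cx, cy, w, h, theta) of the rotated RoI."""
    x1, y1, x2, y2 = hroi
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh, dtheta = deltas
    # shift the center relative to the RoI size, scale the size in log-space
    return (cx + dx * w, cy + dy * h, w * np.exp(dw), h * np.exp(dh), dtheta)

def rotated_roi_corners(cx, cy, w, h, theta):
    """Corner points of the rotated RoI, e.g. for rotated RoI pooling."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    corners = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) * 0.5
    return corners @ R.T + np.array([cx, cy])

roi = (100, 50, 180, 120)   # hypothetical horizontal proposal
print(rotated_roi_corners(*decode_rotated_roi(roi, (0.05, -0.02, 0.1, -0.1, 0.3))))
```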
ISBN: (Print) 0818672587
There are at least two situations in practical computer vision where displacement of a point in an image is accompanied by a defocus blur. The first is when a camera of limited autofocal capability moves in depth, and the second is when a limited autofocal camera zooms. Motion and zooming are two popular strategies for acquiring more detail or for acquiring depth. The defocus blur has been considered noise or at best been ignored. However, the defocus blur is in itself a cue to depth, and hence we proceed to show how it can be calculated simultaneously with affine motion. We first introduce the theory, then develop a solution method and finally demonstrate the validity of the theory and the solution by conducting experiments with real scenery.
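The depth cue the abstract builds on can be seen from the thin-lens model: the blur-circle diameter of a point depends on its depth together with the aperture, focal length, and sensor distance, so measured blur constrains depth. The sketch below uses made-up camera parameters, not values from the paper.

```python
def blur_diameter(u, f=0.050, sensor_dist=0.052, aperture=0.025):
    """Thin-lens blur-circle diameter (meters) for a point at depth u (meters).
    The in-focus condition is 1/f = 1/u + 1/sensor_dist; blur grows as the
    point moves away from the in-focus depth."""
    return aperture * sensor_dist * abs(1.0 / f - 1.0 / u - 1.0 / sensor_dist)

# with these (hypothetical) parameters the in-focus depth is 1.3 m
for depth in (0.8, 1.3, 2.0, 5.0):
    print(f"depth {depth:4.1f} m -> blur {1e3 * blur_diameter(depth):.3f} mm")
```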
ISBN: (Print) 9781479951178
A method is presented for identifying local shape features on a shape's boundary in a way that is facilitated by the presence of noise. The boundary is seen as a real function. A study of a certain distance function reveals, almost counter-intuitively, that vertices can be defined and localized better in the presence of noise; thus the concept of noising, as opposed to smoothing, is conceived and presented. The method works on both smooth and noisy shapes, with the presence of noise improving on the results of the smoothed version. Experiments with noise and a comparison to the state of the art validate the method.
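The abstract does not spell out its distance function, so the sketch below uses a generic stand-in for vertex localization on a noisy boundary: for each boundary point, the distance to the chord through its k-th neighbors on either side; peaks of this quantity mark vertex candidates. It illustrates the setting, not the paper's actual construction.

```python
import numpy as np

def chord_distance(boundary, k=8):
    """Per-point distance from the boundary to the chord through its +/-k neighbors."""
    n = len(boundary)
    d = np.zeros(n)
    for i in range(n):
        a, b, p = boundary[(i - k) % n], boundary[(i + k) % n], boundary[i]
        ab, ap = b - a, p - a
        d[i] = abs(ab[0] * ap[1] - ab[1] * ap[0]) / (np.linalg.norm(ab) + 1e-12)
    return d

# noisy square boundary: the four corners should appear as peaks of the distance
s = np.linspace(0, 1, 100, endpoint=False)
square = np.concatenate([np.stack([s, np.zeros_like(s)], 1),
                         np.stack([np.ones_like(s), s], 1),
                         np.stack([1 - s, np.ones_like(s)], 1),
                         np.stack([np.zeros_like(s), 1 - s], 1)])
noisy = square + np.random.default_rng(0).normal(0, 0.01, square.shape)
d = chord_distance(noisy)
# neighbouring indices around a corner score similarly; a peak-picking step would
# follow in practice (the corners sit near indices 0, 100, 200, 300)
print("strongest vertex candidates:", np.argsort(d)[-4:])
```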
ISBN: (Print) 9781665445092
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms the comparable state of the art on the AVA dataset.
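As a schematic of the object-centric idea, the sketch below turns per-object features from sparsely sampled frames into tokens and lets a standard transformer encoder aggregate them into a clip-level prediction; the shapes, module choices, and pooling are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

num_frames, objects_per_frame, feat_dim, num_classes = 64, 8, 256, 10

# one token per detected object per sampled frame (random stand-in features)
object_tokens = torch.randn(1, num_frames * objects_per_frame, feat_dim)  # [batch, tokens, dim]

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
    num_layers=2,
)
classifier = nn.Linear(feat_dim, num_classes)

clip_repr = encoder(object_tokens).mean(dim=1)  # pool over all object tokens
logits = classifier(clip_repr)
print(logits.shape)                             # torch.Size([1, 10])
```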
ISBN: (Digital) 9781665469463
ISBN: (Print) 9781665469463
Rotated object detection is a challenging issue in the computer vision field. Inadequate rotated representations and the confusion of parametric regression have been the bottleneck for high-performance rotated detection. In this paper, we propose an orientation-sensitive keypoint-based rotated detector, OSKDet. First, we adopt a set of keypoints to represent the target and predict the keypoint heatmap on the RoI to obtain the rotated box. By proposing the orientation-sensitive heatmap, OSKDet learns the shape and direction of the rotated target implicitly and has stronger modeling capabilities for rotated representation, which improves localization accuracy and yields high-quality detection results. Second, we explore a new unordered keypoint representation paradigm, which avoids the confusion of keypoint regression caused by rule-based ordering. Furthermore, we propose a localization quality uncertainty module to better predict the classification score from the distribution uncertainty of the keypoint heatmap. Experimental results on several public benchmarks show the state-of-the-art performance of OSKDet. Specifically, we achieve an AP of 80.91% on DOTA, 89.98% on HRSC2016, and 97.27% on UCAS-AOD, and an F-measure of 92.18% on ICDAR2015 and 81.43% on ICDAR2017.
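Only the last step, going from an unordered set of keypoints to a rotated box, is sketched here; fitting a minimum-area rectangle avoids imposing any keypoint ordering. The heatmap prediction itself is omitted and the keypoints below are synthetic, so this is a stand-in for the decoding described in the abstract.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

# synthetic keypoints scattered along the long edges of a rotated rectangle
theta, cx, cy, w, h = np.deg2rad(30), 100.0, 80.0, 60.0, 30.0
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unit = rng.uniform(-0.5, 0.5, size=(12, 2)) * [w, h]
unit[:, 1] = np.sign(unit[:, 1]) * h / 2            # push points onto the long edges
keypoints = (unit @ R.T + [cx, cy]).astype(np.float32)

# decode an oriented box from the unordered keypoint set
(rcx, rcy), (rw, rh), angle = cv2.minAreaRect(keypoints)
print(f"center=({rcx:.1f},{rcy:.1f}) size=({rw:.1f},{rh:.1f}) angle={angle:.1f} deg")
```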
ISBN: (Print) 9781467369640
We frame the problem of local representation of imaging data as the computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. We show that, under very stringent conditions, these are related to "feature descriptors" commonly used in computer vision. Such conditions can be relaxed if multiple views of the same scene are available. We propose a sampling-based and a point-estimate-based approximation of such a representation, compared empirically on image-to-(multiple) image matching, for which we introduce a multi-view wide-baseline matching benchmark, consisting of a mixture of real and synthetic objects with ground-truth camera motion and dense three-dimensional geometry.
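A toy sketch of why multiple views help: descriptors of the same scene point observed under different nuisances are pooled before matching, and the pooled representation matches a new view more reliably than a single-view descriptor. The vectors below are synthetic, standing in for an actual descriptor pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_views, dim = 50, 4, 64

canonical = rng.normal(size=(n_points, dim))                       # "true" appearance of each point
views = canonical[None] + rng.normal(0, 1.5, (n_views, n_points, dim))  # nuisance variability per view

single_view = views[0]                                             # descriptor from a single view
multi_view = views.mean(axis=0)                                    # pooled multi-view descriptor
query = canonical + rng.normal(0, 1.5, (n_points, dim))            # a new view to match against

def match_accuracy(desc):
    d = ((query[:, None, :] - desc[None, :, :]) ** 2).sum(-1)      # pairwise squared distances
    return (d.argmin(axis=1) == np.arange(n_points)).mean()

print("single-view matching accuracy:", match_accuracy(single_view))
print("multi-view  matching accuracy:", match_accuracy(multi_view))
```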