ISBN: 9781467312288 (print)
The ability to normalize pose based on super-category landmarks can significantly improve models of individual categories when training data are limited. Previous methods have considered the use of volumetric or morphable models for faces and for certain classes of articulated objects. We consider methods which impose fewer representational assumptions on categories of interest, and exploit contemporary detection schemes which consider the ensemble of responses of detectors trained for specific pose-keypoint configurations. We develop representations for poselet-based pose normalization using both explicit warping and implicit pooling as mechanisms. Our method defines a pose-normalized similarity or kernel function that is suitable for nearest-neighbor or kernel-based learning methods.
ISBN: 9780769549897 (print)
We present a quadratic unconstrained binary optimization (QUBO) framework for reasoning about multiple object detections with spatial overlaps. The method maximizes an objective function composed of unary detection confidence scores and pairwise overlap constraints to determine which overlapping detections should be suppressed, and which should be kept. The framework is flexible enough to handle the problem of detecting objects as a shape covering of a foreground mask, and to handle the problem of filtering confidence weighted detections produced by a traditional sliding window object detector. In our experiments, we show that our method outperforms two existing state-of-the-art pedestrian detectors.
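The objective described above can be illustrated with a minimal sketch. It is not the authors' formulation or solver: the IoU-based pairwise penalty, the `overlap_weight` parameter, the function names, and the exhaustive search (practical only for a handful of detections) are all assumptions made for illustration.

```python
from itertools import combinations, product


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def select_detections(boxes, scores, overlap_weight=2.0):
    """Maximize sum_i s_i*x_i - sum_{i<j} w*IoU_ij*x_i*x_j over binary x.

    Hypothetical helper: the penalty form and the brute-force search are
    illustrative assumptions, not the method from the abstract.
    """
    n = len(boxes)
    penalty = {
        (i, j): overlap_weight * iou(boxes[i], boxes[j])
        for i, j in combinations(range(n), 2)
    }
    best_keep, best_value = None, float("-inf")
    for keep in product((0, 1), repeat=n):  # all 2^n binary assignments
        value = sum(s * k for s, k in zip(scores, keep))
        value -= sum(p * keep[i] * keep[j] for (i, j), p in penalty.items())
        if value > best_value:
            best_keep, best_value = keep, value
    return best_keep, best_value


# Two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep, value = select_detections(boxes, scores)
print(keep, value)
```

In this toy case the lower-scoring of the two overlapping boxes is suppressed while the isolated box is kept. Unlike greedy non-maximum suppression, the decision is made jointly over all detections.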
ISBN: 9781728171685 (digital)
ISBN: 9781728171685 (print)
Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of self-attention. One is pairwise self-attention, which generalizes standard dot-product attention and is fundamentally a set operator. The other is patchwise self-attention, which is strictly more powerful than convolution. Our pairwise self-attention networks match or outperform their convolutional counterparts, and the patchwise models substantially outperform the convolutional baselines. We also conduct experiments that probe the robustness of learned representations and conclude that self-attention networks may have significant benefits in terms of robustness and generalization.
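For orientation, the standard dot-product attention that the pairwise form generalizes can be sketched as follows. This is only the baseline operator, not the paper's pairwise or patchwise variants, and it assumes identity query, key, and value maps to stay short. The final check illustrates the set-operator property: permuting the input features permutes the outputs in the same way.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_self_attention(features):
    """Scaled dot-product self-attention over a set of feature vectors.

    Assumption for brevity: queries, keys, and values are the features
    themselves (identity projections); real models learn these maps.
    """
    d = len(features[0])
    outputs = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        # Each output is a convex combination of all value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, features)) for c in range(d)]
        )
    return outputs


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = dot_product_self_attention(feats)

# Permuting the inputs permutes the outputs identically (no positional cue).
perm = [2, 0, 1]
out_perm = dot_product_self_attention([feats[i] for i in perm])
is_equivariant = all(
    abs(a - b) < 1e-9
    for k, i in enumerate(perm)
    for a, b in zip(out_perm[k], out[i])
)
print(is_equivariant)
```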
ISBN: 9780769549897 (print)
We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.
ISBN: 9780769549897 (print)
Point sets are the standard output of many 3D scanning systems and depth cameras. Presenting the set of points as is might "hide" the prominent features of the object from which the points are sampled. Our goal is to reduce the number of points in a point set in order to improve visual comprehension from a given viewpoint. This is done by controlling the density of the reduced point set so as to create bright regions (low density) and dark regions (high density), producing an effect of shading. This data reduction is achieved by leveraging a limitation of a solution to the classical problem of determining visibility from a viewpoint. In addition, we introduce a new dual problem, determining visibility of a point from infinity, and show how a limitation of its solution can be leveraged in a similar way.
ISBN: 0818672587 (print)
We present the Incremental Focus of Attention (IFA) architecture for adding robustness to software-based, real-time, motion trackers. The framework provides a structure which, when given the entire camera image to search, efficiently focuses the attention of the system into a narrow set of possible states that includes the target state. IFA offers a means for automatic tracking initialization and reinitialization when environmental conditions momentarily deteriorate and cause the system to lose track of its target. Systems based on the framework degrade gracefully as various assumptions about the environment are violated. In particular, multiple tracking algorithms are layered so that the failure of a single algorithm causes another algorithm of less precision to take over, thereby allowing the system to return approximate feature state information.
ISBN: 9781665445092 (print)
Since its inception in 2015, Style Transfer has focused on texturing a content image using an art exemplar. Recently, the geometric changes that artists make have been acknowledged as an important component of style [42, 55, 62, 63]. Our contribution is to propose a neural network that, uniquely, learns a mapping from a 4D array of inter-feature distances to a non-parametric 2D warp field. The system is generic in that it is not limited by semantic class: a single learned model suffices, and all examples in this paper are output from one model. Our approach combines the high speed of Liu et al. [42] with the non-parametric warping of Kim et al. [55]. Furthermore, our system extends the normal NST paradigm: although it can be used with a single exemplar, we also allow two style exemplars, one for texture and another for geometry. This supports far greater flexibility in use cases than single exemplars can provide.
ISBN: 9798350301298 (print)
In this paper, we examine gradients of logits of image classification CNNs with respect to input pixel values. We observe that these fluctuate considerably with training randomness, such as the random initialization of the networks. We extend our study to gradients of intermediate layers, obtained via GradCAM, as well as popular network saliency estimators such as DeepLIFT, SHAP, LIME, Integrated Gradients, and SmoothGrad. While empirical noise levels vary, qualitatively different attributions to image features are still possible with all of these, which has implications for interpreting such attributions, in particular when seeking data-driven explanations of the phenomenon generating the data. Finally, we demonstrate that the observed artefacts can be removed by marginalization over the initialization distribution by simple stochastic integration.
ISBN: 0780342364 (print)
This paper presents a trainable object detection architecture that is applied to detecting people in static images of cluttered scenes. This problem poses several challenges. People are highly non-rigid objects with a high degree of variability in size, shape, color, and texture. Unlike previous approaches, this system learns from examples and does not rely on any a priori (handcrafted) models or on motion. The detection technique is based on the novel idea of the wavelet template that defines the shape of an object in terms of a subset of the wavelet coefficients of the image. It is invariant to changes in color and texture and can be used to robustly define a rich and complex class of objects such as people. We show how the invariant properties and computational efficiency of the wavelet template make it an effective tool for object detection.
ISBN: 9781424469840 (print)
The design of robust classifiers, which can contend with the noisy and outlier-ridden datasets typical of computer vision, is studied. It is argued that such robustness requires loss functions that penalize both large positive and negative margins. The probability elicitation view of classifier design is adopted, and a set of necessary conditions for the design of such losses is identified. These conditions are used to derive a novel robust Bayes-consistent loss, denoted Tangent loss, and an associated boosting algorithm, denoted TangentBoost. Experiments with data from the computer vision problems of scene classification, object tracking, and multiple instance learning show that TangentBoost consistently outperforms previous boosting algorithms.
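The abstract does not state the loss itself. As a hedged illustration, the sketch below uses the form commonly cited for the Tangent loss, φ(v) = (2·arctan(v) − 1)², where v is the margin; treat that formula as an assumption recalled from the literature rather than something given here. It shows the two properties the abstract names: the loss grows for large positive margins as well as negative ones, and it stays bounded, so outliers cannot dominate.

```python
import math


def tangent_loss(margin):
    """Commonly cited form of the Tangent loss (assumed, not from the abstract)."""
    return (2.0 * math.atan(margin) - 1.0) ** 2


# The loss is zero where 2*atan(v) = 1, i.e. at v = tan(0.5).
zero_margin = math.tan(0.5)

for v in (-10.0, -1.0, 0.0, zero_margin, 1.0, 10.0):
    print(f"margin={v:+.3f}  loss={tangent_loss(v):.4f}")
```

Because arctan saturates, the loss is bounded above by (π + 1)², which is what limits the influence of badly misclassified outliers.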