ISBN: 9781467369640 (print)
Tracking multiple targets in a video, based on a finite set of detection hypotheses, is a persistent problem in computer vision. A common strategy for tracking is to first select hypotheses spatially and then to link these over time while maintaining disjoint path constraints [14, 15, 24]. In crowded scenes, multiple hypotheses will often be similar to each other, making selection of optimal links an unnecessarily hard optimization problem due to the sequential treatment of space and time. Embracing this observation, we propose to link and cluster plausible detections jointly across space and time. Specifically, we state multi-target tracking as a Minimum Cost Subgraph Multicut Problem. Evidence about pairs of detection hypotheses is incorporated whether the detections are in the same frame, neighboring frames or distant frames. This facilitates long-range re-identification and within-frame clustering. Results for published benchmark sequences demonstrate the superiority of this approach.
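The idea of selecting and clustering detections jointly can be illustrated with a toy brute-force search. This is only a sketch under stated assumptions, not the formulation or solver from the paper: the function name `best_decomposition`, the cost convention (a unary cost paid for every kept detection, a pairwise cost paid for every pair placed in the same cluster, negative meaning "favorable") and all numbers are hypothetical.

```python
from itertools import combinations, product


def best_decomposition(unary, pair_cost):
    """Exhaustively search joint selection and clustering of detections.

    Assumed toy convention (not taken from the paper):
      label 0      -> detection is discarded
      label k >= 1 -> detection belongs to cluster (target) k
      unary[i]     -> cost paid if detection i is kept
      pair_cost[(i, j)] with i < j -> cost paid if i and j share a cluster
    Pairs may lie in the same frame or in different frames.
    """
    n = len(unary)
    best_cost, best_labels = float("inf"), None
    for labels in product(range(n + 1), repeat=n):
        cost = sum(unary[i] for i in range(n) if labels[i] != 0)
        for i, j in combinations(range(n), 2):
            if labels[i] != 0 and labels[i] == labels[j]:
                cost += pair_cost.get((i, j), 0)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


# Hypothetical data: detections 0 and 1 are near-duplicates in one frame,
# detection 2 is the same target in a later frame, detection 3 is clutter.
unary = [-1, -1, -1, 2]
pair_cost = {(0, 1): -2, (0, 2): -2, (1, 2): -2,
             (0, 3): 3, (1, 3): 3, (2, 3): 3}

cost, labels = best_decomposition(unary, pair_cost)
print(cost, labels)
```

The enumeration grows exponentially with the number of detections, so it only serves to make the objective concrete; it says nothing about how the problem is solved at realistic scale.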
ISBN: 0780342364 (print)
We derive a sensitivity analysis for moment invariants of multidimensional distributions. These invariants have many uses in computational systems and have recently been used for illumination-invariant recognition in color images. In this context, the sensitivity analysis predicts the response of moment invariants to partial occlusion. Using the results of the sensitivity analysis, we develop a novel surface representation called the invariant profile, which captures color distribution and spatial information while remaining invariant to the spectral content of the scene illumination. Unlike previous representations, the recognition of invariant profiles does not require illumination correction. We demonstrate the sensitivity analysis and the use of invariant profiles for recognition with a set of experiments on color images.
ISBN: 9781467369640 (print)
Objects in visual scenes come in a rich variety of transformed states. A few classes of transformation have been heavily studied in computer vision: mostly simple, parametric changes in color and geometry. However, transformations in the physical world occur in many more flavors, and they come with semantic meaning: e.g., bending, folding, aging, etc. The transformations an object can undergo tell us about its physical and functional properties. In this paper, we introduce a dataset of objects, scenes, and materials, each of which is found in a variety of transformed states. Given a novel collection of images, we show how to explain the collection in terms of the states and transformations it depicts. Our system works by generalizing across object classes: states and transformations learned on one set of objects are used to interpret the image collection for an entirely new object class.
ISBN: 9781467369640 (print)
Multi-task learning is a natural approach for computer vision applications that require the simultaneous solution of several distinct but related problems, e.g. object detection, classification, tracking of multiple agents, or denoising, to name a few. The key idea is that exploring task relatedness (structure) can lead to improved performance. In this paper, we propose and study a novel sparse, non-parametric approach exploiting the theory of Reproducing Kernel Hilbert Spaces for vector-valued functions. We develop a suitable regularization framework which can be formulated as a convex optimization problem, and is provably solvable using an alternating minimization approach. Empirical tests show that the proposed method compares favorably to state-of-the-art techniques and further allows the recovery of interpretable structures, a problem of interest in its own right.
ISBN: 9781467369640 (print)
We consider the NP-hard problem of MAP-inference for graphical models. We propose a polynomial-time, practically efficient algorithm for finding a part of its optimal solution. Specifically, our algorithm marks each label in each node of the considered graphical model either as (i) optimal, meaning that it belongs to all optimal solutions of the inference problem; (ii) non-optimal, if it provably does not belong to any solution; or (iii) undefined, which means our algorithm cannot make a decision regarding the label. Moreover, we prove optimality of our approach: it delivers in a certain sense the largest total number of labels marked as optimal or non-optimal. We demonstrate superiority of our approach on problems from machine learning and computer vision benchmarks.
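The three label categories can be made concrete on a tiny model by exhaustive enumeration. The sketch below is not the polynomial-time algorithm of the paper; it is a brute-force illustration, and the function name `classify_labels`, the data layout and the example costs are assumptions. With full enumeration, the only labels left "undefined" are those that appear in some but not all optimal solutions, whereas an efficient method may also leave other labels undecided.

```python
from itertools import product


def classify_labels(unary, pairwise, edges):
    """Mark every (node, label) pair by brute force on a small pairwise model.

    unary[i][a]           -> cost of label a at node i
    pairwise[(i, j)][a][b] -> cost of labels (a, b) on edge (i, j)
    Integer costs are assumed so that ties can be compared exactly.
    """
    n = len(unary)
    label_ranges = [range(len(u)) for u in unary]

    def energy(x):
        total = sum(unary[i][x[i]] for i in range(n))
        total += sum(pairwise[(i, j)][x[i]][x[j]] for (i, j) in edges)
        return total

    labelings = list(product(*label_ranges))
    energies = [energy(x) for x in labelings]
    best = min(energies)
    optimal = [x for x, e in zip(labelings, energies) if e == best]

    marks = {}
    for i in range(n):
        for a in label_ranges[i]:
            hits = sum(1 for x in optimal if x[i] == a)
            if hits == len(optimal):
                marks[(i, a)] = "optimal"       # in every optimal solution
            elif hits == 0:
                marks[(i, a)] = "non-optimal"   # in no optimal solution
            else:
                marks[(i, a)] = "undefined"     # in some, but not all
    return marks


# Hypothetical two-node model with a Potts-style edge (cost 1 if labels differ).
unary = [[0, 3], [1, 0]]
edges = [(0, 1)]
pairwise = {(0, 1): [[0, 1], [1, 0]]}

marks = classify_labels(unary, pairwise, edges)
for key in sorted(marks):
    print(key, marks[key])
```

In this example the two labelings (0, 0) and (0, 1) tie for the minimum energy, so node 0 is fully decided while both labels of node 1 remain undecided.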
ISBN: 0780342364 (print)
This paper presents a robust technique to detect local deteriorations of old cinematographic films. This method relies on spatio-temporal information and combines two different detectors: a morphological detector which uses spatial properties of deteriorations, and a dynamic detector based on motion estimation techniques. Our deterioration detector has been validated on several film sequences and turned out to be a powerful tool for digital film restoration.
ISBN: 9781467369640 (print)
We introduce a novel method for using reflectance to identify materials. Reflectance offers a unique signature of the material but is challenging to measure and use for recognizing materials due to its high dimensionality. In this work, one-shot reflectance of a material surface, which we refer to as a reflectance disk, is captured using a unique optical camera. The pixel coordinates of these reflectance disks correspond to the surface viewing angles. The reflectance has class-specific structure, and angular gradients computed in this reflectance space reveal the material class. These reflectance disks encode discriminative information for efficient and accurate material recognition. We introduce a framework called reflectance hashing that models the reflectance disks with dictionary learning and binary hashing. We demonstrate the effectiveness of reflectance hashing for material recognition with a number of real-world materials.
ISBN: 9781467369640 (print)
Recently, a variety of real-world applications have triggered huge demand for techniques that can extract textual information from natural scenes. Therefore, scene text detection and recognition have become active research topics in computer vision. In this work, we investigate the problem of scene text detection from an alternative perspective and propose a novel algorithm for it. Different from traditional methods, which mainly make use of the properties of single characters or strokes, the proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images. The experiments on the latest ICDAR benchmarks demonstrate that the proposed algorithm achieves state-of-the-art performance. Moreover, compared to conventional approaches, the proposed algorithm shows stronger adaptability to texts in challenging scenarios.
ISBN: 9781467369640 (print)
Many visual recognition problems can be approached by counting instances. To determine whether an event is present in a long internet video, one could count how many frames seem to contain the activity. Classifying the activity of a group of people can be done by counting the actions of individual people. Encoding these cardinality relationships can reduce sensitivity to clutter, in the form of irrelevant frames or individuals not involved in a group activity. Learned parameters can encode how many instances tend to occur in a class of interest. To this end, this paper develops a powerful and flexible framework to infer any cardinality relation between latent labels in a multi-instance model. Hard or soft cardinality relations can be encoded to tackle diverse levels of ambiguity. Experiments on tasks such as human activity recognition, video event detection, and video summarization demonstrate the effectiveness of using cardinality relations for improving recognition results.
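The counting idea in its simplest form can be sketched as follows. This is a hard-threshold heuristic, not the latent-variable framework of the paper; the function names, the threshold of 0.5 and the frame scores are hypothetical, and per-frame scores are assumed to come from some separate instance classifier.

```python
from typing import Callable, Sequence

# A cardinality relation takes (number of positive instances, bag size).
Relation = Callable[[int, int], bool]


def count_positive_instances(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count instances (e.g. video frames) whose score exceeds the threshold."""
    return sum(1 for s in scores if s > threshold)


def at_least(k: int) -> Relation:
    """Hard relation: at least k positive instances in the bag."""
    return lambda count, total: count >= k


def at_least_ratio(r: float) -> Relation:
    """Hard relation: at least a fraction r of the instances are positive."""
    return lambda count, total: total > 0 and count >= r * total


def bag_label(scores: Sequence[float], relation: Relation, threshold: float = 0.5) -> bool:
    """Label a bag (video, group of people) from the count of positive instances."""
    count = count_positive_instances(scores, threshold)
    return relation(count, len(scores))


# Hypothetical per-frame activity scores for one video.
frame_scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]

print(bag_label(frame_scores, at_least(3)))
print(bag_label(frame_scores, at_least_ratio(0.6)))
```

Requiring several positive frames instead of a single one is what makes the bag label less sensitive to an isolated spurious detection, which is the clutter robustness the abstract refers to.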
ISBN: 0780342364 (print)
We present a new method for synthesizing novel views of a 3D scene from a few model images in full correspondence. The core of this work is the derivation of a tensorial operator that describes the transformation from a given tensor of three views to a novel tensor of a new configuration of three views. By repeated application of the operator on a seed tensor with a sequence of desired virtual camera positions, we obtain a chain of warping functions (tensors) from the set of model images to create the desired virtual views.