ISBN: 9780769549897 (print)
Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large-scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
ISBN: 9781467369640 (print)
Potts energy frequently occurs in computer vision applications. We present an efficient parallel method for optimizing Potts energy based on an extension of the hierarchical fusion algorithm. Unlike previous parallel graph-cut based optimization algorithms, our approach has optimality bounds even after a single iteration over all labels, i.e. after solving only k-1 max-flow problems, where k is the number of labels. This is perhaps the minimum number of max-flow problems one has to solve to obtain a solution with optimality guarantees. Our approximation factor is O(log_2 k). Although this is not as good as the factor of 2 approximation of the well-known expansion algorithm, we achieve very good results in practice. In particular, we found that the results of our algorithm after one iteration are always better than the results after one iteration of the expansion algorithm. We demonstrate experimentally the computational advantages of our parallel implementation on the problem of stereo correspondence, achieving a factor of 1.5 to 2.6 speedup compared to the serial implementation. These results were obtained with a small number of processors. The expected speedups with a larger number of processors are greater.
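For readers unfamiliar with the objective being optimized, the following is a minimal sketch of how a Potts energy can be evaluated for a given labeling on a 4-connected pixel grid. It only scores a labeling; it is not the hierarchical fusion algorithm of the abstract. The function name `potts_energy`, the `unary` cost layout, and the single uniform `smoothness` weight are assumptions made for illustration.

```python
def potts_energy(labels, unary, smoothness=1.0):
    """Evaluate a Potts energy on a 4-connected grid.

    labels[r][c]   : integer label assigned to pixel (r, c)
    unary[r][c][l] : data cost of giving label l to pixel (r, c)
    smoothness     : penalty paid by every neighboring pair with different labels
                     (a single uniform weight is an illustrative assumption)
    """
    rows, cols = len(labels), len(labels[0])
    data_term = 0.0
    pairwise_term = 0.0
    for r in range(rows):
        for c in range(cols):
            data_term += unary[r][c][labels[r][c]]
            # Count each horizontal and vertical neighbor pair exactly once.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                pairwise_term += smoothness
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                pairwise_term += smoothness
    return data_term + pairwise_term


# Two labels; label 0 costs 0 and label 1 costs 1 at every pixel.
unary = [[[0.0, 1.0] for _ in range(2)] for _ in range(2)]
labels = [[0, 0],
          [0, 1]]
energy = potts_energy(labels, unary)
```

In this toy case the data term is 1 (one pixel takes label 1) and two neighboring pairs disagree, so the energy is 3.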
ISBN: 0780342364 (print)
We have designed and implemented a real-time binocular tracking system which uses two independent cues commonly found in the primary functions of biological visual systems to robustly track moving targets in complex environments, without a priori knowledge of the target shape or texture. A fast optical flow segmentation algorithm quickly locates independently moving objects for target acquisition and provides a reliable velocity estimate for smooth tracking. In parallel, target position is generated from the output of a zero-disparity filter where a phase-based disparity estimation technique allows dynamic control of the camera vergence to adapt the horopter geometry to the target location. The system takes advantage of the optical properties of our custom-designed foveated wide-angle lenses, which exhibit a wide field of view along with a high resolution fovea. Methods to cope with the distortions introduced by the space-variant resolution, and a robust real-time implementation on a high performance active vision head are presented.
ISBN: 9781467312288 (print)
Mode-seeking has been widely used as a powerful data analysis technique for clustering and filtering in a metric feature space. We introduce a versatile and efficient mode-seeking method for "graph" representation where general embedding of relational data is possible beyond metric spaces. Exploiting the global structure of the graph by random walks, our method intrinsically combines mode-seeking with ranking on the graph, and performs robust analysis by seeking high-ranked authoritative data and suppressing low-ranked noise and outliers. This enables mode-seeking to be applied to a large class of challenging real-world problems involving graph representation, which frequently arises in computer vision. We demonstrate our method on various synthetic experiments and real applications dealing with noisy and complex data such as scene summarization and object-based image matching.
ISBN: 9781479951178 (print)
Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain extremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed strokelets, which capture the essential substructures of characters at different granularities. Strokelets possess four distinctive advantages: (1) Usability: automatically learned from bounding box labels; (2) Robustness: insensitive to interference factors; (3) Generality: applicable to various languages; and (4) Expressivity: effective at describing characters. Extensive experiments on standard benchmarks verify the advantages of strokelets and demonstrate the effectiveness of the proposed algorithm for text recognition.
ISBN: 9781467369640 (print)
We propose a real-time 3D model-based method that continuously recognizes dimensional emotions from facial expressions in natural communications. In our method, 3D facial models are restored from 2D images, which provide crucial clues for the enhancement of robustness to overcome large changes including out-of-plane head rotations, fast head motions and partial facial occlusions. To accurately recognize the emotion, a novel random forest-based algorithm which simultaneously integrates two regressions for 3D facial tracking and continuous emotion estimation is constructed. Moreover, via the reconstructed 3D facial model, temporal information and user-independent emotion presentations are also taken into account through our image fusion process. The experimental results show that our algorithm achieves state-of-the-art results in real time, with a higher Pearson correlation coefficient for continuous emotion recognition.
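The evaluation metric named in this abstract, the Pearson correlation coefficient, can be sketched as follows. This is a plain textbook implementation for comparing a predicted emotion trace against ground-truth annotations; the function name and the sample values are illustrative assumptions, not data from the paper.

```python
import math


def pearson_correlation(predicted, target):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    covariance = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_p * var_t)


# Hypothetical per-frame valence predictions and annotations.
predicted_valence = [0.1, 0.3, 0.2, 0.6, 0.8]
annotated_valence = [0.0, 0.2, 0.3, 0.5, 0.9]
score = pearson_correlation(predicted_valence, annotated_valence)
```

A value close to 1 means the predicted trace rises and falls with the annotation, regardless of any constant offset or scale between the two.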
ISBN: 9798350353006 (print)
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions. We propose a parallel decoding sequence-to-sequence vision-language model, trained with a Query-CTC loss, that marginalizes over multiple inference paths in the decoder. This allows us to model the joint distribution of tokens, rather than restricting it to conditional distributions as in an autoregressive model. The resulting model, NARVL, achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time, reducing from the linear complexity associated with the sequential generation of tokens to a paradigm of constant time joint inference.
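The abstract does not spell out the Query-CTC loss, so the sketch below illustrates only the standard CTC collapsing rule on which CTC-style losses are built: merge consecutive repeats, then drop blank tokens. It shows why several decoder paths can correspond to the same output sequence, which is what a CTC loss marginalizes over. The function name and the choice of `0` as the blank token are assumptions.

```python
def ctc_collapse(path, blank=0):
    """Collapse a CTC path: merge consecutive repeats, then remove blanks."""
    output = []
    previous = None
    for token in path:
        # Emit a token only when it starts a new run and is not the blank.
        if token != previous and token != blank:
            output.append(token)
        previous = token
    return output


# Two different parallel-decoder paths that yield the same token sequence.
path_a = [1, 1, 0, 2, 2]
path_b = [1, 0, 0, 2, 0]
same_output = ctc_collapse(path_a) == ctc_collapse(path_b)
```

A blank between two identical tokens keeps them separate, so `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]` rather than `[1, 2]`.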
ISBN: 0818672587 (print)
The vast majority of corner and edge detectors measure image intensity gradients in order to estimate the positions and strengths of features. However, many of the most popular intensity gradient estimators are inherently and significantly anisotropic. In spite of this, few algorithms take the anisotropy into account, and so the set of features uncovered is typically sensitive to rotations of the image, compromising recognition, matching (e.g. stereo), and tracking. We introduce an effective technique for removing unwanted anisotropies from analytical gradient estimates, by measuring local intensity gradients in four directions rather than the more traditional two. In experiments using real image data, our algorithm reduces the gradient anisotropy associated with conventional analytical gradient estimates by up to 85%, yielding more consistent feature topologies.
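The abstract does not state how the four directional measurements are combined, so the following is only one plausible sketch: central differences along 0°, 45°, 90° and 135° are merged into a single gradient vector by least squares. It should not be read as the paper's estimator. The function name, the 3x3 neighborhood, and the least-squares combination are all assumptions.

```python
import math


def gradient_four_directions(image, row, col):
    """Estimate (gx, gy) at an interior pixel from four directional differences.

    image is a 2D list indexed as image[row][col]; x runs along columns and
    y along rows. Each directional derivative is a central difference divided
    by the distance between the two samples.
    """
    root2 = math.sqrt(2.0)
    d0 = (image[row][col + 1] - image[row][col - 1]) / 2.0
    d90 = (image[row + 1][col] - image[row - 1][col]) / 2.0
    d45 = (image[row + 1][col + 1] - image[row - 1][col - 1]) / (2.0 * root2)
    d135 = (image[row + 1][col - 1] - image[row - 1][col + 1]) / (2.0 * root2)
    # Least-squares solution of d_theta = g . u_theta for the unit vectors
    # u_0 = (1, 0), u_45 = (1, 1)/sqrt(2), u_90 = (0, 1), u_135 = (-1, 1)/sqrt(2).
    # The sum of u u^T over these four directions is 2I, so g = 0.5 * sum(u * d).
    gx = 0.5 * (d0 + (d45 - d135) / root2)
    gy = 0.5 * (d90 + (d45 + d135) / root2)
    return gx, gy


# A linear ramp with slope 3 along x and 2 along y.
ramp = [[3.0 * x + 2.0 * y for x in range(3)] for y in range(3)]
gx, gy = gradient_four_directions(ramp, 1, 1)
```

On a linear ramp the estimate recovers the true slopes (3 along x, 2 along y up to floating-point rounding), which is a basic sanity check for any such combination.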
ISBN: 9781467367592 (print)
In this paper, a novel cultural event classification algorithm based on convolutional neural networks is proposed. The proposed method first extracts regions that contain meaningful information. Then, convolutional neural networks are trained to classify the extracted regions. The final classification of a scene is performed by probabilistically combining the classification results of each extracted region of the scene. Compared to the state-of-the-art methods for classifying the ChaLearn Looking at People cultural event recognition database, the proposed method shows competitive results.
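The abstract does not say which probabilistic rule is used to combine the per-region results. As one hedged illustration, the sketch below multiplies the per-region class probabilities (treating regions as independent) and renormalizes, working in log space for numerical stability. The function name, the independence assumption, and the sample probabilities are all hypothetical.

```python
import math


def combine_region_probabilities(region_probs, eps=1e-12):
    """Fuse per-region class probabilities into one scene-level distribution.

    region_probs[i][k] is the probability of class k predicted for region i.
    Regions are treated as independent, so their probabilities are multiplied.
    """
    num_classes = len(region_probs[0])
    log_scores = [0.0] * num_classes
    for probs in region_probs:
        for k, p in enumerate(probs):
            # Clamp to eps so a zero probability does not produce log(0).
            log_scores[k] += math.log(max(p, eps))
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two extracted regions, two candidate event classes.
scene_probs = combine_region_probabilities([[0.6, 0.4], [0.7, 0.3]])
predicted_class = scene_probs.index(max(scene_probs))
```

Averaging the region probabilities instead of multiplying them would be an equally simple alternative; the product rule rewards classes that every region agrees on.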
ISBN: 9781728171685 (print)
This paper introduces a method for learning to generate line drawings from 3D models. Our architecture incorporates a differentiable module operating on geometric features of the 3D model, and an image-based module operating on view-based shape representations. At test time, geometric and view-based reasoning are combined with the help of a neural module to create a line drawing. The model is trained on a large number of crowdsourced comparisons of line drawings. Experiments demonstrate that our method achieves significant improvements in line drawing over the state-of-the-art when evaluated on standard benchmarks, resulting in drawings that are comparable to those produced by experienced human artists.