This paper presents a framework for building a VideoPlace-like vision-driven user interface using "optical flow" measurements and an elastic labeled silhouette. The optical flow not only detects movements but also gives an estimate of their direction and speed. The proposed representation is based on a self-organizing system designed to learn to recognize both the characteristic features of the image and their spatial relationships, without the need for initialization or special settings. The positions of the units composing the system allow information to be extracted about the position and dynamics of the observed figure. Reported results show that the skeleton (legs and torso) of a walking subject can be identified using four units. The low-resolution skeleton formed by the four units correctly tracks the walking pattern of the two legs, while the upper segment remains centered on the subject's body.
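As a minimal sketch of how a flow measurement yields both speed and direction, the snippet below converts a single flow vector into a magnitude and an angle. The function name and the pixels-per-frame units are assumptions for illustration; the abstract does not say how the flow field itself is computed.

```python
import math


def flow_speed_and_direction(u, v):
    """Convert one optical-flow vector (u, v) into speed and direction.

    u, v: horizontal and vertical displacement (assumed pixels per frame).
    Returns (speed, direction_degrees) with the direction in [0, 360),
    measured from the +u axis toward the +v axis. In image coordinates
    the v axis usually points down, so the angle appears clockwise on screen.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(v, u)) % 360.0
    return speed, direction


speed, direction = flow_speed_and_direction(3.0, 4.0)
print(f"speed={speed:.2f} px/frame, direction={direction:.2f} deg")
```

A dense flow field would apply the same conversion to every pixel; averaging the vectors over a region before converting gives a less noisy estimate of that region's motion.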
ISBN: 9781424423392 (print)
The paper presents a compact vision system for efficient contour extraction in high-speed applications. By exploiting the ultra-high temporal resolution and the sparse representation of the sensor's data in reacting to scene dynamics, the system fosters efficient embedded computer vision for ultra-high-speed applications. The results reported in this paper show the sensor output quality for a wide range of object velocities (5-40 m/s), and demonstrate that the object data volume is independent of velocity and that the object quality remains steady. The influence of object velocity on high-performance embedded computer vision is also discussed.
ISBN: 9781424423392 (print)
We show how to outsource data annotation to Amazon Mechanical Turk. Doing so has produced annotations in quite large numbers relatively cheaply. The quality is good, and can be checked and controlled. Annotations are produced quickly. We describe results for several different annotation problems. We describe some strategies for determining when the task is well specified and properly priced.
ISBN: 9781424423392 (print)
Tensor Voting is a robust technique for extracting low-level features in noisy images. The approach achieves its robustness by exploiting coherent orientations in local neighborhoods. In this paper we propose an efficient algorithm for dense Tensor Voting in 3D which makes use of steerable filters. To this end, we propose steerable expansions of spherical tensor fields in terms of tensorial harmonics, which are their canonical representation. In this way, arbitrary-rank Tensor Voting can be performed efficiently by linear combinations of convolutions.
ISBN: 9781424423392 (print)
This paper describes an in-depth investigation and implementation of interleaved memory for pixel lookup operations in computer vision. Pixel lookup, the mapping between coordinates and pixels, is a common operation in computer vision, but is also a potential bottleneck due to formidable bandwidth requirements for real-time operation. We focus on accelerating pixel lookup operations by parallelizing memory banks through interleaving. The key to applying interleaving to pixel lookup is 2D block data partitioning and support for unaligned access. With this optimization of interleaving, pixel lookup operations can output a block of pixels at once without major overhead for unaligned access. An example implementation of our optimized interleaved memory for affine motion tracking shows that the pixel lookup operations can achieve 12.8 Gbps for random lookup of a 4x4 block of 8-bit pixels under 100 MHz operation. Interleaving can be a cost-effective solution for fast pixel lookup in embedded computer vision.
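The idea of 2D block interleaving with unaligned access can be sketched as follows. The bank-mapping formula here is an assumption chosen for illustration, since the abstract does not give the paper's actual scheme. It assigns pixel (x, y) to one of 16 banks so that any 4x4 window, aligned or not, touches each bank exactly once. The throughput helper only reproduces the arithmetic behind the quoted 12.8 Gbps figure, assuming one block is delivered per clock cycle.

```python
BLOCK = 4  # block edge in pixels (4x4, as in the abstract)


def bank_of(x, y, block=BLOCK):
    """Hypothetical mapping: bank index of pixel (x, y) among block*block banks."""
    return (y % block) * block + (x % block)


def banks_for_window(x0, y0, block=BLOCK):
    """Set of banks touched by the block x block window whose top-left is (x0, y0)."""
    return {
        bank_of(x0 + dx, y0 + dy, block)
        for dy in range(block)
        for dx in range(block)
    }


def conflict_free(x0, y0, block=BLOCK):
    """True if every pixel in the window lives in a different bank."""
    return len(banks_for_window(x0, y0, block)) == block * block


def throughput_gbps(block=BLOCK, bits_per_pixel=8, clock_mhz=100):
    """Peak throughput assuming one full block is read per clock cycle."""
    bits_per_cycle = block * block * bits_per_pixel
    return bits_per_cycle * clock_mhz / 1000.0


# An unaligned window still maps to 16 distinct banks, so it can be
# fetched in a single parallel access.
print(conflict_free(5, 7))
print(throughput_gbps())
```

Because any four consecutive coordinates cover all residues modulo 4 in each dimension, the window position never causes two pixels to collide in one bank; real hardware would additionally need per-bank address generation and a rotation network to reorder the outputs.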
ISBN: 9781424423392 (print)
Partial matching is probably one of the most challenging problems in nonrigid shape analysis. The problem consists of matching similar parts of shapes that are dissimilar on the whole and can assume different forms by undergoing nonrigid deformations. Conceptually, two shapes can be considered partially matching if they have significant similar parts, with the simplest definition of significance being the size of the parts. Thus, partial matching can be defined as a multicriterion optimization problem that tries to simultaneously maximize the similarity and the size of these parts. In this paper, we propose a different definition of significance, taking into account the regularity of parts in addition to their size. The regularity term proposed here is similar in spirit to the Mumford-Shah functional. Numerical experiments show that the regularized partial matching produces semantically better results than the non-regularized one.
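The multicriterion view can be illustrated with a small Pareto-front filter. This is a generic sketch, not the paper's algorithm: each candidate pair of parts is assumed to have already been scored by a hypothetical (similarity, size) tuple, and the code keeps only those candidates that no other candidate beats on both criteria at once.

```python
def pareto_front(candidates):
    """Return the non-dominated (similarity, size) pairs, both to be maximized.

    A candidate is dominated if another candidate is at least as good on
    both criteria and strictly better on at least one of them.
    """
    front = []
    for i, (sim_i, size_i) in enumerate(candidates):
        dominated = any(
            sim_j >= sim_i and size_j >= size_i
            and (sim_j > sim_i or size_j > size_i)
            for j, (sim_j, size_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((sim_i, size_i))
    return front


# Hypothetical scores for five candidate part pairs (values are made up).
candidates = [(0.9, 0.2), (0.7, 0.5), (0.6, 0.4), (0.3, 0.9), (0.3, 0.8)]
print(pareto_front(candidates))
```

The surviving candidates trade similarity against size; adding a regularity criterion, as the abstract proposes, would extend each tuple with a third score and the dominance test with a third comparison.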
ISBN: 9781424423392 (print)
In this paper, we examine the problem of internet video categorization. Specifically, we explore the representation of a video as a "bag of words" using various combinations of spatial and temporal descriptors. The descriptors incorporate both spatial and temporal gradients as well as optical flow information. We achieve state-of-the-art results on a standard human activity recognition database and demonstrate promising category recognition performance on two new databases of approximately 1000 and 1500 online user-submitted videos, which we will be making available to the community.
ISBN: 9781424423392 (print)
We give a brief discussion of denoising algorithms for depth data and introduce a novel technique based on the NL-Means filter. A unified approach is presented that removes outliers from depth data and accordingly achieves an unbiased smoothing result. This robust denoising algorithm takes intra-patch similarity and optional color information into account in order to handle strong discontinuities and to preserve fine detail structure in the data. We achieve fast computation times with a GPU-based implementation. Results using data from a time-of-flight camera system show a significant gain in visual quality.
ISBN: 9781424423392 (print)
Inferring the 3D spatial layout from a single 2D image is a fundamental visual task. We formulate it as a grouping problem where edges are grouped into lines, quadrilaterals, and finally depth-ordered planes. We demonstrate that the 3D structure of planar objects in indoor scenes can be inferred quickly and accurately without any learning or indexing.
ISBN: 9781424423392 (print)
The combination of biometric matching scores can be enhanced by taking into account the matching scores related to all enrolled persons, in addition to traditional combinations utilizing only matching scores related to a single person. Identification models take into account the dependence between matching scores assigned to different persons and can be used for such enhancement. In this paper we compare the use of two such models - T-normalization and the second-best score model. The comparison is performed using two combination algorithms - likelihood ratio and multilayer perceptron. The results show that while the second-best score model delivers a greater performance improvement than T-normalization, the two models are complementary and can be used together for further improvements.
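A rough sketch of the two identification models is given below, under stated assumptions: T-normalization is taken to mean standardizing each score by the mean and standard deviation of the scores of the other enrolled persons in the same trial, and the second-best score model is taken to mean pairing each score with the best score among the other persons. Definitions vary in the literature, and the abstract does not spell them out, so the function names and exact formulas are illustrative only.

```python
import statistics


def t_normalize(scores):
    """Standardize each score against the other scores of the same trial.

    scores: matching scores of one probe against all enrolled persons
    (at least two). Returns one normalized score per enrolled person.
    """
    normalized = []
    for i, s in enumerate(scores):
        others = scores[:i] + scores[i + 1:]
        mean = statistics.mean(others)
        std = statistics.pstdev(others)
        # Guard against a zero spread among the other scores.
        normalized.append((s - mean) / std if std > 0 else 0.0)
    return normalized


def second_best_features(scores):
    """Pair each score with the best score among the other enrolled persons."""
    return [
        (s, max(scores[:i] + scores[i + 1:]))
        for i, s in enumerate(scores)
    ]


# Made-up scores for one probe against four enrolled persons.
scores = [0.9, 0.4, 0.5, 0.3]
print(t_normalize(scores))
print(second_best_features(scores))
```

Either output could then be fed, per matcher, into a combination algorithm such as a likelihood-ratio estimate or a multilayer perceptron; since the two features capture different aspects of the score set, using both together is consistent with the complementarity the abstract reports.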