ISBN (print): 0818672587
Despite many successful applications of robust statistics, they have yet to be completely adapted to many computer vision problems. Range reconstruction, particularly in unstructured environments, requires a robust estimator that not only tolerates a large outlier percentage but also tolerates several discontinuities, extracting multiple surfaces in an image region. Observing that random outliers and/or points from across discontinuities increase a hypothesized fit's scale estimate (standard deviation of the noise), our new operator, called MUSE (Minimum Unbiased Scale Estimator), evaluates a hypothesized fit over potential inlier sets via an objective function of unbiased scale estimates. MUSE extracts the single best fit from the data by minimizing its objective function over a set of hypothesized fits and can sequentially extract multiple surfaces from an image region. We show MUSE to be effective on synthetic data modelling small scale discontinuities and in preliminary experiments on complicated range data.
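The idea of scoring hypothesized fits by their scale estimates can be sketched roughly as follows. This is a simplified stand-in, not the MUSE objective function from the paper: it assumes a 2D line model, normalizes the k-th smallest absolute residual by an approximate expected order statistic of |N(0, 1)|, and takes the smallest such estimate over k >= k_min as the score of a fit. The names `scale_estimates`, `fit_score`, `best_line` and the parameter `k_min` are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def scale_estimates(residuals, k_min):
    """Return s_k = r_(k) / q_k for k = k_min..N.

    r_(k) is the k-th smallest absolute residual; q_k approximates the
    expected k-th order statistic of |N(0, 1)| among N samples.
    """
    r = sorted(abs(v) for v in residuals)
    n = len(r)
    std_normal = NormalDist()
    estimates = []
    for k in range(k_min, n + 1):
        q_k = std_normal.inv_cdf(0.5 * (1.0 + k / (n + 1.0)))
        estimates.append(r[k - 1] / q_k)
    return estimates


def fit_score(points, slope, intercept, k_min):
    """Score a hypothesized line by its smallest scale estimate (assumption)."""
    residuals = [y - (slope * x + intercept) for x, y in points]
    return min(scale_estimates(residuals, k_min))


def best_line(points, k_min):
    """Try every two-point line hypothesis and keep the lowest score."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical lines are outside this simple line model
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        score = fit_score(points, slope, intercept, k_min)
        if best is None or score < best[0]:
            best = (score, slope, intercept)
    return best
```

Under these assumptions, a second surface could be extracted by removing the points that lie close to the winning fit and calling `best_line` again on the remainder.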
ISBN (print): 0769512720
We describe a monocular real-time computer vision system that identifies shopping groups by detecting and tracking multiple people as they wait in a checkout line or at a service counter. Our system segments each frame into foreground regions which contain multiple people. Foreground regions are further segmented into individuals using a temporal segmentation of foreground and motion cues. Once a person is detected, an appearance model based on color and edge density in conjunction with a mean-shift tracker is used to recover the person's trajectory. People are grouped together as a shopping group by analyzing interbody distances. The system also monitors the cashier's activities to determine when shopping transactions start and end. Experimental results demonstrate the robustness and real-time performance of the algorithm.
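Grouping by interbody distance can be illustrated with a minimal sketch. It assumes each tracked person is reduced to a single 2D position in one frame and that two people belong to the same group when they stand within a fixed distance of each other, directly or through a chain of neighbours. The function `group_by_distance` and the threshold `max_gap` are hypothetical; the abstract does not say how the system analyzes the distances.

```python
from math import hypot


def group_by_distance(positions, max_gap):
    """Single-linkage grouping: people within max_gap share a group."""
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            if hypot(xi - xj, yi - yj) <= max_gap:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

A real system would more plausibly accumulate distances along whole trajectories rather than trusting one frame; this sketch only shows the grouping step.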
ISBN (print): 9780769549903
Several papers addressed ellipse detection as a first step for several computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses and on the use of the Hough transform to estimate parameters in a decomposed space. The demo will show it working on a commercial smartphone.
ISBN (print): 0818672587
Focus of attention mechanisms for robot vision are discussed. A new method for neglecting low level filter responses from already modelled structures is presented. The method is based on a filtering technique termed normalized convolution. In one experiment, the robot is continuously moving its arm in the scene while tracking other objects. It is shown how the arm can be made 'invisible' so that only the moving object of interest is detected. This makes tracking of objects much simpler. In another experiment, the attention of the system is shifted between objects by simply cancelling the mask of the object to be attended to. With this strategy the low level processes do not need to know the difference between a new object entering the scene and a mask being cancelled, and thus a complex communication structure between high and low levels is avoided.
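How masking makes a structure "invisible" can be seen in a minimal sketch of zeroth-order normalized convolution, written here for a 1D signal. The output is the applicability-weighted average of the signal in which each sample is additionally weighted by its certainty, so samples with certainty 0 (for example, those covered by the arm's mask) do not contribute. The function names and the 1D setting are assumptions made for illustration, not the system described in the abstract.

```python
def convolve_same(signal, kernel):
    """Centered 1D convolution with zero padding; output has len(signal)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def normalized_convolution(signal, certainty, applicability, eps=1e-12):
    """Zeroth-order normalized convolution: (a * (c f)) / (a * c)."""
    weighted = [c * f for c, f in zip(certainty, signal)]
    num = convolve_same(weighted, applicability)
    den = convolve_same(certainty, applicability)
    # Where no certain sample falls under the filter, return 0 by convention.
    return [n / d if d > eps else 0.0 for n, d in zip(num, den)]
```

Setting the certainty back to 1 for a region (cancelling its mask) makes that region reappear in the output without any change to the filtering code itself.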
ISBN (print): 9781424423392
The paper presents a compact vision system for efficient contour extraction in high-speed applications. By exploiting the ultra high temporal resolution and the sparse representation of the sensor's data in reacting to scene dynamics, the system fosters efficient embedded computer vision for ultra high-speed applications. The results reported in this paper show the sensor output quality for a wide range of object velocities (5-40 m/s), and demonstrate the object data volume independence from the velocity as well as the steadiness of the object quality. The influence of object velocity on high-performance embedded computer vision is also discussed.
ISBN (print): 0818658258
This paper presents a sensing approach where phototransduction, multi-resolution feature extraction, scale-space integration and edge tracking are performed on a mixed (digital-analog) VLSI architecture in order to generate a medium-level scene description. The proposed system is mainly targeted for robot vision applications where feature description is preferred to a set of raw or raster 2D images and edge maps. The Multiport Access photo-Receptor (MAR) is a CMOS sensor and represents the main sensory part of this integrated image acquisition system. VLSI also provides means to integrate analog computing, digital controller and DSP co-processor modules which define a powerful sensory chip set for focal plane image processing. A current version of the MAR sensor which implements 256 × 256 pixels includes 16 analog spatial filters which simultaneously compute multiresolution edge maps. This unique 2D hexagonal smart sensor approach, which performs up to 8.5 × 10^9 arithmetic Op/sec during the acquisition/filtering phase and 25 × 10^9 logical Op/sec for scale-space integration, allows high resolution image capability. It represents a significant improvement for passive sensory units in a compact assembly for computer vision applications.
ISBN (print): 9781728193601
Human-object interaction (HOI) detection is a core task in computer vision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a tuple leads to a long-tailed visual recognition challenge since many combinations are rarely represented. The performance of the proposed models is limited especially for the tail categories, but little has been done to understand the reason. To that end, in this paper, we propose to diagnose rarity in HOI detection. We propose a three-step strategy, namely Detection, Identification and Recognition, in which we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that the detection and identification steps are altered by interaction signals such as occlusion and relative location, which in turn limits the recognition accuracy.
ISBN (print): 0818672587
Automatic target recognition (ATR) applications require simultaneously a wide field of view (FOV) for better detection and situation awareness, high resolution for target recognition and threat assessment, and high frame rate for detecting brief events and disambiguating frame-to-frame correlation. Uniformly sampling the entire FOV at recognition resolution is simply wasteful in ATR scenarios with localized regions of interest (ROIs). Foveal data acquisition with space-variant sampling and context-sensitive sensor articulation is highly optimized for active ATR applications. We propose a multiscale local Zernike filter-based front end target detection technique for a commercially feasible foveal sensor topology with piecewise constant resolution profile. Anisotropic heat diffusion is employed for preprocessing of the foveal data. Expansion template matching is used to derive a detection filter that optimizes the discriminant signal-to-noise ratio (SNR). Results are presented with simulated foveal imagery derived from real uniform acuity FLIR data.
ISBN (print): 9781479943098
Recent interest in developing online computer vision algorithms is spurred in part by a growth of applications capable of generating large volumes of images and videos. These applications are rich sources of images and video streams. Online vision algorithms for managing, processing and analyzing these streams need to rely upon streaming concepts, such as pipelines, to ensure timely and incremental processing of data. This paper is a first attempt at defining a formal stream algebra that provides a mathematical description of vision pipelines and describes the distributed manipulation of image and video streams. We also show how our algebra can effectively describe the vision pipelines of two state-of-the-art techniques.
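The pipeline concept can be illustrated with a small sketch in which each operator maps one stream to another and a pipeline is the composition of operators. This is not the stream algebra defined in the paper; the operators `smap`, `sfilter`, `swindow` and the combinator `compose` are hypothetical names chosen for this example, and plain Python generators stand in for image and video streams so that items are processed incrementally.

```python
from collections import deque


def smap(fn):
    """Operator that applies fn to every item of a stream."""
    return lambda stream: (fn(item) for item in stream)


def sfilter(pred):
    """Operator that keeps only the items satisfying pred."""
    return lambda stream: (item for item in stream if pred(item))


def swindow(size):
    """Operator that emits sliding windows of `size` consecutive items."""
    def op(stream):
        buf = deque(maxlen=size)
        for item in stream:
            buf.append(item)
            if len(buf) == size:
                yield tuple(buf)
    return op


def compose(*ops):
    """Chain operators left to right into a single pipeline."""
    def pipeline(stream):
        for op in ops:
            stream = op(stream)
        return stream
    return pipeline
```

For instance, `compose(smap(f), sfilter(p), swindow(2))` yields pairs of consecutive items that survive the filter, and nothing is computed until the resulting stream is consumed.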
ISBN (print): 0780342364
In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating data from novel content classes and style to a known style or content. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.
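One common way to fit a two-factor model of this kind is a truncated singular value decomposition, sketched below under stated assumptions. The sketch assumes an asymmetric bilinear model in which the K-dimensional observation for style s and content c is A_s b_c, with one complete observation per style-content pair stacked into a matrix Y of shape (S*K, C). The function name, the stacking layout and the use of NumPy are assumptions of this example, not details taken from the abstract.

```python
import numpy as np


def fit_asymmetric_bilinear(Y, J):
    """Fit Y ~ A @ B with A of shape (S*K, J) and B of shape (J, C).

    Rows s*K..(s+1)*K of A form the style-specific map A_s; column c of B
    is the J-dimensional content vector b_c.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * s[:J]  # scale the first J left singular vectors
    B = Vt[:J, :]         # first J right singular vectors
    return A, B
```

With A and B in hand, the observation for style s and content c is approximated by `A[s * K:(s + 1) * K] @ B[:, c]`; handling a novel style or content class would need an additional fitting step that this sketch does not cover.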