This paper presents the application of 2D and 3D Hough Transforms together with conformal geometric algebra to build 3D geometric maps using the geometric entities of lines and planes. Among several existing technique...
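The abstract above is truncated and gives no implementation details, so the following is only a minimal sketch of the standard 2D Hough transform for lines in the normal form rho = x*cos(theta) + y*sin(theta). The function name `hough_lines`, the bin sizes and the sample points are assumptions made for illustration. The sketch does not cover the 3D plane case or the conformal geometric algebra part of the paper.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Vote in (theta index, rho bin) space for every point.

    Hypothetical helper: theta is sampled at n_theta angles in [0, pi),
    and rho is quantized into bins of width rho_step.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return votes

# Ten points on the horizontal line y = 5 (assumed sample data).
points = [(x, 5) for x in range(0, 100, 10)]
best_bin, count = hough_lines(points).most_common(1)[0]
print(best_bin, count)  # (90, 5) 10 -> theta = 90 degrees, rho = 5
```

The accumulator peak identifies the line parameters; in a mapping pipeline, such peaks would presumably be the line entities passed on to later stages.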
Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. However, integrating...
ISBN: 9783642240850 (print)
Variations in pose, illumination and expression make face recognition a difficult problem. Several researchers have shown that faces of the same individual, despite all these variations, lie on a complex manifold in a higher dimensional space. Several methods have been proposed to exploit this fact to build better recognition systems, but none has succeeded to a satisfactory extent. We propose a new method to model this higher dimensional manifold with available data, and use a reconstruction technique to approximate unavailable data points. The proposed method is tested on the Sheffield (previously UMIST) database, the Extended Yale Face Database B and the AT&T (previously ORL) database of faces. Our method outperforms other manifold-based methods such as Nearest Manifold, as well as other methods such as PCA, LDA, Modular PCA, Generalized 2D PCA and a super-resolution method for face recognition using nonlinear mappings on coherent features.
ISBN: 9783642271717 (print)
In the central visual pathway originating from the eye, a bridge is required between two hierarchical tasks: pixel-based information recording by the visual pathway at the low level on the one hand, and object recognition at the high level on the other. Such a bridge, which may be designated as a mid-level block-grained integration, is modeled here by a multi-layer flexible cellular neural network (F-CNN). The proposed CNN architecture is validated on different intermediate-level tasks involving rigid and deformable pattern recognition. It is shown that executing such tasks with the proposed architecture can generate valid and significant inputs for the WHERE (dorsal) and WHAT (ventral) pathways in the brain. The model also proposes a feedback (likewise by CNN architecture) to the lower mid-level from the higher mid-level dorsal and ventral pathways for flexible cell (physiological receptive field) size adjustment in the primary visual cortex towards successful 'where' and 'what' identifications for high-level vision.
We present a novel classification method formulating an objective model by ℓ2,1-norm based regression. The ℓ2,1-norm based loss function is robust to outliers or the large variations within given data, and the ℓ2,1-norm ...
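To make the robustness claim concrete, here is a minimal sketch of the 2,1-norm (commonly written ℓ2,1), assuming the usual row-wise convention: the sum of the Euclidean norms of the rows of a matrix. The function names and the sample residual matrix are hypothetical, and the abstract does not specify the regression model itself, so only the norm is shown.

```python
import math

def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (row-wise convention, assumed)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

def squared_frobenius(matrix):
    """Sum of squared entries, the usual least-squares loss, for comparison."""
    return sum(v * v for row in matrix for v in row)

# Hypothetical residual matrix: one row per sample, last row is an outlier.
residuals = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(l21_norm(residuals))           # 15.0  (5 + 0 + 10)
print(squared_frobenius(residuals))  # 125.0 (25 + 0 + 100)
```

The outlier row contributes 10 of 15 to the ℓ2,1 loss but 100 of 125 to the squared loss, which is why a loss that is not squared per sample is less dominated by outliers.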
We consider the problem of geo-locating static cameras from long-term time-lapse imagery. This problem has received significant attention recently, with most methods making strong assumptions on the geometric structur...
Graphs are a very powerful and widely used tool for data representation in various fields of science and engineering. Due to their versatile representational power, graphs are widely used for dealing with structural ...
ISBN: 9780819487469 (print)
In Bayesian pattern recognition research, static classifiers have featured prominently in the literature. A static classifier is essentially based on a static model of input statistics, thereby assuming input ergodicity that is not realistic in practice. Classical Bayesian approaches attempt to circumvent the limitations of static classifiers, which can include brittleness and narrow coverage, by training extensively on a data set that is assumed to cover more than the subtense of expected input. Such assumptions are not realistic for more complex pattern classification tasks, for example, object detection using pattern classification applied to the output of computer vision filters. In contrast, we have developed a two-step process that can render the majority of static classifiers adaptive, such that the tracking of input nonergodicities is supported. Firstly, we developed operations that dynamically insert (or resp. delete) training patterns into (resp. from) the classifier's pattern database, without requiring that the classifier's internal representation of its training database be completely recomputed. Secondly, we developed and applied a pattern replacement algorithm that uses the aforementioned pattern insertion/deletion operations. This algorithm is designed to optimize the pattern database for a given set of performance measures, thereby supporting closed-loop, performance-directed optimization. This paper presents theory and algorithmic approaches for the efficient computation of adaptive linear and nonlinear pattern recognition operators that use our pattern insertion/deletion technology - in particular, tabular nearest-neighbor encoding (TNE) and lattice associative memories (LAMs). Of particular interest is the classification of nonergodic datastreams that have noise corruption with time-varying statistics. The TNE and LAM based classifiers discussed herein have been successfully applied to the computation of object classification in hyperspectral re
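The insertion/deletion idea can be illustrated with a minimal sketch, assuming a plain brute-force 1-nearest-neighbor classifier instead of the TNE or LAM structures the abstract names. The class `AdaptiveNearestNeighbor`, its methods and the sample data are all hypothetical. The point is only that patterns can be added or removed at run time without rebuilding the rest of the database.

```python
import math

class AdaptiveNearestNeighbor:
    """Brute-force 1-NN classifier with a mutable pattern database (illustrative)."""

    def __init__(self):
        self._patterns = {}  # pattern id -> (feature tuple, label)
        self._next_id = 0

    def insert(self, features, label):
        """Add one training pattern; existing patterns are left untouched."""
        pid = self._next_id
        self._patterns[pid] = (tuple(features), label)
        self._next_id += 1
        return pid

    def delete(self, pid):
        """Remove one training pattern by the id returned from insert()."""
        del self._patterns[pid]

    def classify(self, features):
        """Return the label of the closest stored pattern (Euclidean distance)."""
        if not self._patterns:
            raise ValueError("pattern database is empty")
        query = tuple(features)
        _, label = min(self._patterns.values(),
                       key=lambda p: math.dist(p[0], query))
        return label

clf = AdaptiveNearestNeighbor()
clf.insert((0.0, 0.0), "background")
old = clf.insert((10.0, 10.0), "target")
print(clf.classify((2.0, 2.0)))  # background

# Input statistics drift: replace the stale target pattern with a new one.
clf.delete(old)
clf.insert((1.0, 1.0), "target")
print(clf.classify((2.0, 2.0)))  # target
```

A pattern replacement policy in the spirit of the abstract would decide which pattern to delete and which to insert by monitoring a performance measure; that policy is not specified in the text and is left out here.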
A practical lipreading system can be considered either subject-dependent (SD) or subject-independent (SI). An SD system is user-specific, i.e., customized for some particular user, while an SI system has to cope wit...
This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as "motons," inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.