A recognition strategy that combines indexing on invariants with search allows objects to be recognised up to a Euclidean ambiguity with an uncalibrated camera. The approach uses projective invariants to determine all the possible projectively equivalent models for a particular imaged object; a system of global consistency constraints then determines which of these projectively equivalent, but Euclidean-distinct, models corresponds to the objects viewed. These constraints follow from properties of the imaging geometry. In particular, a recognition hypothesis is equivalent to an assertion about, among other things, viewing conditions and geometric relationships between objects, and these assertions must be mutually consistent for the hypotheses to be correct. The approach is demonstrated on images of real scenes consisting of polygonal objects and polyhedra.
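The abstract does not say which projective invariants are used for indexing. As a minimal illustration of the general idea, the sketch below uses the classical cross-ratio of four collinear points, which is unchanged by any projective map of the line. The function names and the example map are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio (x1, x2; x3, x4) of four distinct collinear points,
    each given by its 1D coordinate along the line."""
    return ((x3 - x1) * (x4 - x2)) / ((x3 - x2) * (x4 - x1))

def projective_map(x, a, b, c, d):
    """1D projective transformation x -> (a*x + b) / (c*x + d).
    Assumes a*d - b*c != 0 and c*x + d != 0 for the points used."""
    return (a * x + b) / (c * x + d)

# Exact rational arithmetic avoids floating-point noise in the comparison.
points = [Fraction(0), Fraction(1), Fraction(2), Fraction(4)]
imaged = [projective_map(x, 2, 1, 1, 3) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*imaged)
print(before, after)
```

Because the value survives the change of viewpoint, it can serve as an index key into a model library; the Euclidean ambiguity that remains is what the consistency constraints described above would have to resolve.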
In this paper we introduce visual phrases, complex visual composites like "a person riding a horse". Visual phrases often display significantly reduced visual complexity compared to their component objects, ...
The paper proposes a vision based online mapping of large-scale environments. Our novel approach uses a hybrid representation of a fully metric Euclidean environment map and a topological map. This novel hybrid repres...
In this paper, we describe the use of Genetic Programming (GP) techniques to learn a visual feature detector for a mobile robot navigation task. We provide experimental results across a number of different environments, each with different characteristics, and draw conclusions about the performance of the learned feature detector. We also explore the utility of seeding the initial population with a previously evolved individual, and discuss the performance of the resulting individuals.
Visual attributes expose human-defined semantics to object recognition models, but existing work largely restricts their influence to mid-level cues during classifier training. Rather than treat attributes as intermed...
We present a framework for the automatic recognition of complex multi-agent events in settings where structure is imposed by rules that agents must follow while performing activities. Given semantic spatio-temporal de...
In this paper, we propose a novel shape-theoretic framework for dynamical analysis of human movement from 3D data. The key idea is to use global descriptors of the shape of the dynamical attractor as a feature for modeling actions. We apply this approach to the novel application scenario of estimating movement quality from a single marker, for future use in home-based stroke rehabilitation. Using a dataset collected from 15 stroke survivors performing repetitive task therapy, we demonstrate that the proposed method outperforms traditional methods, such as kinematic analysis and the use of chaotic invariants, in estimating movement quality. In addition, we demonstrate that the proposed framework is sufficiently general to be applied to action and gesture recognition as well. Our experiments show improved action recognition on two publicly available 3D human activity databases.
A geometric framework for the recognition of three-dimensional objects represented by point clouds is introduced in this paper. The proposed approach is based on comparing distributions of intrinsic measurements on the point cloud. In particular, intrinsic distances are exploited as signatures for representing the point clouds. The first signature we introduce is the histogram of pairwise diffusion distances between all points on the shape surface. These distances represent the probability of traveling from one point to another in a fixed number of random steps, that is, the average intrinsic distance over all possible paths of a given number of steps between the two points. This signature is augmented by the histogram of the actual pairwise geodesic distances, as well as the distribution of the ratio between these two distances. These signatures are not only geometric but also invariant to bends. We further augment these signatures by the distribution of a curvature function and the distribution of a curvature-weighted distance. These histograms are compared using the chi-squared (χ²) distance or other common distance metrics for distributions. The presentation of the framework is accompanied by theoretical justification and state-of-the-art experimental results with the standard Princeton 3D shape benchmark and ISDB datasets, as well as a detailed analysis of the particular relevance of each of the different histogram-based signatures. Finally, we briefly discuss a more local approach where the histograms are computed for a number of overlapping patches from the object rather than the whole shape, thereby opening the door to partial shape comparisons.
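The signature-and-compare pipeline described above can be sketched roughly as follows. This is not the authors' implementation: plain Euclidean distances stand in for the diffusion and geodesic distances, so the bend invariance claimed in the abstract is not reproduced, and the function names, bin count, and distance range are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pairwise_distance_histogram(points, bins, max_dist):
    """Normalized histogram of all pairwise distances in a point set.
    Euclidean distance is a stand-in for an intrinsic distance here."""
    counts = [0] * bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Clamp so that distances at or beyond max_dist fall in the last bin.
        idx = min(int(d * bins / max_dist), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def chi2_distance(h1, h2):
    """Symmetric chi-squared distance between two normalized histograms.
    Bins that are empty in both histograms are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# A unit square, the same square translated, and a 2x1 rectangle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(x + 5, y - 2) for x, y in square]
rect = [(0, 0), (2, 0), (2, 1), (0, 1)]

h_square = pairwise_distance_histogram(square, bins=6, max_dist=3.0)
h_moved = pairwise_distance_histogram(moved, bins=6, max_dist=3.0)
h_rect = pairwise_distance_histogram(rect, bins=6, max_dist=3.0)

print(chi2_distance(h_square, h_moved), chi2_distance(h_square, h_rect))
```

The translated square yields the same histogram as the original, while the rectangle does not, which is the behaviour a distance-distribution signature is meant to provide under rigid motion.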
ISBN (digital): 9781728193601
ISBN (print): 9781728193618
Visual pattern recognition over agricultural areas is an important application of aerial image processing. In this paper, we consider the multi-modality nature of agricultural aerial images and show that naively combining different modalities without taking the feature divergence into account can lead to sub-optimal results. Thus, we apply a Switchable Normalization block to our DeepLabV3+ segmentation model to alleviate the feature divergence. Using the popular symmetric Kullback-Leibler divergence measure, we show that our model can greatly reduce the divergence between RGB and near-infrared channels. Together with a hybrid loss function, our model achieves a nearly 10% improvement in mean IoU over the previously published baseline.
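For readers unfamiliar with the divergence measure mentioned above, here is a minimal sketch of a symmetric Kullback-Leibler divergence between two discrete distributions. The abstract does not state how the feature distributions are estimated or which symmetrization is used; this sketch assumes simple normalized histograms and takes the sum of the two directed divergences (some authors halve this). The example histograms are made up.

```python
import math

def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """Directed KL divergence D(p || q) in nats.
    eps guards against division by zero when q has empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical activation histograms for two input modalities.
rgb_hist = normalize([10, 30, 40, 20])
nir_hist = normalize([40, 30, 20, 10])

print(symmetric_kl(rgb_hist, nir_hist))
```

A value near zero means the two distributions are nearly identical, so a drop in this quantity after adding a normalization block would indicate that the modalities' feature statistics have been brought closer together.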
ISBN (print): 9781467388511
A recent trend in saliency algorithm development is large-scale benchmarking and algorithm ranking with ground truth provided by datasets of human fixations. In order to accommodate the strong bias humans have toward central fixations, it is common to replace traditional ROC metrics with a shuffled ROC metric which uses randomly sampled fixations from other images in the database as the negative set. However, the shuffled ROC introduces a number of problematic elements, including a fundamental assumption that it is possible to separate visual salience and image spatial arrangement. We argue that it is more informative to directly measure the effect of spatial bias on algorithm performance rather than try to correct for it. To capture and quantify these known sources of bias, we propose a novel metric for measuring saliency algorithm performance: the spatially binned ROC (spROC). This metric provides direct insight into the spatial biases of a saliency algorithm without sacrificing the intuitive raw performance evaluation of traditional ROC measurements. By quantitatively measuring the bias in saliency algorithms, researchers will be better equipped to select and optimize the most appropriate algorithm for a given task. We use a baseline measure of inherent algorithm bias to show that Adaptive Whitening Saliency (AWS) [14], Attention by Information Maximization (AIM) [8], and Dynamic Visual Attention (DVA) [20] provide the least spatially biased results, suiting them for tasks in which there is no information about the underlying spatial bias of the stimuli, whereas algorithms such as Graph Based Visual Saliency (GBVS) [18] and Context-Aware Saliency (CAS) [15] have a significant inherent central bias.
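The shuffled ROC idea discussed above can be illustrated with a small sketch: saliency values at an image's own fixations form the positive set, and saliency values at fixation locations borrowed from other images form the negative set. This is only an illustration of that baseline metric, not of the proposed spROC, whose binning scheme the abstract does not specify. The function names and the toy data are hypothetical.

```python
def auc(pos, neg):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def shuffled_auc(saliency_map, fixations, other_fixations):
    """Shuffled AUC: negatives are fixation locations taken from other images,
    so a purely center-biased map gains little from the shared central bias."""
    pos = [saliency_map[r][c] for r, c in fixations]
    neg = [saliency_map[r][c] for r, c in other_fixations]
    return auc(pos, neg)

# Toy 3x3 saliency map that is nothing more than a central blob.
center_map = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
# Toy map that responds to an off-center target at row 0, column 2.
target_map = [
    [0.1, 0.2, 0.9],
    [0.2, 0.3, 0.2],
    [0.1, 0.2, 0.1],
]

central_fixations = [(1, 1)]         # this image's fixations, if central
target_fixations = [(0, 2)]          # this image's fixations, if on the target
borrowed_fixations = [(1, 1), (1, 1), (1, 0)]  # fixations from other images

print(shuffled_auc(center_map, central_fixations, borrowed_fixations))
print(shuffled_auc(target_map, target_fixations, borrowed_fixations))
```

Because the negatives are themselves concentrated near the center, the metric rewards a map mainly for what it predicts beyond the shared central tendency, which is the coupling between salience and spatial arrangement that the abstract calls into question.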