We present algorithms for coupling and training hidden Markov models CHMMsl to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs ...
详细信息
ISBN:
(纸本)0780342364
We present algorithms for coupling and training hidden Markov models CHMMsl to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying two-handed actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However;the Markovian framework makes strong restrictive assumptions about the system generating the signal-that it is a single process having a smalt number of states and an extremely limited stare memory The single-process model is often inappropriate for vision (and speech) applications, resulting in low ceilings on model performance. Coupled HMMs provide an efficient way to resolve many of these problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions.
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Most existing similarity learning methods exacerbate the unexplain...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Most existing similarity learning methods exacerbate the unexplainability by mapping each sample to a single point in the embedding space with a distance metric (e.g., Mahalanobis distance, Euclidean distance). Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph and then infer the overall similarity accordingly. Furthermore, we establish a bottom-up similarity construction and top-down similarity inference framework to infer the similarity based on semantic hierarchy consistency. We first identify unreliable higher-level similarity nodes and then correct them using the most coherent adjacent lower-level similarity nodes, which simultaneously preserve traces for similarity attribution. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework.(1)
The Perseus system is a purposive visual architecture that has been used to recognize the pointing gesture. recognition of this gesture is an important part of natural human-machine interfaces. Perseus is modularized ...
详细信息
ISBN:
(纸本)0818672587
The Perseus system is a purposive visual architecture that has been used to recognize the pointing gesture. recognition of this gesture is an important part of natural human-machine interfaces. Perseus is modularized into 6 types of components: feature maps, object representations, markers, visual routines, a segmentation map, and a long term visual memory. This structure not only allows Perseus to use knowledge about the task and environment at every stage of processing to more efficiently and accurately solve the pointing task, but also allows it to be extended to tasks other than recognizing pointing.
recognition ambiguity, due to noisy measurements and uncertain object models, can be quantified and actively used by an autonomous agent to efficiently gather new data and improve its information about the environment...
详细信息
ISBN:
(纸本)0818672587
recognition ambiguity, due to noisy measurements and uncertain object models, can be quantified and actively used by an autonomous agent to efficiently gather new data and improve its information about the environment. In this work an information-based utility measure is used to derive from a learned classification of shape models an efficient data collection strategy, specifically aimed at increasing classification confidence when recognizing uncertain shapes. Promising simulation results are presented and discussed.
Scene classification is a major open challenge in machine vision. Most solutions proposed so far such as those based on color histograms and local texture statistics cannot capture a scene's global configuration, ...
详细信息
ISBN:
(纸本)0780342364
Scene classification is a major open challenge in machine vision. Most solutions proposed so far such as those based on color histograms and local texture statistics cannot capture a scene's global configuration, which is critical in perceptual judgments of scene similarity. We present a novel approach, ''configural recognition'', for encoding scene class structure. The approach's main feature is its use of qualitative spatial and photometric relationships within and across regions in low resolution images. The emphasis on qualitative measures leads to enhanced generalization abilities and the use of low-resolution images renders the scheme computationally efficient. We present results on a large database of natural scenes. We also describe how qualitative scene concepts may be learned from examples.
The trifocal tensor, which describes the relation between projections of points and lines in three views, is a fundamental entity of geometric computervision. In this work, we investigate a new parametrization of the...
详细信息
ISBN:
(纸本)9781467369640
The trifocal tensor, which describes the relation between projections of points and lines in three views, is a fundamental entity of geometric computervision. In this work, we investigate a new parametrization of the trifocal tensor for calibrated cameras with non-colinear pinholes obtained from a quotient Riemannian manifold. We incorporate this formulation into state-of-the art methods for optimization on manifolds, and show, through experiments in pose averaging, that it produces a meaningful way to measure distances between trifocal tensors.
Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models. We propose the Neural P...
详细信息
ISBN:
(纸本)9781665445092
Prototype-based methods use interpretable representations to address the black-box nature of deep learning models, in contrast to post-hoc explanation methods that only approximate such models. We propose the Neural Prototype Tree (ProtoTree), an intrinsically interpretable deep learning method for fine-grained image recognition. ProtoTree combines prototype learning with decision trees, and thus results in a globally interpretable model by design. Additionally, ProtoTree can locally explain a single prediction by outlining a decision path through the tree. Each node in our binary tree contains a trainable prototypical part. The presence or absence of this learned prototype in an image determines the routing through a node. Decision making is therefore similar to human reasoning: Does the bird have a red throat? And an elongated beak? Then it's a hummingbird! We tune the accuracy-interpretability trade-off using ensemble methods, pruning and binarizing. We apply pruning without sacrificing accuracy, resulting in a small tree with only 8 learned prototypes along a path to classify a bird from 200 species. An ensemble of 5 ProtoTrees achieves competitive accuracy on the CUB-200-2011 and Stanford Cars data sets. Code is available at ***/M-Nauta/ProtoTree.
Colorization refers to the process of adding color to black & white images or videos. This paper extends the term to handle surfaces in three dimensions. This is important for applications in which the colors of a...
详细信息
ISBN:
(纸本)9780769549897
Colorization refers to the process of adding color to black & white images or videos. This paper extends the term to handle surfaces in three dimensions. This is important for applications in which the colors of an object need to be restored and no relevant image exists for texturing it. We focus on surfaces with patterns and propose a novel algorithm for adding colors to these surfaces. The user needs only to scribble a few color strokes on one instance of each pattern, and the system proceeds to automatically colorize the whole surface. For this scheme to work, we address not only the problem of colorization, but also the problem of pattern detection on surfaces.
Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels. Most existing methods use a class activation map (CAM) to generate a l...
详细信息
ISBN:
(数字)9781665469463
ISBN:
(纸本)9781665469463
Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels. Most existing methods use a class activation map (CAM) to generate a localization map;however, a CAM identifies only the most discriminative parts of a target object rather than the entire object region. In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. We demonstrate that the misalignment suppresses the activation of CAM in areas that are less discriminative but belong to the target object. To bridge the gap, we propose a method to align feature directions with a class-specific weight. The proposed method achieves a state-of-the-art localization performance on the CUB-200-2011 and ImageNet-1K benchmarks.
We propose an iterative method for estimating rigid transformations from point sets using adiabatic quantum computation. Compared to existing quantum approaches, our method relies on an adaptive scheme to solve the pr...
详细信息
ISBN:
(纸本)9781665469463
We propose an iterative method for estimating rigid transformations from point sets using adiabatic quantum computation. Compared to existing quantum approaches, our method relies on an adaptive scheme to solve the problem to high precision, and does not suffer from inconsistent rotation matrices. Experimentally, our method performs robustly on several 2D and 3D datasets even with high outlier ratio.
暂无评论