ISBN: 9780769549897 (print)
We propose a novel unsupervised method for discovering recurring patterns from a single view. A key contribution of our approach is the formulation and validation of a joint assignment optimization problem in which multiple visual words and object instances of a potential recurring pattern are considered simultaneously. The optimization is carried out by a greedy randomized adaptive search procedure (GRASP) with moves specifically designed for fast convergence. We systematically quantify the performance of our approach under stressed input conditions (missing features, geometric distortions). We demonstrate that our proposed algorithm outperforms state-of-the-art methods for recurring pattern discovery on a diverse set of 400+ real-world and synthesized test images.
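As a rough illustration of the GRASP idea named above, the following is a minimal generic sketch on a toy assignment problem. The cost matrix `COST`, the restricted-candidate-list size, and the pairwise-swap move are all assumptions made for illustration; the paper's own objective and its specially designed moves are not reproduced here.

```python
import random


def assignment_cost(cost, perm):
    """Total cost when row i is assigned to column perm[i]."""
    return sum(cost[i][perm[i]] for i in range(len(perm)))


def greedy_randomized_construction(cost, rng, rcl_size=2):
    """Build an assignment row by row, picking at random among the
    rcl_size cheapest free columns (the restricted candidate list)."""
    free = list(range(len(cost)))
    perm = []
    for i in range(len(cost)):
        ranked = sorted(free, key=lambda j: cost[i][j])
        choice = rng.choice(ranked[:rcl_size])
        perm.append(choice)
        free.remove(choice)
    return perm


def local_search(cost, perm):
    """Apply improving pairwise swaps until no swap lowers the cost."""
    perm = list(perm)
    improved = True
    while improved:
        improved = False
        for a in range(len(perm)):
            for b in range(a + 1, len(perm)):
                old = cost[a][perm[a]] + cost[b][perm[b]]
                new = cost[a][perm[b]] + cost[b][perm[a]]
                if new < old:
                    perm[a], perm[b] = perm[b], perm[a]
                    improved = True
    return perm


def grasp(cost, iterations=20, seed=0):
    """Repeat construction + local search and keep the best result."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        candidate = local_search(cost, greedy_randomized_construction(cost, rng))
        value = assignment_cost(cost, candidate)
        if value < best_cost:
            best, best_cost = candidate, value
    return best, best_cost


# Hypothetical integer costs, chosen only for this example.
COST = [[4, 1, 3, 2],
        [2, 0, 5, 3],
        [3, 2, 2, 4],
        [4, 3, 1, 2]]
best_perm, best_cost = grasp(COST)
```

GRASP gives no optimality guarantee; the returned assignment is only guaranteed to be a local optimum with respect to the swap move used here.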
ISBN: 0818672587 (print)
This paper proposes a method for detecting obstacles on a runway by controlling their expected disparities. By approximating the runway by a planar surface, the initial model flow field (MFF) corresponding to an obstacle-free runway is described by the data from onboard sensors (OBS). The error variance of the initial MFF is computed and used to estimate the MFF. Obstacles are detected by comparing the expected residual flow disparities with the residual flow field (RFF) estimated after warping (or stabilizing) an image using the MFF. Expected temporal and spatial disparities are obtained from the use of the OBS. This allows us to control the residual disparities by increasing the temporal baseline and/or by utilizing the spatial baseline if distant objects cannot be detected for a given temporal baseline. Experimental results for two real flight image sequences are presented.
ISBN: 9781665445092 (print)
We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks.
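To make the tangent-space idea concrete, here is a minimal NumPy sketch of an SO(3) update through the exponential map (Rodrigues' formula), contrasted with a naive additive update on the matrix entries. The function names `hat`, `so3_exp`, and `retract` are hypothetical helpers for this example and are not the API of the library described above.

```python
import numpy as np


def hat(omega):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = omega
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def so3_exp(omega):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    K = hat(omega)
    if theta < 1e-8:
        # First-order approximation near the identity.
        return np.eye(3) + K
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))


def retract(R, delta):
    """Update a rotation by a tangent-space step delta (right perturbation)."""
    return R @ so3_exp(delta)


R0 = so3_exp([0.0, 0.0, np.pi / 2])       # 90 degree rotation about z
step = np.array([0.1, -0.2, 0.05])         # arbitrary tangent-space step
R_manifold = retract(R0, step)             # stays a rotation matrix
R_euclidean = R0 + hat(step)               # naive additive update on entries
```

The manifold update keeps `R_manifold` orthogonal up to rounding, whereas the additive update generally leaves SO(3) and would need re-projection.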
ISBN: 9781538604571 (print)
Multi-view subspace clustering aims to partition a set of multi-source data into their underlying groups. To boost the performance of multi-view clustering, numerous subspace learning algorithms have been developed in recent years, but they rarely exploit the representation complementarity between different views or the indicator consistency among the representations, let alone consider both simultaneously. In this paper, we propose a novel multi-view subspace clustering model that attempts to harness the complementary information between different representations by introducing a novel position-aware exclusivity term. Meanwhile, a consistency term is employed so that these complementary representations share a common indicator. We formulate the above concerns into a unified optimization framework. Experiments on several benchmark datasets demonstrate the effectiveness of our algorithm over other state-of-the-art methods.
ISBN: 9781467312288 (print)
The ability to normalize pose based on super-category landmarks can significantly improve models of individual categories when training data are limited. Previous methods have considered the use of volumetric or morphable models for faces and for certain classes of articulated objects. We consider methods which impose fewer representational assumptions on categories of interest, and exploit contemporary detection schemes which consider the ensemble of responses of detectors trained for specific pose-keypoint configurations. We develop representations for poselet-based pose normalization using both explicit warping and implicit pooling as mechanisms. Our method defines a pose normalized similarity or kernel function that is suitable for nearest-neighbor or kernel-based learning methods.
ISBN: 0780342364 (print)
This paper summarizes a novel logic-based approach to grouping and perceptual organization (presented more thoroughly in [2]) and presents novel efficient methods for computing interpretations in this framework. Grouping interpretations are first defined as logical structures, built out of atomic premises ("regularities") that are derived from considerations of non-accidentalness. These interpretations can then be partially ordered by their degree of regularity or constraint (measured numerically by their codimension). The Genericity Constraint, the principle that interpretations should minimize coincidences in the observed configuration, dictates that the preferred interpretation will be the minimum in this partial order, i.e. the interpretation with maximum codimension. The preferred interpretation, called the qualitative parse, corresponds neatly to the interpretation intuitively preferred by human observers. As a side effect, the "most salient" or most structured part of the scene can be identified as the highest-codimension subtree of the qualitative parse. An efficient (O(n^2)) method for computing the maximum-codimension interpretation is presented, along with examples.
ISBN: 9780769549897 (print)
We present a quadratic unconstrained binary optimization (QUBO) framework for reasoning about multiple object detections with spatial overlaps. The method maximizes an objective function composed of unary detection confidence scores and pairwise overlap constraints to determine which overlapping detections should be suppressed, and which should be kept. The framework is flexible enough to handle the problem of detecting objects as a shape covering of a foreground mask, and to handle the problem of filtering confidence weighted detections produced by a traditional sliding window object detector. In our experiments, we show that our method outperforms two existing state-of-the-art pedestrian detectors.
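The kind of objective described above can be sketched on a toy instance: binary keep/suppress variables, unary confidence rewards, and pairwise overlap penalties. The scores and penalties below are invented for illustration, and the exhaustive search is only a stand-in that works for a handful of detections; it is not the solver used in the paper.

```python
from itertools import product


def qubo_value(x, unary, pairwise):
    """Objective: sum of kept confidences minus penalties for kept overlapping pairs."""
    value = sum(u * xi for u, xi in zip(unary, x))
    for (i, j), penalty in pairwise.items():
        value -= penalty * x[i] * x[j]
    return value


def solve_qubo_brute_force(unary, pairwise):
    """Enumerate all binary vectors and return the maximizer (tiny inputs only)."""
    candidates = product((0, 1), repeat=len(unary))
    return max(candidates, key=lambda x: qubo_value(x, unary, pairwise))


# Hypothetical detections: 0 and 1 overlap heavily, 1 and 2 overlap slightly.
UNARY = [0.9, 0.8, 0.3]
PAIRWISE = {(0, 1): 1.0, (1, 2): 0.1}

keep = solve_qubo_brute_force(UNARY, PAIRWISE)  # 1 = keep, 0 = suppress
```

In this toy case the heavily overlapped, lower-scoring detection 1 is suppressed while detections 0 and 2 are kept.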
ISBN: 9781728171685 (digital), 9781728171685 (print)
Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of self-attention. One is pairwise self-attention, which generalizes standard dot-product attention and is fundamentally a set operator. The other is patchwise self-attention, which is strictly more powerful than convolution. Our pairwise self-attention networks match or outperform their convolutional counterparts, and the patchwise models substantially outperform the convolutional baselines. We also conduct experiments that probe the robustness of learned representations and conclude that self-attention networks may have significant benefits in terms of robustness and generalization.
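For reference, the standard dot-product attention that the pairwise form is said to generalize can be sketched in a few lines of NumPy. The sketch also checks the "set operator" property: without positional information, permuting the input elements simply permutes the outputs. The weight matrices and input sizes are arbitrary assumptions, and this is not the paper's pairwise or patchwise operator.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def dot_product_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of feature vectors (rows of x)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                       # 5 elements, 4 channels
W_Q, W_K, W_V = (rng.normal(size=(4, 4)) for _ in range(3))

OUT = dot_product_self_attention(X, W_Q, W_K, W_V)

# Reordering the inputs reorders the outputs in the same way.
PERM = np.array([3, 0, 4, 1, 2])
OUT_PERMUTED = dot_product_self_attention(X[PERM], W_Q, W_K, W_V)
```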
ISBN: 9780769549897 (print)
Point sets are the standard output of many 3D scanning systems and depth cameras. Presenting the set of points as is might "hide" the prominent features of the object from which the points are sampled. Our goal is to reduce the number of points in a point set in order to improve visual comprehension from a given viewpoint. This is done by controlling the density of the reduced point set so as to create bright regions (low density) and dark regions (high density), producing an effect of shading. This data reduction is achieved by leveraging a limitation of a solution to the classical problem of determining visibility from a viewpoint. In addition, we introduce a new dual problem, determining the visibility of a point from infinity, and show how a limitation of its solution can be leveraged in a similar way.
ISBN: 9781728132938 (print)
Egocentric interaction recognition aims to recognize the camera wearer's interactions with the interactor who faces the camera wearer in egocentric videos. In such a human-human interaction analysis problem, it is crucial to explore the relations between the camera wearer and the interactor. However, most existing works directly model the interactions as a whole and do not model the relations between the two interacting persons. To exploit the strong relations for egocentric interaction recognition, we introduce a dual relation modeling framework which learns to model the relations between the camera wearer and the interactor based on the individual action representations of the two persons. Specifically, we develop a novel interactive LSTM module, the key component of our framework, to explicitly model the relations between the two interacting persons based on their individual action representations, which are collaboratively learned with an interactor attention module and a global-local motion module. Experimental results on three egocentric interaction datasets show the effectiveness of our method and its advantage over state-of-the-art methods.