ISBN: 9781467369640 (print)
In this paper we show how to learn directly from image data (i.e., without resorting to manually-designed features) a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems. To encode such a function, we opt for a CNN-based model that is trained to account for a wide variety of changes in image appearance. To that end, we explore and study multiple neural network architectures, which are specifically adapted to this task. We show that such an approach can significantly outperform the state-of-the-art on several problems and benchmark datasets.
ISBN: 9781479951178 (print)
We propose a binary range-sample feature for depth data. It is based on t tests and achieves reasonable invariance with respect to possible changes in scale, viewpoint, and background. It is also robust to occlusion and data corruption. The descriptor runs at high speed thanks to its binary property. Working together with standard learning algorithms, the proposed descriptor achieves state-of-the-art results on benchmark datasets in our experiments, with an impressively short running time.
ISBN: 0818672587 (print)
Planar pose measurement from images is an important problem for automated assembly and inspection. In addition to accuracy and robustness, ease of use is very important for real-world applications. Recently, Murase and Nayar have presented the 'parametric eigenspace' for object recognition and pose measurement based on training images. Although their system is easy to use, it has potential problems with background clutter and partial occlusions. We present an algorithm that is robust in these terms. It uses several small features on the object rather than a monolithic template. These 'eigenfeatures' are matched using a median statistic, giving the system robustness in the face of background clutter and partial occlusions. We demonstrate our algorithm's pose measurement accuracy with a controlled test, and we demonstrate its detection robustness on cluttered images with the objects of interest partially occluded.
ISBN: 0818672587 (print)
The problem of finding the closest point in high-dimensional spaces is common in computational vision. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. Our algorithm uses a projection search technique along with a novel data structure to dramatically improve performance in high dimensions. A complexity analysis is presented which can help determine ε in structured problems. Benchmarks clearly show the superiority of the proposed algorithm for high-dimensional search problems frequently encountered in machine vision, such as real-time object recognition.
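The idea of searching only within distance ε can be illustrated with a minimal Python sketch. It is a simplified reading of "projection search", not the authors' data structure: each coordinate axis is kept sorted, a binary search selects the points whose coordinate lies within ±ε of the query on that axis, the per-axis candidate sets are intersected, and only the survivors are checked against the true Euclidean distance. The class name `EpsilonSearch` and the method `nearest_within` are hypothetical names chosen for this illustration.

```python
import bisect
import math


class EpsilonSearch:
    """Illustrative projection search (an assumption, not the paper's exact structure)."""

    def __init__(self, points):
        self.points = [tuple(p) for p in points]
        self.dim = len(self.points[0]) if self.points else 0
        # For each axis: (coordinate, point index) pairs sorted by coordinate.
        self.sorted_axes = [
            sorted((p[d], i) for i, p in enumerate(self.points))
            for d in range(self.dim)
        ]
        # Coordinates alone, so bisect can search them directly.
        self.keys = [[c for c, _ in axis] for axis in self.sorted_axes]

    def nearest_within(self, query, eps):
        """Return the index of the nearest point within distance eps, or None."""
        if not self.points:
            return None
        candidates = None
        for d in range(self.dim):
            # Keep only points whose d-th coordinate is in [q_d - eps, q_d + eps].
            lo = bisect.bisect_left(self.keys[d], query[d] - eps)
            hi = bisect.bisect_right(self.keys[d], query[d] + eps)
            slab = {i for _, i in self.sorted_axes[d][lo:hi]}
            candidates = slab if candidates is None else candidates & slab
            if not candidates:
                return None  # nothing survives inside the hypercube
        # Survivors lie in a hypercube of side 2*eps; test the actual ball.
        best, best_dist = None, eps
        for i in sorted(candidates):
            dist = math.dist(query, self.points[i])
            if dist <= best_dist:
                best, best_dist = i, dist
        return best


searcher = EpsilonSearch([(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)])
hit = searcher.nearest_within((0.9, 0.9), 0.5)       # index 1
miss = searcher.nearest_within((3.0, 3.0), 0.5)      # None: no point in range
corner = EpsilonSearch([(0.9, 0.9)]).nearest_within((0.0, 0.0), 1.0)  # None
```

The last call shows why the final distance check is needed: the point sits inside the ε-hypercube but outside the ε-ball, so it is rejected.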
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
This paper describes the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2022. The 3rd ABAW Competition is a continuation of the Competitions held at the ICCV 2021, IEEE FG 2020 and IEEE CVPR 2017 conferences, and aims at automatically analyzing affect. This year the Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Multi-Task Learning. All the Challenges are based on a common benchmark database, Aff-Wild2, which is a large-scale in-the-wild database and the first one to be annotated in terms of valence-arousal, expressions and action units. In this paper, we present the four Challenges with the utilized Competition corpora, outline the evaluation metrics, and present both the baseline systems and the top-performing teams per Challenge. Finally, we illustrate the obtained results of the baseline systems and of all participating teams.
ISBN: 9781467312288 (print)
We propose a framework that performs action recognition and identity maintenance of multiple targets simultaneously. Instead of first establishing tracks using an appearance model and then performing action recognition, we construct a network flow-based model that links detected bounding boxes across video frames while inferring activities, thus integrating identity maintenance and action recognition. Inference in our model reduces to a constrained minimum cost flow problem, which we solve exactly and efficiently. By leveraging both appearance similarity and action transition likelihoods, our model improves on state-of-the-art results on action recognition for two datasets.
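To make the minimum cost flow formulation concrete, here is a minimal Python sketch that links detections between two frames. It is a plain (unconstrained) min cost flow solved by successive shortest paths, not the constrained model of the paper, and the edge costs are made-up integers standing in for a combined appearance and action-transition score. The function name `min_cost_flow` and the node numbering are assumptions for this illustration.

```python
def min_cost_flow(num_nodes, edges, source, sink, flow_needed):
    """Successive shortest paths with Bellman-Ford on the residual graph.

    edges: list of (u, v, capacity, cost).
    Returns (flow_sent, total_cost, flow_per_edge).
    """
    graph = [[] for _ in range(num_nodes)]  # node -> ids of outgoing arcs
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward arc gets an even id, its reverse arc the next odd id.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    inf = float("inf")
    sent, total = 0, 0
    while sent < flow_needed:
        dist = [inf] * num_nodes
        prev_arc = [-1] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for a in graph[u]:
                    if cap[a] > 0 and dist[u] + cost[a] < dist[to[a]]:
                        dist[to[a]] = dist[u] + cost[a]
                        prev_arc[to[a]] = a
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            break  # no augmenting path left
        # Find the bottleneck along the shortest path, then push flow.
        push, v = flow_needed - sent, sink
        while v != source:
            a = prev_arc[v]
            push = min(push, cap[a])
            v = to[a ^ 1]
        v = sink
        while v != source:
            a = prev_arc[v]
            cap[a] -= push
            cap[a ^ 1] += push
            v = to[a ^ 1]
        sent += push
        total += push * dist[sink]
    # Residual capacity of each reverse arc equals the flow on the forward arc.
    flows = [cap[2 * i + 1] for i in range(len(edges))]
    return sent, total, flows


# Nodes: 0 = source, 1-2 = detections in frame t, 3-4 = detections in frame t+1, 5 = sink.
edges = [
    (0, 1, 1, 0), (0, 2, 1, 0),    # source -> frame-t detections
    (1, 3, 1, 1), (1, 4, 1, 2),    # hypothetical link costs
    (2, 3, 1, 2), (2, 4, 1, 10),
    (3, 5, 1, 0), (4, 5, 1, 0),    # frame-(t+1) detections -> sink
]
sent, total, flows = min_cost_flow(6, edges, source=0, sink=5, flow_needed=2)
links = [(u, v) for (u, v, _, _), f in zip(edges, flows) if f and u in (1, 2)]
# links == [(1, 4), (2, 3)] with total cost 4
```

A greedy matcher would take the cheapest link (1, 3) first and then be forced into the cost-10 link; the flow solution instead reroutes through the residual graph and finds the globally cheaper pairing.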
The following shape segmentation problem is addressed: find the part decomposition of a 3D object that accounts for an observed pattern of similarities among several of the object's views. This represents the inverse, ill-posed version of the direct problem of computing perceptual similarities among object views when the object parts are known. The problem is solved by inverting a proposed model for the direct similarity-from-parts problem using regularization techniques. The algorithm takes as input the geometry of the object (given as a triangular mesh), the camera positions corresponding to the test views, and the perceptual similarities among the test views. The output of the algorithm is a segmentation of the surface of the object into connected regions, i.e., parts.
ISBN: 9781728171685 (print)
Complementary fashion item recommendation is critical for fashion outfit completion. Existing methods mainly focus on outfit compatibility prediction but not in a retrieval setting. We propose a new framework for outfit complementary item retrieval. Specifically, a category-based subspace attention network is presented, which is a scalable approach for learning the subspace attentions. In addition, we introduce an outfit ranking loss that better models the item relationships of an entire outfit. We evaluate our method on the outfit compatibility, fill-in-the-blank (FITB) and new retrieval tasks. Experimental results demonstrate that our approach outperforms state-of-the-art methods in both compatibility prediction and complementary item retrieval.
ISBN: 9781728132938 (print)
The Variational Autoencoder (VAE) is a powerful architecture capable of representation learning and generative modeling. When it comes to learning interpretable (disentangled) representations, VAE and its variants show unparalleled performance. However, the reasons for this are unclear, since a very particular alignment of the latent embedding is needed but the design of the VAE does not encourage it in any explicit way. We address this matter and offer the following explanation: the diagonal approximation in the encoder together with the inherent stochasticity force local orthogonality of the decoder. The local behavior of promoting both reconstruction and orthogonality matches closely how the PCA embedding is chosen. Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments.
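For reference, the "diagonal approximation in the encoder" mentioned above can be written out using the standard VAE objective. The notation below (encoder parameters φ, decoder parameters θ, decoder mean f_θ, Jacobian J) is the conventional one and is an assumption here, since the abstract does not fix symbols. The last line is one plausible way to state "local orthogonality of the decoder"; the exact formulation in the paper may differ.

```latex
% Standard evidence lower bound (ELBO) maximized by a VAE
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr),
\qquad p(z) = \mathcal{N}(0, I)

% Diagonal Gaussian approximate posterior (the "diagonal approximation")
q_\phi(z \mid x)
  = \mathcal{N}\bigl(\mu_\phi(x),\ \mathrm{diag}(\sigma_\phi^2(x))\bigr)

% One reading of local orthogonality: the columns of the decoder Jacobian
% at z are mutually orthogonal, i.e. J^T J is a diagonal matrix
J(z) = \frac{\partial f_\theta(z)}{\partial z},
\qquad J(z)^{\top} J(z) \ \text{is diagonal}
```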