Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing t...
详细信息
ISBN:
(纸本)9781665487399
Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification.
Neural network designers have reached progressive accuracy by increasing models depth, introducing new layer types and discovering new combinations of layers. A common element in many architectures is the distribution...
详细信息
ISBN:
(纸本)9781665448994
Neural network designers have reached progressive accuracy by increasing models depth, introducing new layer types and discovering new combinations of layers. A common element in many architectures is the distribution of the number of filters in each layer. Neural network models keep a pattern design of increasing filters in deeper layers such as those in LeNet, VGG, ResNet, MobileNet and even in automatic discovered architectures such as NASNet. It remains unknown if this pyramidal distribution of filters is the best for different tasks and constrains. In this work we present a series of modifications in the distribution of filters in three popular neural network models and their effects in accuracy and resource consumption. Results show that by applying this approach, some models improve up to 8.9% in accuracy showing reductions in parameters up to 54%.
Event-based vision, as realized by bio-inspired Dynamic vision Sensors (DVS), is gaining more and more popularity due to its advantages of high temporal resolution, wide dynamic range and power efficiency at the same ...
详细信息
ISBN:
(纸本)9781538607336
Event-based vision, as realized by bio-inspired Dynamic vision Sensors (DVS), is gaining more and more popularity due to its advantages of high temporal resolution, wide dynamic range and power efficiency at the same time. Potential applications include surveillance, robotics, and autonomous navigation under uncontrolled environment conditions. In this paper, we deal with event-based vision for 3D reconstruction of dynamic scene content by using two stationary DVS in a stereo configuration. We focus on a cooperative stereo approach and suggest an improvement over a previously published algorithm that reduces the measured mean error by over 50 percent. An available ground truth data set for stereo event data is utilized to analyze the algorithm's sensitivity to parameter variation and for comparison with competing techniques.
While most approaches to symmetry detection in machine vision try to explain the gray-values or colors of the pixels, Gestalt algebra has no room for such measurement data. The entities (i.e. Gestalten) are only defin...
详细信息
ISBN:
(纸本)9780769549903
While most approaches to symmetry detection in machine vision try to explain the gray-values or colors of the pixels, Gestalt algebra has no room for such measurement data. The entities (i.e. Gestalten) are only defined with respect to each other. They form a generic hierarchy, and live in a continuous domain without any pixel raster. There is also no constraint forcing them to completely fill an image, or prohibiting overlap. Yet, when used as a tool for symmetry recognition, the algebra must be somehow connected to the given data. In this paper this is done only on the primitive level using the well-known SIFT feature detector. From a set of such SIFT-based Gestalten follows a combinatorial set of higher-order symmetric Gestalten by constructing all possible terms using the operations of the algebra. The Gestalt domain contains a quality or assessment dimension. Taking the best Gestalten with respect to this attribute and clustering them yields the output for this competition participation.
The prediction of human eye fixations has been recently gaining a lot of attention thanks to the improvements shown by deep architectures. In our work, we go beyond classical feed-forward networks to predict saliency ...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
The prediction of human eye fixations has been recently gaining a lot of attention thanks to the improvements shown by deep architectures. In our work, we go beyond classical feed-forward networks to predict saliency maps and propose a Saliency Attentive Model which incorporates neural attention mechanisms to iteratively refine predictions. Experiments demonstrate that the proposed strategy overcomes by a considerable margin the state of the art on the largest dataset available for saliency prediction. Here, we provide experimental results on other popular saliency datasets to confirm the effectiveness and the generalization capabilities of our model, which enable us to reach the state of the art on all considered datasets.
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computervision. This paper propose...
详细信息
ISBN:
(纸本)9798350365474
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computervision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance, with up to 10-fold gain in parameter efficiency for transfer learning tasks.
Camera traps are increasingly being deployed by ecologists and citizen-scientists as a cost-effective way of obtaining large amounts of animal images in the wild. In order to analyze this data, the images are labeled ...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Camera traps are increasingly being deployed by ecologists and citizen-scientists as a cost-effective way of obtaining large amounts of animal images in the wild. In order to analyze this data, the images are labeled manually by ecologists, where they identify species of animals and more fine-grained details, such as animal sex or age, or even individual animal identities. However, with the number of camera trap images quickly outgrowing the capacity of the labelers, ecologists are unable to keep up with the wealth of data they are obtaining. Using computervision, we can automatically generate labels for new camera trap images at the rate that they are being obtained, allowing ecologists to uncover ecological and biological information at a scale previously not possible. In this paper, we explore computervision approaches for species identification in camera trap images and for individual jaguar identification, both of which show promising results. We make this novel dataset publicly available for future research directions and further exploration.
Symmetry is a pervasive phenomenon presenting itself in all forms and scales in natural and manmade environments. Its detection plays an essential role at all levels of human as well as machine perception. The recent ...
详细信息
ISBN:
(纸本)9780769549903
Symmetry is a pervasive phenomenon presenting itself in all forms and scales in natural and manmade environments. Its detection plays an essential role at all levels of human as well as machine perception. The recent resurging interest in computational symmetry for computervision and computer graphics applications has motivated us to conduct a US NSF funded symmetry detection algorithm competition as a workshop affiliated with the computervision and patternrecognition (CVPR) conference, 2013. This competition sets a more complete benchmark for computervision symmetry detection algorithms. In this report we explain the evaluation metric and the automatic execution of the evaluation workflow. We also present and analyze the algorithms submitted, and show their results on three test sets of real world images depicting reflection, rotation and translation symmetries respectively. This competition establishes a performance baseline for future work on symmetry detection.
This paper presents a study on the use of Convolutional Neural Networks for camera relocalisation and its application to map compression. We follow state of the art visual relocalisation results and evaluate the respo...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
This paper presents a study on the use of Convolutional Neural Networks for camera relocalisation and its application to map compression. We follow state of the art visual relocalisation results and evaluate the response to different data inputs. We use a CNN map representation and introduce the notion of map compression under this paradigm by using smaller CNN architectures without sacrificing relocalisation performance. We evaluate this approach in a series of publicly available datasets over a number of CNN architectures with different sizes, both in complexity and number of layers. This formulation allows us to improve relocalisation accuracy by increasing the number of training trajectories while maintaining a constant-size CNN.
The search for interpretable directions in latent spaces of pre-trained Generative Adversarial Networks (GANs) has become a topic of interest. These directions can be utilized to perform semantic manipulations on the ...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
The search for interpretable directions in latent spaces of pre-trained Generative Adversarial Networks (GANs) has become a topic of interest. These directions can be utilized to perform semantic manipulations on the GAN generated images. The discovery of such directions is performed either in a supervised way, which requires manual annotation or pre-trained classifiers, or in an unsupervised way, which requires the user to interpret what these directions represent. In this work, we propose a framework that finds a specific manipulation direction using only a single simple sketch drawn on an image. Our method finds directions consisting of channels in the style space of the StyleGAN2 architecture responsible for the desired edits and performs image manipulations comparable with state-of-the-art methods.
暂无评论