ISBN (print): 9781665448994
Multistage, or serial, fusion refers to algorithms that sequentially fuse an increasing number of matching results at each step and decide whether to accept the match hypothesis, reject it, or go on to the next step. Such fusion methods are beneficial in situations where running the additional matching algorithms needed for later stages is time-consuming or expensive. Constructing multistage fusion methods is challenging, since it requires both learning fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose using a single neural network to learn the multistage fusion. In addition, we discuss the choice of performance measures for the trained algorithms and the selection of optimization criteria for network training. We perform experiments using three face matching algorithms and the IJB-A and IJB-C databases.
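The accept / reject / continue control flow of serial fusion can be illustrated with a minimal sketch. It assumes, hypothetically, that each stage fuses the scores obtained so far by simple averaging (`mean_fusion`) and compares the result against a hand-picked pair of per-stage thresholds; the paper instead learns the fusion with a single neural network, so the fusion function, the function names and the threshold values here are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-stage thresholds: (reject_below, accept_at_or_above).
ThresholdPair = Tuple[float, float]


def mean_fusion(scores: Sequence[float]) -> float:
    """Placeholder fusion function (an assumption): average of the scores seen so far."""
    return sum(scores) / len(scores)


def multistage_decide(
    matchers: Sequence[Callable[[], float]],
    thresholds: Sequence[ThresholdPair],
    fuse: Callable[[Sequence[float]], float] = mean_fusion,
) -> Tuple[str, int]:
    """Serial fusion: run one more matcher per stage until a decision is reached.

    Returns the decision ("accept" or "reject") and the number of matchers run.
    """
    if not matchers or len(matchers) != len(thresholds):
        raise ValueError("need at least one stage and one threshold pair per stage")
    scores: List[float] = []
    last_stage = len(matchers)
    for stage, (matcher, (low, high)) in enumerate(zip(matchers, thresholds), start=1):
        scores.append(matcher())  # run the next (possibly expensive) matcher
        fused = fuse(scores)      # fuse every matching result obtained so far
        if fused >= high:
            return "accept", stage
        if fused < low or stage == last_stage:
            # Below the reject threshold, or no further stage to defer to.
            return "reject", stage
        # Otherwise the hypothesis is undecided: go on to the next stage.
    raise AssertionError("unreachable: the last stage always decides")
```

Because matchers are passed as callables, a later (expensive) matcher is only invoked when the earlier stages leave the hypothesis undecided, which is where the savings of serial fusion come from.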
ISBN (print): 9781665448994
Learned lossy image compression has demonstrated impressive progress via end-to-end neural network training. However, this end-to-end training belies the fact that lossy compression is inherently non-differentiable, because it requires quantisation. To overcome this difficulty in training, researchers have used various approximations to the quantisation step. However, little work has studied the mechanism of quantisation approximation itself. We address this issue, identifying three gaps that arise in the quantisation approximation problem. We visualise these gaps and show the effect of applying different quantisation approximation methods. Following this analysis, we propose a Soft-STE quantisation approximation method, which closes these gaps and outperforms other quantisation approaches on the Kodak dataset.
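To make the differentiability problem concrete, the sketch below shows hard rounding next to two widely used training-time approximations: additive uniform noise, and the straight-through estimator (STE), which rounds in the forward pass but treats rounding as the identity in the backward pass. It does not implement the paper's Soft-STE, whose details are not given in the abstract; the function names are hypothetical and gradients are written out by hand instead of using an autodiff library.

```python
import random
from typing import List, Sequence


def hard_quantise(y: Sequence[float]) -> List[float]:
    """Inference-time quantisation: round each latent to the nearest integer
    (Python's round() breaks .5 ties to even). Its derivative is zero almost
    everywhere, so gradients cannot flow through it directly."""
    return [float(round(v)) for v in y]


def noisy_quantise(y: Sequence[float], rng: random.Random) -> List[float]:
    """Training-time proxy 1: add independent uniform noise in [-0.5, 0.5].
    The output is differentiable in y, with derivative 1."""
    return [v + rng.uniform(-0.5, 0.5) for v in y]


def ste_forward(y: Sequence[float]) -> List[float]:
    """Training-time proxy 2 (straight-through estimator), forward pass:
    identical to hard rounding, so downstream layers see true quantised values."""
    return hard_quantise(y)


def ste_backward(grad_output: Sequence[float]) -> List[float]:
    """STE backward pass: treat d round(y)/dy as 1 and pass the upstream
    gradient through unchanged."""
    return list(grad_output)
```

The noise proxy trains on values that differ from what the decoder sees at inference, while STE uses the true rounded values but an inexact gradient; both approximate the same non-differentiable step in different ways.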
ISBN (print): 9781728193601
Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, the fact that data-driven deep neural networks (DNNs) differ in nature from conventional software poses a challenge for practical software verification. In this work, we show how existing software engineering methods benefit the development of a DNN, in particular dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged to generate test sets that provide coverage guarantees with respect to important environmental features and their interactions. Additionally, we show how our approach can be used to grow a dataset, i.e., to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
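As an illustration of the coverage idea, the following sketch measures pairwise (2-way) coverage of a dataset against a small domain model: the fraction of all value combinations, for every pair of features, that appear together in at least one sample. The returned set of missing pairs is one simple way to see where data should be collected next. The domain model (weather, time of day, pedestrian presence), the samples and the function names are invented for illustration; the paper's actual domain model and tooling are not described in the abstract.

```python
from itertools import combinations, product
from typing import Dict, List, Set, Tuple

Sample = Dict[str, str]
Pair = Tuple[Tuple[str, str], Tuple[str, str]]


def required_pairs(domain: Dict[str, List[str]]) -> Set[Pair]:
    """Every (feature=value, feature=value) combination over all feature pairs."""
    pairs: Set[Pair] = set()
    for f1, f2 in combinations(sorted(domain), 2):
        for v1, v2 in product(domain[f1], domain[f2]):
            pairs.add(((f1, v1), (f2, v2)))
    return pairs


def covered_pairs(samples: List[Sample], domain: Dict[str, List[str]]) -> Set[Pair]:
    """The value pairs that actually co-occur in at least one sample."""
    pairs: Set[Pair] = set()
    for s in samples:
        for f1, f2 in combinations(sorted(domain), 2):
            pairs.add(((f1, s[f1]), (f2, s[f2])))
    return pairs


def pairwise_coverage(
    samples: List[Sample], domain: Dict[str, List[str]]
) -> Tuple[float, Set[Pair]]:
    """Return the 2-way coverage ratio and the set of still-missing pairs."""
    required = required_pairs(domain)
    missing = required - covered_pairs(samples, domain)
    return 1 - len(missing) / len(required), missing


# Hypothetical domain model for a street-scene perception dataset.
DOMAIN = {
    "weather": ["sun", "rain"],
    "time": ["day", "night"],
    "pedestrian": ["yes", "no"],
}
```

For example, two samples `(sun, day, yes)` and `(rain, night, no)` cover 6 of the 12 required pairs, while four well-chosen samples are enough to cover all of them, which is the kind of saving combinatorial test design aims for.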
ISBN (print): 9781728125060
In this paper we present the Women in Computer Vision Workshop (WiCV 2019), organized in conjunction with CVPR 2019. The event aims to increase the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress in recent years, but the number of female researchers is still low in both academia and industry. WiCV is organized precisely for this reason: to raise the visibility of female researchers, to increase collaboration among them, and to provide mentorship to female junior researchers in the field. In this paper, we report trends over the past years, along with a summary of statistics on presenters, attendees, and sponsorship for the current workshop.
ISBN (print): 9780769549903
Face recognition techniques have been widely used in real-world applications over the past decade. Unlike other biometric traits such as fingerprints and irises, the face is what humans naturally rely on to recognise a person, even one they have met only once. In this paper, we propose a novel method that simulates the mechanism of fixations and saccades in human visual perception to handle the single-image-per-person face recognition problem. Our method is robust to local deformations of the face (i.e., expression changes and occlusions). In particular, on occlusion-related problems, which have received less attention than other challenging variations such as illumination, expression and pose, our method significantly outperforms state-of-the-art approaches across various types of occlusion. Experimental results on the FRGC and AR databases confirm the effectiveness of our method.
ISBN (digital): 9781728193601
ISBN (print): 9781728193601
In this paper, we present an end-to-end trainable framework for P-frame compression. A joint motion vector (MV) and residual prediction network, MV-Residual, is designed to extract ensembled features of motion representations and residual information, taking two successive frames as input. The prior probability of the latent representations is modeled by a hyperprior auto-encoder that is trained jointly with the MV-Residual network. In particular, spatially-displaced convolution is applied for video frame prediction: a motion kernel is learned for each pixel, and the predicted pixel is generated by applying that kernel at a displaced location in the source image. Finally, novel rate allocation and post-processing strategies are used to produce the final compressed bits under the bit constraint of the challenge. Experimental results on the validation set show that the proposed optimized framework achieves the highest MS-SSIM in the P-frame compression competition.
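The per-pixel "kernel applied at a displaced location" idea behind spatially-displaced convolution can be sketched in a few lines. This is a simplified illustration under stated assumptions: the displacements are integers, image borders are handled by clamping, and the per-pixel kernels and offsets are supplied by hand, whereas in the described framework they are produced by the network. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]
Kernel = List[List[float]]


def sample_clamped(img: Image, y: int, x: int) -> float:
    """Read img[y][x], clamping coordinates to the image border (an assumed padding rule)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def spatially_displaced_conv(
    src: Image,
    kernels: Sequence[Sequence[Kernel]],
    offsets: Sequence[Sequence[Tuple[int, int]]],
) -> Image:
    """Predict each output pixel (i, j) by applying its own k x k kernel
    (k odd) centred at the displaced source location (i + dy, j + dx)."""
    h, w = len(src), len(src[0])
    out: Image = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            kernel = kernels[i][j]
            k = len(kernel)
            r = k // 2
            dy, dx = offsets[i][j]
            cy, cx = i + dy, j + dx  # displaced kernel centre in the source frame
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += kernel[u][v] * sample_clamped(src, cy + u - r, cx + v - r)
            out[i][j] = acc
    return out
```

With a per-pixel identity kernel (a single 1 at the centre) the operation reduces to warping the source frame by the offsets; a learned kernel can additionally blend neighbouring pixels around the displaced position.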
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Cross-domain image retrieval is a challenging task that involves matching images from one domain to their counterparts in another domain. In this paper we focus on fashion image retrieval, which involves matching a user-taken image of a fashion item to images of the same item taken under controlled conditions, usually by a professional photographer. In this setting, the products seen at training and test time are different, and we use a triplet loss to train the network. We stress the importance of properly training a simple architecture, as well as of adapting general models to the specific task.
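The standard triplet loss, L = max(0, d(a, p) - d(a, n) + margin), pulls an anchor embedding towards a positive of the same item and pushes it away from a negative of a different item. The sketch below assumes Euclidean distance and a margin of 0.2; neither choice is stated in the abstract, and the embeddings would in practice come from the trained network rather than being written by hand.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, not taken from the paper
) -> float:
    """Hinge loss that becomes zero once the negative is at least `margin`
    farther from the anchor than the positive is.

    anchor:   embedding of a user-taken photo of an item
    positive: embedding of a controlled-condition photo of the same item
    negative: embedding of a controlled-condition photo of a different item
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Because the loss only constrains relative distances, it does not require the training and test products to be the same, which suits a retrieval setting where test items are unseen.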
ISBN (print): 9781424439942
Temporal segmentation of human motion into actions is central to understanding and building computational models of human motion and activity recognition. Several issues contribute to the challenge of temporal segmentation and classification of human motion. These include the large variability in the temporal scale and periodicity of human actions, the complexity of representing articulated motion, and the exponential number of possible movement combinations. We provide initial results from investigating two distinct problems: classification of the overall task being performed, and the more difficult problem of classifying individual frames over time into specific actions. We explore first-person sensing through a wearable camera and Inertial Measurement Units (IMUs) for temporally segmenting human motion into actions and performing activity classification in the context of cooking and recipe preparation in a natural environment. We present baseline results for supervised and unsupervised temporal segmentation, and for recipe recognition, on the CMU-Multimodal activity database (CMU-MMAC).
ISBN (print): 9781538607336
Human action recognition from skeletal data is a popular research topic, important in many open-domain computer vision applications thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state of the art is contended between two different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches perform strongly, yet they exhibit significant, but complementary, drawbacks. Motivated by this fact, our work aims to combine the best of the two paradigms by proposing an approach in which a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics are effectively modeled, there is no need for the classification network to be deep or recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis on six publicly available datasets.
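The abstract does not detail the covariance representation, so the following is only a hedged illustration of the simplest version of the idea: a plain sample covariance of the joint coordinates over time, flattened to its upper triangle. This yields a fixed-length vector, whatever the sequence length, that a shallow, non-recurrent network could take as input. A plain covariance discards temporal order, so the authors' representation may encode dynamics differently; the function name is hypothetical.

```python
from typing import List, Sequence


def covariance_descriptor(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map a T x D skeleton sequence to a fixed-length feature vector.

    Each row of `frames` is one time step: the flattened (x, y, z)
    coordinates of all joints, so D = 3 * number_of_joints.
    Returns the upper triangle (including the diagonal) of the D x D sample
    covariance matrix, i.e. D * (D + 1) / 2 values, independent of T.
    """
    t = len(frames)
    if t < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    d = len(frames[0])
    mean = [sum(f[k] for f in frames) / t for k in range(d)]
    centred = [[f[k] - mean[k] for k in range(d)] for f in frames]
    features: List[float] = []
    for a in range(d):
        for b in range(a, d):  # the matrix is symmetric, so keep a <= b only
            features.append(sum(row[a] * row[b] for row in centred) / (t - 1))
    return features
```

For a skeleton with 20 joints (D = 60), every sequence maps to 1830 values, so sequences of different lengths can be fed to the same fixed-size classifier.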
ISBN (digital): 9781538661000
ISBN (print): 9781538661000
Millions of people are disconnected from basic services due to a lack of adequate addressing. We propose an automatic generative algorithm to create street addresses from satellite imagery. Our addressing scheme is coherent with the street topology, linear and hierarchical to follow human perception, and universal so that it can be used as a unified geocoding system. Our algorithm starts by extracting road segments using deep learning and partitioning the road network into regions. Regions, streets, and address cells are then named using proximity computations. We also extend our addressing scheme to cover inaccessible areas, to be flexible to changes, and to serve as a pioneering step toward a unified geodatabase.