ISBN: 9781728125060 (digital)
ISBN: 9781728125060 (print)
With the emergence of fashion recommendation, many researchers have attempted to recommend fashion items that fit consumers' tastes. However, few have looked into fashion outfits as a whole when making recommendations. In this paper, we propose a neural network that learns one's fashion taste and predicts whether an individual likes a fashion outfit. To improve learning, we also develop a fashion outfit negative sampling scheme to sample fashion outfits that are different enough. With experiments on the collected Polyvore dataset, we find that using complete images of fashion outfits performs well when learning individuals' tastes toward fashion outfits. Our proposed negative sampling scheme also improves the model's performance significantly compared to random negative sampling.
ISBN: 9781728125060 (print)
Face anti-spoofing detection is a crucial procedure in biometric face recognition systems. State-of-the-art approaches, based on Convolutional Neural Networks (CNNs), present good results in this field. However, previous works focus on single-modality data with a limited number of subjects. The recently published CASIA-SURF dataset is the largest such dataset, consisting of 1000 subjects and 21000 video clips with 3 modalities (RGB, Depth and IR). In this paper, we propose a multi-stream CNN architecture called FaceBagNet to make full use of this data. The input of FaceBagNet is patch-level images, which helps extract spoof-specific discriminative information. In addition, to prevent overfitting and to better learn the fusion features, we design a Modal Feature Erasing (MFE) operation on the multi-modal features, which erases the features from one randomly selected modality during training. As a result, our approach won second place in the CVPR 2019 ChaLearn Face Anti-spoofing Attack Detection Challenge. Our final submission achieves a score of 99.8052% (TPR@FPR = 10e-4) on the test set.
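The Modal Feature Erasing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout, the function name `modal_feature_erasing`, the toy feature values, and fusion by concatenation are all assumptions made for illustration. The abstract only states that the features of one randomly selected modality are erased during training.

```python
import random


def modal_feature_erasing(features, rng, training=True):
    """Zero out the features of one randomly chosen modality (training only).

    features: dict mapping a modality name to a list of floats
              (hypothetical layout, chosen for illustration).
    Returns a new dict and the name of the erased modality
    (None when training is False, in which case nothing is erased).
    """
    if not training:
        return {name: list(vec) for name, vec in features.items()}, None
    # Pick exactly one modality to erase for this training sample.
    erased = rng.choice(sorted(features))
    out = {
        name: [0.0] * len(vec) if name == erased else list(vec)
        for name, vec in features.items()
    }
    return out, erased


rng = random.Random(0)
feats = {"rgb": [0.2, 0.7], "depth": [0.5, 0.1], "ir": [0.9, 0.4]}
erased_feats, erased = modal_feature_erasing(feats, rng)

# Assumed fusion: concatenate the per-modality features into one vector.
fused = erased_feats["rgb"] + erased_feats["depth"] + erased_feats["ir"]
```

Because a different modality may be dropped for each sample, the fused representation cannot rely on any single stream, which is one plausible reading of why the operation helps against overfitting.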
ISBN: 9781728125060 (print)
Deep CNN models have become state-of-the-art techniques in many applications, e.g., face recognition, speaker recognition, and image classification. Although many studies address the speedup or compression of individual models, very few focus on co-compressing and unifying models from different modalities. In this work, to jointly compress face and speaker recognition models, a shared-codebook approach is adopted to reduce the redundancy of the combined model. Although the input modalities of these two CNN models are quite different, the shared codebook can support both the sound model for speaker recognition and the image model for face recognition. Experiments show promising results for unifying and co-compressing heterogeneous models for efficient inference.
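As a rough illustration of the shared-codebook idea, the sketch below quantizes the weights of two models against a single list of centroids, so each weight is stored as an index into the same table. The codebook values, the toy weights, and the helper names `nearest_index`, `quantize`, and `dequantize` are hypothetical. The abstract does not say how the codebook is learned or applied, so this shows only the general nearest-centroid technique, not the paper's method.

```python
def nearest_index(value, codebook):
    """Return the index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(value - codebook[i]))


def quantize(weights, codebook):
    """Replace each weight by the index of its nearest codebook entry."""
    return [nearest_index(w, codebook) for w in weights]


def dequantize(indices, codebook):
    """Recover approximate weights by looking the indices up in the codebook."""
    return [codebook[i] for i in indices]


# One codebook shared by both models (assumed values).
shared_codebook = [-0.5, 0.0, 0.5]

# Toy weights standing in for a face model and a speaker model.
face_weights = [0.42, -0.61, 0.03]
speaker_weights = [-0.08, 0.55, -0.47]

face_idx = quantize(face_weights, shared_codebook)
speaker_idx = quantize(speaker_weights, shared_codebook)
face_approx = dequantize(face_idx, shared_codebook)
```

Storing small integer indices plus one shared table, rather than two separate tables, is where the redundancy reduction would come from in a scheme of this kind.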
ISBN: 9781728125060 (print)
Computer vision techniques that operate on hyper- and multispectral imagery benefit from the additional spectral information relative to those that exploit traditional RGB or monochromatic visual data. However, the increased volume of data to be processed brings about additional memory, storage and computational requirements. To address such limitations, a wide range of dimensionality reduction techniques have been introduced in previous work. In this paper, we propose a framework for spectral band selection that is highly data- and computationally efficient. The method leverages a convolutional siamese network learned by optimizing a contrastive loss, and performs band selection based on the low-dimensional data embeddings produced by the network. We empirically demonstrate the efficacy of the method on an object detection task from aerial multispectral imagery. The results show that, in spite of the method's frugality, it produces band selection results that are very competitive with the evaluated competing techniques.
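For readers unfamiliar with the contrastive loss mentioned above, here is a minimal sketch of the common margin-based formulation for a pair of embeddings. The abstract does not give the exact form used in the paper, so the margin value, the Euclidean distance, and the function names are assumptions. The band selection step itself is not shown, since the abstract does not describe it.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Similar pairs (same=True) are penalized by their squared distance.
    Dissimilar pairs are penalized only if they lie closer than the margin.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A similar pair that is far apart incurs a large loss.
loss_similar = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True)

# A dissimilar pair already beyond the margin incurs no loss.
loss_dissimilar = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False)
```

Minimizing this loss pulls embeddings of similar inputs together and pushes dissimilar ones at least `margin` apart, which is what makes distances in the embedding space usable for downstream decisions.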
ISBN: 9781728125060 (digital)
ISBN: 9781728125060 (print)
We present a new, publicly available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset [1]) and flying distractors. Object and camera pose, scene lighting, and quantity of objects and distractors were randomized. Each provided view includes RGB, depth, segmentation, and surface normal images, all at the pixel level. We describe our approach for domain randomization and provide insight into the decisions that produced the dataset.
ISBN: 9781728125060 (print)
With one in four individuals afflicted with malnutrition, computer vision may provide a way of introducing a new level of automation in the nutrition field to reliably monitor food and nutrient intake. In this study, we present a novel approach to modeling the link between color and vitamin A content using transmittance imaging of a pureed-food dilution series in a computer-vision-powered nutrient sensing system. The approach uses a fine-tuned deep autoencoder network, which in this case was trained to predict the relative concentration of sweet potato purees. Experimental results show the deep autoencoder network can achieve an accuracy of 80% across beginner (6-month) and intermediate (8-month) commercially prepared pureed sweet potato samples. Prediction errors may be explained by fundamental differences in optical properties, which are further discussed.
ISBN: 9781728125060 (print)
When privacy is a concern, group membership verification checks whether a biometric trait corresponds to one member of a group without revealing the identity of that member. Similarly, group membership identification states which group the individual belongs to, without knowing his/her identity. A recent contribution provides privacy and security for group membership protocols through the joint use of two mechanisms: quantizing biometric templates into discrete embeddings, and aggregating several templates into one group representation. This paper significantly improves that contribution because it jointly learns how to embed and aggregate instead of imposing fixed, hard-coded rules. This is demonstrated by exposing the mathematical underpinnings of the learning stage before showing the improvements through an extensive series of experiments targeting face recognition. Overall, experiments show that learning yields an excellent trade-off between security/privacy and verification/identification performance.
ISBN: 9781728125060 (print)
Convolutional Neural Networks (CNNs) have proven very successful in extracting discriminative features from video data. These deep features can be summarized using spatial covariance descriptors for further analysis. However, due to the large number of potential features, the covariance descriptors are often very high dimensional. To facilitate large-scale data analysis, we propose a novel, metric-based dimension-reduction technique that reduces large covariances to small ones. Then, we represent videos as time series trajectories on the space of covariance matrices, or symmetric positive-definite matrices (SPDMs), and use a Riemannian metric on this space to quantify differences across these trajectories. These distance features can then be used for classification of video sequences. We illustrate this comprehensive framework using data from the UCF11 dataset for action recognition, with classification rates that match or outperform state-of-the-art techniques.
ISBN: 9781728132938 (print)
Hand-drawn sketch recognition is a fundamental problem in computer vision, widely used in sketch-based image and video retrieval, editing, and reorganization. Previous methods often assume that a complete sketch is used as input; however, hand-drawn sketches in common application scenarios are often incomplete, which makes sketch recognition a challenging problem. In this paper, we propose SketchGAN, a new generative adversarial network (GAN) based approach that jointly completes and recognizes a sketch, boosting the performance of both tasks. Specifically, we use a cascade Encoder-Decoder network to complete the input sketch in an iterative manner, and employ an auxiliary sketch recognition task to recognize the completed sketch. Experiments on the Sketchy database benchmark demonstrate that our joint learning approach achieves competitive sketch completion and recognition performance compared with state-of-the-art methods. Further experiments using several sketch-based applications also validate the performance of our method.
ISBN: 9781728125060 (digital)
ISBN: 9781728125060 (print)
Evolutionary deep intelligence has recently shown great promise for producing small, powerful deep neural network models via the synthesis of increasingly efficient architectures over successive generations. Despite recent research showing the efficacy of multi-parent evolutionary synthesis, little has been done to directly assess architectural similarity between networks during the synthesis process for improved parent network selection. In this work, we present a preliminary study into quantifying architectural similarity via the percentage overlap of architectural clusters. Results show that networks synthesized using architectural alignment (via gene tagging) maintain higher architectural similarities within each generation, potentially restricting the search space of highly efficient network architectures.