Person re-identification aims to associate images of the same person over multiple non-overlapping camera views at different times. Depending on the human operator, manual re-identification in large camera networks is...
ISBN (Print): 9781450366151
Cosmetic makeup is an art in itself, which people often use to enhance their beauty and express themselves. Putting on makeup is a tedious task, so who would not love to see themselves in makeup before physically applying it? In this work, we demonstrate the transfer of makeup from a reference makeup image to a subject image. Our technique involves the generation and use of 3D face models of these images. Some nice features of our pipeline include: (1) the process adapts to the lighting conditions of the subject image; (2) after makeup is applied to the subject image, it is possible to view it under different indoor and outdoor lighting conditions; (3) accessories can be added realistically with proper lighting so that they naturally fit the subject's image. We show that these features also make our makeup transfer look realistic and improved compared to many existing methods. The main advantage of our novel pipeline is that it requires only the subject and reference images as input, unlike other techniques using 3D models, which require extensive data collection.
ISBN (Print): 9781450366151
In this work, we address the problem of dynamic gesture recognition using a pose-based video descriptor. The proposed approach takes video frames as input and extracts pose-specific image regions, which are further processed by a pre-trained Convolutional Neural Network (CNN) to derive a pose-based descriptor for each frame. A Long Short-Term Memory (LSTM) network is trained from scratch for dynamic gesture classification by learning long-term spatio-temporal relations among features. We also demonstrate that an effective model for recognizing dynamic gestures can be designed using only video data (RGB frames and optical flow). We evaluate the proposed algorithm on the ChaLearn multi-modal gesture challenge dataset [13] and the Cambridge hand gesture dataset [18], achieving accuracies of 91.27% and 96%, respectively, using only RGB data.
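As a rough illustration of this kind of pipeline, the sketch below passes per-frame pose-region crops through a frozen, pre-trained 2D CNN and classifies the resulting descriptor sequence with an LSTM. It is a minimal approximation, not the authors' code: the backbone (a torchvision ResNet-18 stand-in for the unspecified pre-trained CNN), hidden size, and clip length are all assumptions.

```python
# Minimal sketch: frozen pre-trained CNN per frame -> LSTM over the sequence.
import torch
import torch.nn as nn
import torchvision.models as models

class PoseDescriptorLSTM(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        for p in self.cnn.parameters():        # frozen, acts as a fixed descriptor
            p.requires_grad = False
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, 224, 224)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)  # (B*T, 512) per-frame descriptor
        out, _ = self.lstm(feats.view(b, t, -1))          # long-term temporal relations
        return self.fc(out[:, -1])                        # classify from the last step

logits = PoseDescriptorLSTM(num_classes=20)(torch.randn(2, 16, 3, 224, 224))
```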
ISBN (Digital): 9781510628298
ISBN (Print): 9781510628298
Object detection is one of the most popular and difficult fields in computer vision. Although deep learning methods achieve strong performance on object detection, algorithms that use hand-crafted features are still widely used in specific applications. One main problem in object detection is scale. Algorithms usually use an image pyramid to cover as many scales as possible, but gaps still exist between the scale levels of the pyramid. Our work adds sub-scale levels to fill these gaps. To this end, we use a Gaussian scale pyramid to generate sub-scale images and extract HOG features at each sub-scale. We build on the framework offered by the DPM algorithm with modifications, and compare the results of our method with the DPM baseline on the Pascal VOC dataset. Our method performs well on some categories and improves the overall performance. It can also be used in other object detection frameworks: we apply the multi-scale HOG feature to the pre-processing stage of our own detection framework and test it on our own dataset, where it improves the precision and recall of the pre-processing stage compared to the original HOG feature architecture.
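A minimal sketch of the sub-scale idea, under my own assumptions (the octave count, sub-scales per octave, smoothing sigma, and HOG parameters are illustrative, and scikit-image stands in for the paper's feature pipeline):

```python
# Sketch: fill the gaps between pyramid octaves with Gaussian-smoothed
# sub-scales and extract a HOG descriptor at every level.
import numpy as np
from skimage.transform import rescale
from skimage.filters import gaussian
from skimage.feature import hog

def sub_scale_hog_pyramid(image, octaves=3, subs_per_octave=2):
    """Return [(scale, hog_vector), ...] including intermediate sub-scales."""
    features = []
    for o in range(octaves):
        base_scale = 0.5 ** o                                  # standard octave level
        for s in range(subs_per_octave):
            scale = base_scale * 2 ** (-s / subs_per_octave)   # intermediate sub-scale
            img = rescale(image, scale, anti_aliasing=True)
            img = gaussian(img, sigma=1.0)                     # Gaussian pyramid smoothing
            feat = hog(img, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))
            features.append((scale, feat))
    return features

levels = sub_scale_hog_pyramid(np.random.rand(128, 128))
```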
ISBN (Print): 9781450366151
Egocentric activity recognition (EAR) is an emerging area in the field of computer vision research. Motivated by the current success of Convolutional Neural Networks (CNNs), we propose a multi-stream CNN for multimodal egocentric activity recognition using visual data (RGB videos) and sensor streams (accelerometer, gyroscope, etc.). In order to effectively capture the spatio-temporal information contained in RGB videos, two types of modalities are extracted from the visual data: Approximate Dynamic Image (ADI) and Stacked Difference Image (SDI). These image-based representations are generated both at the clip level and at the entire-video level, and are then used to fine-tune a pretrained 2D CNN, MobileNet, which is specifically designed for mobile vision applications. Similarly, for sensor data, each training sample is divided into three segments, and a deep 1D-CNN (one per sensor stream) is trained from scratch. During testing, the softmax scores of all the streams (visual + sensor) are combined by late fusion. Experiments performed on a multimodal egocentric activity dataset demonstrate that our proposed approach achieves state-of-the-art results, outperforming the current best handcrafted and deep-learning-based techniques.
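The late-fusion step described above can be illustrated with a short sketch; the stream set and the equal weighting are assumptions, not necessarily the paper's exact scheme.

```python
# Sketch: per-stream softmax scores (e.g. ADI, SDI, accelerometer, gyroscope)
# are averaged before the final class prediction.
import torch
import torch.nn.functional as F

def late_fusion(stream_logits, weights=None):
    """stream_logits: list of (B, num_classes) tensors, one per modality."""
    probs = [F.softmax(z, dim=1) for z in stream_logits]
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)   # equal weighting assumed
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=1)

pred = late_fusion([torch.randn(4, 10) for _ in range(4)])  # four illustrative streams
```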
ISBN (Print): 9789811393600
The proceedings contain 14 papers. The special focus in this conference is on Document Analysis and Recognition. The topics include: Word-Wise Handwriting Based Gender Identification Using Multi-Gabor Response Fusion; A Secure and Light Weight User Authentication System Based on Online Signature Verification for Resource Constrained Mobile Networks; Benchmark Datasets for Offline Handwritten Gurmukhi Script Recognition; Benchmark Dataset: Offline Handwritten Gurmukhi City Names for Postal Automation; Attributed Paths for Layout-Based Document Retrieval; Textual Content Retrieval from Filled-in Form Images; A Study on the Effect of CNN-Based Transfer Learning on Handwritten Indic and Mixed Numeral Recognition; Symbol Spotting in Offline Handwritten Mathematical Expressions; Online Handwritten Bangla Character Recognition Using Frechet Distance and Distance Based Features; An Efficient Multi Lingual Optical Character Recognition System for Indian Languages Through Use of Bharati Script; Telugu Word Segmentation Using Fringe Maps; An Efficient Character Segmentation Algorithm for Connected Handwritten Documents.
ISBN (Print): 9781450366151
We present a reduced model based on position based dynamics for real-time simulation of human musculature. We demonstrate our methods on the muscles of the human arm. Co-simulation of all the muscles of the human arm allows us to accurately track the development of stresses and strains in the muscles when the arm is moved. We evaluate our method for accuracy by comparing it with gold-standard simulation models based on finite volume methods, and demonstrate the stability of the method under flexion, extension and torsion.
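For readers unfamiliar with position based dynamics, the sketch below shows its basic building block (a distance-constraint projection inside a Gauss-Seidel solve). It is a generic PBD step, not the authors' reduced muscle model, and all parameters are illustrative.

```python
# Sketch of a generic position-based-dynamics time step with distance constraints.
import numpy as np

def project_distance_constraint(p1, p2, w1, w2, rest_len, stiffness=1.0):
    """Move two particles so their separation approaches rest_len (PBD projection)."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    if dist < 1e-9 or (w1 + w2) == 0.0:
        return p1, p2
    corr = stiffness * (dist - rest_len) * d / (dist * (w1 + w2))
    return p1 + w1 * corr, p2 - w2 * corr

def pbd_step(pos, vel, edges, rest, inv_mass, dt=1 / 60, iters=10):
    gravity = np.array([0.0, -9.8, 0.0])
    pred = pos + dt * (vel + dt * gravity)            # explicit position prediction
    for _ in range(iters):                            # Gauss-Seidel constraint solve
        for (i, j), r in zip(edges, rest):
            pred[i], pred[j] = project_distance_constraint(
                pred[i], pred[j], inv_mass[i], inv_mass[j], r)
    vel = (pred - pos) / dt                           # velocity update from positions
    return pred, vel
```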
ISBN (Print): 9781450366151
Automatically describing the contents of an image has been a fundamental problem in the fields of artificial intelligence and computer vision. Existing approaches are either top-down, starting from a simple representation of an image and converting it into a textual description; bottom-up, coming up with attributes describing numerous aspects of an image to form the caption; or a combination of both. Recurrent neural networks (RNNs) enhanced by Long Short-Term Memory (LSTM) units have become a dominant component of several frameworks designed for solving the image captioning task. Despite their ability to reduce the vanishing gradient problem and capture dependencies, they are inherently sequential across time. In this work, we propose two novel approaches, a top-down and a bottom-up approach independently, which dispense with recurrence entirely by incorporating a Transformer, a network architecture for generating sequences that relies entirely on the mechanism of attention. Adaptive positional encodings for the spatial locations in an image and a new regularization cost during training are introduced. The ability of our model to automatically focus on salient regions in the image is demonstrated visually. Experimental evaluation of the proposed architecture on the MS-COCO dataset exhibits the superiority of our method.
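A minimal sketch of a recurrence-free caption decoder in this spirit is given below; the learned spatial positional embedding, layer counts, and dimensions are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: spatial image features as Transformer memory, captions decoded with
# attention only (no RNN/LSTM).
import torch
import torch.nn as nn

class TransformerCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_regions=49):
        super().__init__()
        # learned positional embedding over spatial locations (assumed stand-in
        # for the paper's adaptive positional encodings)
        self.region_pos = nn.Parameter(torch.zeros(1, n_regions, d_model))
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats, captions):
        # region_feats: (B, n_regions, d_model), captions: (B, T) token ids
        memory = region_feats + self.region_pos
        tgt = self.word_emb(captions)
        mask = nn.Transformer.generate_square_subsequent_mask(captions.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))

logits = TransformerCaptioner(vocab_size=10000)(
    torch.randn(2, 49, 512), torch.randint(0, 10000, (2, 12)))
```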
ISBN (Print): 9781450366151
Face Recognition (FR) under adversarial conditions has been a big challenge for researchers in the computer vision and machine learning communities in the recent past. Most state-of-the-art face recognition systems have been designed to overcome degradations in a face due to variations in pose, illumination, contrast, and resolution, along with blur. However, interestingly, none have addressed the fascinating issue of makeup as a spoof attack, which drastically changes the appearance of a face, making it difficult even for humans to detect and identify the impostor. In this paper, we propose a novel multi-component deep convolutional neural network (CNN) based architecture which performs the complex task of makeup removal from a disguised face, to reveal the original mugshot image of the impostor (i.e., without makeup). The proposed network also performs the hard task of FR on a disguised face, in addition to recognition of identity and generation of the face of the spoofed target, by minimizing a novel multi-component objective function. Comparison with a few recent state-of-the-art FR methods over three benchmark datasets reveals the superiority of our proposed method for both synthesis and recognition (FR) tasks.
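A hedged sketch of what a multi-component objective of this kind might look like is shown below; the specific terms (a reconstruction loss for the de-makeup'd face plus two identity-classification losses) and their weights are assumptions for illustration, since the paper's exact formulation is not reproduced here.

```python
# Sketch: one possible multi-component objective combining makeup-removal
# reconstruction with impostor and spoofed-target identity recognition.
import torch
import torch.nn.functional as F

def multi_component_loss(pred_face, gt_face,
                         impostor_logits, impostor_id,
                         target_logits, target_id,
                         w_rec=1.0, w_imp=1.0, w_tgt=1.0):
    rec = F.l1_loss(pred_face, gt_face)                  # makeup-removal reconstruction
    imp = F.cross_entropy(impostor_logits, impostor_id)  # recognize the impostor
    tgt = F.cross_entropy(target_logits, target_id)      # recognize the spoofed target
    return w_rec * rec + w_imp * imp + w_tgt * tgt
```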
ISBN (Print): 9781450366151
Person re-identification (ReID) is an important problem in computer vision, especially for video surveillance applications. The problem focuses on identifying people across different cameras or across different frames of the same camera. The main challenge lies in recognizing the same person under large appearance and structure variations while differentiating between individuals. Recently, deep learning networks with triplet loss have become a common framework for person ReID. However, the triplet loss focuses on obtaining correct orderings on the training set, and we demonstrate that it performs poorly in a clustering task. In this paper, we design a cluster loss, which leads to model outputs with larger inter-class variation and smaller intra-class variation compared to the triplet loss. As a result, our model has better generalization ability and can achieve higher accuracy on the test set, especially for clustering tasks. We also introduce a batch-hard training mechanism that improves results and speeds up training convergence.
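A minimal sketch of a cluster-style loss in this spirit is shown below; the centroid-based formulation and the margin value are assumptions for illustration, not necessarily the paper's exact definition.

```python
# Sketch: pull embeddings toward their class centroid (small intra-class
# variation) and push centroids of different identities apart (large
# inter-class variation).
import torch
import torch.nn.functional as F

def cluster_loss(embeddings, labels, margin=1.0):
    """embeddings: (N, D) L2-normalised features, labels: (N,) identity ids."""
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(0) for c in classes])
    # intra-class term: distance of each sample to its own centroid
    intra = torch.stack([
        (embeddings[labels == c] - centroids[i]).norm(dim=1).mean()
        for i, c in enumerate(classes)]).mean()
    # inter-class term: hinge on pairwise centroid distances
    dists = torch.cdist(centroids, centroids)
    off_diag = dists[~torch.eye(len(classes), dtype=torch.bool)]
    inter = F.relu(margin - off_diag).mean() if off_diag.numel() else dists.sum() * 0
    return intra + inter

loss = cluster_loss(F.normalize(torch.randn(32, 128), dim=1),
                    torch.randint(0, 4, (32,)))
```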