Online garment shopping has attracted many customers in recent years. Describing a dress using keywords does not always yield the desired results, which in turn leads to customer dissatisfaction. A visual search based...
ISBN: (Print) 9781450366151
In this paper, a robust image hashing framework is presented using discrete cosine transformation and singular value decomposition. Firstly, the input image is normalized using geometric moments and the normalized coefficients are divided into non-overlapping blocks. The blocks selected by a piece-wise non-linear chaotic map are transformed using the discrete cosine transform followed by singular value decomposition. A feature matrix is then constructed based on the Hessian matrix, and the final hash values are obtained. The proposed hashing system is resilient to different content-preserving image distortions such as geometric and filtering operations. The simulation results demonstrate the efficiency of the proposed framework in terms of security and robustness.
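Below is a minimal sketch of the block-wise DCT + SVD hashing pipeline described above. The block size, the logistic-map stand-in for the piece-wise non-linear chaotic map, and the median-based binarization are assumptions; the paper's Hessian-based feature matrix is not reproduced.

```python
import numpy as np
from scipy.fft import dctn

def block_hash(img, block=16, n_blocks=32, key=0.37):
    # img: 2-D grayscale array; block size, key and bit count are assumptions
    img = img[: img.shape[0] - img.shape[0] % block,
              : img.shape[1] - img.shape[1] % block].astype(np.float64)
    blocks = [img[i:i + block, j:j + block]
              for i in range(0, img.shape[0], block)
              for j in range(0, img.shape[1], block)]
    # Key-dependent block selection (logistic map as a stand-in for the
    # piece-wise non-linear chaotic map used in the paper).
    x, idx = key, []
    while len(idx) < n_blocks:
        x = 3.99 * x * (1.0 - x)
        k = int(x * len(blocks))
        if k not in idx:
            idx.append(k)
    feats = []
    for k in idx:
        coeffs = dctn(blocks[k], norm="ortho")           # 2-D DCT of the block
        s = np.linalg.svd(coeffs, compute_uv=False)       # singular values only
        feats.append(s[0])                                # keep the leading one
    feats = np.asarray(feats)
    return (feats > np.median(feats)).astype(np.uint8)    # binary hash bits
```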
ISBN: (Print) 9781450366151
We present a reduced model based on position based dynamics for real-time simulation of human musculature. We demonstrate our methods on the muscles of the human arm. Co-simulation of all the muscles of the human arm allows us to accurately track the development of stresses and strains in the muscles as the arm is moved. We evaluate our method for accuracy by comparing it with gold-standard simulation models based on finite volume methods, and demonstrate the stability of the method under flexion, extension and torsion.
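The method builds on position based dynamics; the sketch below shows the generic PBD prediction and constraint-projection loop for distance constraints, not the paper's reduced muscle model. Masses, stiffness and iteration counts are assumptions.

```python
import numpy as np

def pbd_step(x, v, inv_mass, edges, rest_len, dt=1/60, iters=10,
             stiffness=0.9, gravity=(0.0, -9.81, 0.0)):
    # x, v: (N, 3) positions and velocities; inv_mass: (N,) inverse masses
    g = np.asarray(gravity)
    v = v + dt * g * (inv_mass[:, None] > 0)          # external forces on free particles
    p = x + dt * v                                    # predicted positions
    for _ in range(iters):                            # Gauss-Seidel constraint projection
        for (i, j), l0 in zip(edges, rest_len):
            d = p[i] - p[j]
            dist = np.linalg.norm(d)
            w = inv_mass[i] + inv_mass[j]
            if dist < 1e-9 or w == 0:
                continue
            corr = stiffness * (dist - l0) / (dist * w) * d
            p[i] -= inv_mass[i] * corr                # move both endpoints toward rest length
            p[j] += inv_mass[j] * corr
    v = (p - x) / dt                                  # velocity update from corrected positions
    return p, v
```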
ISBN: (Print) 9781450366151
Cosmetic makeup is an art in itself, which people often use to enhance their beauty and express themselves. Putting on makeup is a tedious task, so who would not love to see themselves in makeup before physically applying it? In this work, we demonstrate the transfer of makeup from a reference makeup image to a subject image. Our technique involves the generation and use of 3D face models of these images. Some notable features of our pipeline are: (1) the process adapts to the lighting conditions of the subject image; (2) after applying makeup to the subject image, it is possible to view it under different indoor and outdoor lighting conditions; (3) accessories can be added realistically with proper lighting so that they naturally fit the subject's face. We show that these features also make our makeup look more realistic than many existing approaches. The main advantage of our novel pipeline is that it requires only the subject and reference images as input. This is unlike other techniques using 3D models, which require extensive data collection.
ISBN: (Print) 9781450366151
In this work, we address the problem of dynamic gesture recognition using a pose-based video descriptor. The proposed approach takes video frames as input and extracts pose-specific image regions, which are further processed by a pre-trained Convolutional Neural Network (CNN) to derive a pose-based descriptor for each frame. A Long Short-Term Memory (LSTM) network is trained from scratch for dynamic gesture classification by learning long-term spatiotemporal relations among the features. We also demonstrate that using only video data (RGB frames and optical flow) one can design an effective model for recognizing dynamic gestures. We evaluate the proposed algorithm on the ChaLearn multi-modal gesture challenge dataset [13] and the Cambridge hand gesture dataset [18], achieving accuracies of 91.27% and 96% respectively using only RGB data.
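A hedged sketch of the overall structure described above: per-frame CNN descriptors fed to an LSTM classifier. The ResNet-18 backbone, feature dimension and hidden size are assumptions, and the paper's pose-specific region extraction is omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

class GestureLSTM(nn.Module):
    def __init__(self, n_classes, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)     # swap in pretrained weights as needed
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, clips):                        # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (B*T, 512) per-frame descriptors
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)                           # long-term temporal modelling
        return self.fc(out[:, -1])                          # classify from the last time step

# scores = GestureLSTM(n_classes=20)(torch.randn(2, 16, 3, 224, 224))
```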
ISBN: (Print) 9781450366151
In this work, we propose a computationally efficient compressive sensing based approach for very low bit rate lossy coding of hyperspectral (HS) image data by exploiting the redundancy inherent in this imaging modality. We divide the HS datacube into subsets of adjacent bands, each of which is encoded into a coded snapshot using a random code matrix. These coded snapshot images are then compressed using the wavelet-based SPIHT technique. The decompression from the coded snapshots at the receiver is performed using orthogonal matching pursuit with the help of an overcomplete dictionary learned on a general-purpose training dataset. We provide ample experimental results and performance comparisons to substantiate the usefulness of the proposed method. In the proposed technique the encoder does not require any decoder, offering significant computational savings while still yielding much higher compression quality.
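A minimal sketch of the coded-snapshot encoding and patch-wise orthogonal matching pursuit decoding steps, under an assumed patch size, code statistics, and a hypothetical learned dictionary D; the SPIHT compression of the snapshots is omitted.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def encode_snapshot(cube, code):
    # cube: (H, W, K) subset of adjacent bands, code: (H, W, K) random binary mask
    return np.sum(code * cube, axis=2)               # single coded snapshot image

def decode_patch(y_patch, code_patch, D, n_nonzero=10):
    # y_patch: (p, p) snapshot patch, code_patch: (p, p, K),
    # D: (p*p*K, n_atoms) learned spatio-spectral dictionary (hypothetical)
    p, _, K = code_patch.shape
    Phi = np.zeros((p * p, p * p * K))
    for k in range(K):                                # sensing matrix built from the code
        Phi[np.arange(p * p), k * p * p + np.arange(p * p)] = code_patch[:, :, k].ravel()
    A = Phi @ D                                       # effective dictionary seen by OMP
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(A, y_patch.ravel())
    x = D @ omp.coef_                                 # sparse code back to the signal domain
    return x.reshape(K, p, p).transpose(1, 2, 0)      # reconstructed (p, p, K) band-subset patch
```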
ISBN: (Print) 9781450366151
Egocentric activity recognition (EAR) is an emerging area in the field of computer vision research. Motivated by the current success of Convolutional Neural Networks (CNNs), we propose a multi-stream CNN for multimodal egocentric activity recognition using visual (RGB videos) and sensor streams (accelerometer, gyroscope, etc.). In order to effectively capture the spatio-temporal information contained in RGB videos, two types of modalities are extracted from the visual data: Approximate Dynamic Image (ADI) and Stacked Difference Image (SDI). These image-based representations are generated both at the clip level and at the entire-video level, and are then used to fine-tune a pretrained 2D-CNN called MobileNet, which is specifically designed for mobile vision applications. Similarly, for the sensor data, each training sample is divided into three segments, and a deep 1D-CNN network is trained from scratch for each type of sensor stream. During testing, the softmax scores of all the streams (visual + sensor) are combined by late fusion. The experiments performed on a multimodal egocentric activity dataset demonstrate that our proposed approach achieves state-of-the-art results, outperforming the current best handcrafted and deep learning based techniques.
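The late-fusion step described above can be sketched as a weighted combination of per-stream softmax scores; equal stream weights are an assumption.

```python
import numpy as np

def late_fusion(stream_logits, weights=None):
    # stream_logits: list of (n_samples, n_classes) arrays, one per stream (visual + sensor)
    probs = []
    for logits in stream_logits:
        e = np.exp(logits - logits.max(axis=1, keepdims=True))   # numerically stable softmax
        probs.append(e / e.sum(axis=1, keepdims=True))
    w = np.ones(len(probs)) / len(probs) if weights is None else np.asarray(weights)
    fused = sum(wi * pi for wi, pi in zip(w, probs))              # weighted score fusion
    return fused.argmax(axis=1)                                   # predicted class per sample
```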
ISBN: (Print) 9781450366151
Describing the contents of an image automatically has been a fundamental problem in the fields of artificial intelligence and computer vision. Existing approaches are either top-down, starting from a simple representation of an image and converting it into a textual description; bottom-up, coming up with attributes that describe numerous aspects of an image to form the caption; or a combination of both. Recurrent neural networks (RNNs) enhanced by Long Short-Term Memory (LSTM) have become a dominant component of several frameworks designed for solving the image captioning task. Despite their ability to reduce the vanishing gradient problem and capture dependencies, they are inherently sequential across time. In this work, we propose two novel approaches, one top-down and one bottom-up, which dispense with recurrence entirely by using a Transformer, a network architecture for generating sequences that relies entirely on the mechanism of attention. Adaptive positional encodings for the spatial locations in an image and a new regularization cost during training are introduced. The ability of our model to automatically focus on salient regions in the image is demonstrated visually. Experimental evaluation of the proposed architecture on the MS-COCO dataset is performed to exhibit the superiority of our method.
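A sketch of a fixed 2-D sinusoidal positional encoding over an H x W grid of image features, indicating where positional information enters the Transformer; the paper's adaptive encodings and new regularization cost are not reproduced, and d_model here is an assumption.

```python
import torch

def positional_encoding_2d(h, w, d_model):
    # Half the channels encode the row index, half the column index.
    assert d_model % 4 == 0
    d = d_model // 2
    div = torch.exp(torch.arange(0, d, 2).float() * (-torch.log(torch.tensor(10000.0)) / d))
    pe = torch.zeros(h, w, d_model)
    y = torch.arange(h).float()[:, None] * div          # (h, d/2) row phases
    x = torch.arange(w).float()[:, None] * div          # (w, d/2) column phases
    pe[:, :, 0:d:2] = torch.sin(y)[:, None, :].expand(h, w, -1)
    pe[:, :, 1:d:2] = torch.cos(y)[:, None, :].expand(h, w, -1)
    pe[:, :, d::2] = torch.sin(x)[None, :, :].expand(h, w, -1)
    pe[:, :, d + 1::2] = torch.cos(x)[None, :, :].expand(h, w, -1)
    return pe.flatten(0, 1)                              # (h*w, d_model) sequence for the Transformer
```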
ISBN: (Print) 9781450366151
Person re-identification (ReID) is an important problem in computer vision, especially for video surveillance applications. The problem focuses on identifying people across different cameras or across different frames of the same camera. The main challenge lies in recognizing the same person despite large variations in appearance and structure, while still differentiating between individuals. Recently, deep learning networks with triplet loss have become a common framework for person ReID. However, the triplet loss focuses on obtaining correct orderings on the training set; we demonstrate that it performs poorly in a clustering task. In this paper, we design a cluster loss, which leads to a model output with larger inter-class variation and smaller intra-class variation compared to the triplet loss. As a result, our model has better generalisation ability and achieves higher accuracy on the test set, especially in a clustering task. We also introduce a batch-hard training mechanism to improve the results and speed up the convergence of training.
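A hedged sketch of a centroid-based cluster loss that pulls embeddings toward their class centroid while pushing centroids of different classes apart; the exact formulation and margin used in the paper may differ.

```python
import torch

def cluster_loss(embeddings, labels, margin=1.0):
    # embeddings: (N, D) batch of ReID features, labels: (N,) integer identity labels
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(0) for c in classes])
    # intra-class term: distance of each sample to its own class centroid
    own = centroids[(labels[:, None] == classes[None, :]).float().argmax(1)]
    intra = (embeddings - own).pow(2).sum(1).mean()
    # inter-class term: hinge on pairwise centroid distances
    d = torch.cdist(centroids, centroids)
    mask = ~torch.eye(len(classes), dtype=torch.bool)
    inter = torch.clamp(margin - d[mask], min=0).mean() if mask.any() else d.sum() * 0
    return intra + inter
```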
ISBN: (Print) 9781450366151
In practice, images can contain different amounts of noise in different color channels, which existing super-resolution approaches do not take into account. In this paper, we propose to super-resolve noisy color images by considering the color channels jointly. Noise statistics are blindly estimated from the input low-resolution image and are used to assign different weights to the different color channels in the data cost. The implicit low-rank structure of visual data is enforced via nuclear norm minimization with adaptive weights, which is added as a regularization term to the cost. Additionally, multi-scale details of the image are incorporated through another regularization term involving projection onto a PCA basis constructed from similar patches extracted across different scales of the input image. The results demonstrate the super-resolving capability of the approach in real scenarios.
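In proximal form, nuclear norm minimization with adaptive weights reduces to weighted singular-value thresholding; a minimal sketch follows, with a simple reweighting heuristic standing in for the paper's adaptive weights and full cost.

```python
import numpy as np

def weighted_svt(M, tau, eps=1e-6):
    # M: matrix of stacked similar patches; tau: regularization strength (assumption)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = 1.0 / (s + eps)                     # larger singular values are shrunk less
    s_shrunk = np.maximum(s - tau * w, 0.0) # weighted soft-thresholding of singular values
    return (U * s_shrunk) @ Vt              # low-rank proximal estimate
```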