ISBN: 9781450366151 (print)
We consider the license plate re-identification task, treated here as a one-shot image retrieval problem. Our objective is to learn a feature representation for license plate images such that a single training image of a given license plate (referred to as a template image) is sufficient to perform nearest-neighbour retrieval with high accuracy at test time. The feature representation should also ideally generalise across datasets and be extractable in real time on resource-constrained embedded hardware or a moderately powerful cellphone. We evaluate representations from the person re-identification (re-id) literature, comparing those learned by a trained deep convolutional network with those derived from a trained Fisher vector. While the convolutional network features perform better than the Fisher vector, we obtain comparable results from a hybrid model, called f2nn, that projects the Fisher vector into a lower-dimensional space via two fully connected layers trained with the triplet loss. The proposed hybrid model f2nn generates features that outperform convolutional features, and generalise better, on datasets dissimilar to the training corpus. The model can be trained in stages and takes significantly less time to extract features. Further, it uses much smaller feature dimensions for license plate images, resulting in faster re-identification, and is therefore well suited to resource-constrained platforms such as mobile devices.
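As a rough illustration of the two ideas named in this abstract, the sketch below shows a hinge-style triplet loss and one-shot nearest-neighbour retrieval against a single template per plate. It is not the authors' f2nn model: the Euclidean distance, the margin value, the function names, and the toy 3-dimensional features are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, templates):
    """One-shot retrieval: return the id whose single template is nearest."""
    return min(templates, key=lambda plate_id: euclidean(query, templates[plate_id]))


# Hypothetical features: one template image per license plate.
templates = {"PLATE_A": [0.0, 0.0, 1.0], "PLATE_B": [1.0, 0.0, 0.0]}
query = [0.9, 0.1, 0.0]

best = retrieve(query, templates)
loss = triplet_loss(query, templates["PLATE_B"], templates["PLATE_A"], margin=0.2)
```

Here the query lies much closer to the `PLATE_B` template than to `PLATE_A`, so retrieval picks `PLATE_B` and the triplet with `PLATE_B` as positive already satisfies the margin.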
ISBN: 9781450366151 (print)
Explaining human actions requires not only good representations and theory but also suitable action videos from which to learn good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting the action performed by human experts are required. A lack of experts in a domain leads to fewer videos and hence to poor learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel Community Detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations in order to produce better action representations. Human actions are composed of distinguishable key poses, which form dense communities in graph structures. Anomalous poses held for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from the imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is seen to improve the segmentation performance of complex human actions with the help of some imperfect performances. The efficacy of our approach is illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, which were developed using random executions by amateur performers.
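The rejection of anomalous poses by their rare occurrence across videos can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' framework: it assumes community detection has already assigned a community id to each pose, and the function name, the `min_fraction` threshold, and the toy labels are hypothetical.

```python
def frequent_communities(video_communities, min_fraction=0.5):
    """Keep community ids that occur in at least `min_fraction` of the videos.

    `video_communities` is a list with one entry per video, each entry being
    the community ids observed in that video (repeats within a video count once).
    """
    counts = {}
    for communities in video_communities:
        for community in set(communities):
            counts[community] = counts.get(community, 0) + 1
    total = len(video_communities)
    return {c for c, k in counts.items() if k / total >= min_fraction}


# Hypothetical labels: "x" is a pose community seen in only one of four videos.
videos = [
    ["p1", "p2", "x"],
    ["p1", "p2"],
    ["p1", "p2"],
    ["p2", "p1", "p1"],
]
key_poses = frequent_communities(videos, min_fraction=0.5)
```

With these toy inputs, `p1` and `p2` appear in every video and are kept as key poses, while `x` appears in a quarter of the videos and is rejected.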
ISBN: 9781450366151 (print)
Conventional convolutional neural networks (CNNs) are trained on large domain datasets and are hence typically over-represented and inefficient in limited-class applications. An efficient way to convert such large many-class pre-trained networks into small few-class networks is through a hierarchical decomposition of their feature maps. To address this inefficiency, we propose an automated four-step framework for such decomposition, the Hierarchically Self Decomposing CNN (HSD-CNN). HSD-CNN is derived automatically using a class-specific filter sensitivity analysis that quantifies the impact of specific features on a class prediction. The decomposed hierarchical network can be deployed directly to obtain sub-networks for a subset of classes, and these sub-networks are shown to perform better without requiring retraining. Experimental results show that HSD-CNN generally does not degrade accuracy when the full set of classes is used. Interestingly, when operating on known subsets of classes, HSD-CNN improves accuracy with a much smaller model that requires far fewer operations. The HSD-CNN flow is verified on the CIFAR10, CIFAR100 and CALTECH101 datasets. We report accuracies of up to 85.6% (94.75%) on scenarios with 13 (4) classes of CIFAR100, using a VGG-16 network pre-trained on the full dataset. In this case, the proposed HSD-CNN requires 3.97x fewer parameters and saves 71.22% of operations in comparison to the baseline VGG-16 containing features for all 100 classes.
Distance Metric Learning (DML) has been successfully applied in a variety of computer vision and image processing tasks. Laplacian Regularized Metric Learning (LRML) computes a distance metric by satisfying given sets...
ISBN: 9789811300202 (digital)
ISBN: 9789811300196 (print)
This book constitutes the refereed proceedings of the 6th National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics, NCVPRIPG 2017, held in Mandi, India, in December 2017. The 48 revised full papers presented in this volume were carefully reviewed and selected from 147 submissions. The papers are organized in topical sections on video processing; image and signal processing; segmentation, retrieval, captioning; pattern recognition applications.
Saliency plays a key role in various computer vision tasks. Extracting salient regions from images and videos has been a well-established problem in computer vision. While segmenting salient objects from images depen...
We propose two novel approaches to classify Indian monuments according to their distinct architectural styles. While the historical significance of most Indian monuments is well documented, the details of their archit...
The rapid development in face detection study has been greatly supported by the availability of large image datasets, which provide detailed annotations of faces on images. However, among a number of publicly accessib...
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
Drone systems have been deployed by various law enforcement agencies to monitor hostiles, spy on foreign drug cartels, conduct border control operations, etc. This paper introduces a real-time drone surveillance system to identify violent individuals in public areas. The system first uses the Feature Pyramid Network to detect humans in aerial images. The image region containing each human is then passed to the proposed ScatterNet Hybrid Deep Learning (SHDL) network for human pose estimation. The orientations between the limbs of the estimated pose are next used to identify violent individuals. The proposed deep network can learn meaningful representations quickly using ScatterNet and structural priors with relatively few labeled examples. The system detects violent individuals in real time by processing the drone images in the cloud. This research also introduces the aerial violent individual dataset used for training the deep network, which we hope will encourage researchers interested in using deep learning for aerial surveillance. The pose estimation and violent-individual identification performance is compared with state-of-the-art techniques.
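The "orientations between the limbs" mentioned above can be pictured as angles computed from estimated 2D keypoints. The sketch below is only an illustration under that assumption: the abstract does not specify which features or classifier the system uses, and the function name and the toy keypoint coordinates are hypothetical.

```python
import math


def limb_angle(joint, end_a, end_b):
    """Angle in degrees at `joint` between the limbs joint->end_a and joint->end_b.

    Each argument is an (x, y) keypoint. The result lies in [0, 180].
    """
    ax, ay = end_a[0] - joint[0], end_a[1] - joint[1]
    bx, by = end_b[0] - joint[0], end_b[1] - joint[1]
    dot = ax * bx + ay * by      # proportional to the cosine of the angle
    cross = ax * by - ay * bx    # proportional to the sine of the angle
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical keypoints for one arm: shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
elbow_angle = limb_angle(elbow, shoulder, wrist)
```

In this toy pose the upper arm and forearm are perpendicular, so the elbow angle is a right angle. A set of such angles could then serve as the input to whatever classifier separates violent from non-violent poses.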