Just as good representations and theory are needed to explain human actions, so are the action videos used to learn good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting the action performed by human experts are required. A lack of experts in a domain leads to fewer videos and hence poorer learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel Community Detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses, which form dense communities in graph structures. Anomalous poses held for a long duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from the imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is seen to improve the segmentation performance on complex human actions with the help of imperfect performances. The efficacy of our approach is illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, which were developed using random executions by amateur performers.
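The key-pose discovery step can be pictured as graph clustering over frame-level pose descriptors. Below is a minimal sketch of that idea, assuming cosine similarity between descriptors, networkx's greedy modularity communities as the community detector, and illustrative thresholds; the paper's actual descriptors, graph construction, and rejection rule may differ.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def key_pose_communities(pose_descriptors, video_ids,
                         sim_threshold=0.9, min_video_fraction=0.5):
    """pose_descriptors: (N, D) array, one row per frame across all videos.
    video_ids: length-N sequence mapping each frame to its source video."""
    # Cosine similarity between every pair of frame-level pose descriptors.
    X = pose_descriptors / np.linalg.norm(pose_descriptors, axis=1, keepdims=True)
    sim = X @ X.T

    # Link frames with near-identical poses; repeatedly visited poses
    # then appear as dense communities in the graph.
    g = nx.Graph()
    g.add_nodes_from(range(len(X)))
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if sim[i, j] > sim_threshold:
                g.add_edge(i, j)

    # Reject dense communities that occur in too few videos: a long-held
    # anomalous pose is dense within one video but rare across videos.
    n_videos = len(set(video_ids))
    key_poses = []
    for community in greedy_modularity_communities(g):
        covered = {video_ids[node] for node in community}
        if len(covered) / n_videos >= min_video_fraction:
            key_poses.append(sorted(community))
    return key_poses
```

The communities that survive the occurrence filter stand in for key poses; the inter-community links of the same graph can then constrain which pose may follow which when learning the temporal order.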
Anomaly detection is a major task in crowd management through video surveillance. It refers to detecting events that deviate from normal events. We introduce an unsupervised method to detect motion anomalies in surveillance video, considering only optical flow as the feature. First, for each frame we compute the magnitude of the optical flow of motion using FlowNet2 [10]. Then, the mean flow magnitude due to regular normal motion (given in the training data) is computed at each pixel where such motion exists in the training video frames. Our strategy is to compare the motion under consideration against this mean flow magnitude, expecting that anomalous motion will differ significantly from normal motion. An autoencoder-type network is trained to detect this anomaly. A training data patch is constructed by interleaving the columns of the mean optical flow patch with those of the corresponding flow patch from each frame; this interleaving is done to incorporate context dependency. The autoencoder is trained to minimize the mean-square reconstruction error between the input column-wise interleaved patch and the output (i.e., the reconstructed patch) of the autoencoder. During testing, a patch is declared anomalous if its reconstruction error is high compared to the training error. Experiments carried out on the UCSD and UMN datasets show that our method gives results comparable to other state-of-the-art methods.
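To make the patch construction concrete, here is a minimal sketch of the column-wise interleaving and the reconstruction-error test, assuming a 16 x 16 patch, a single-hidden-layer autoencoder, and a simple multiple-of-training-error threshold; the actual patch size, architecture, and decision rule are not specified above and are assumptions.

```python
import torch
import torch.nn as nn

PATCH = 16  # assumed square patch side

def interleave_columns(mean_patch, flow_patch):
    """Alternate columns (mean, flow, mean, flow, ...) -> (PATCH, 2*PATCH),
    so each flow column sits next to its normal-motion context."""
    out = torch.empty(PATCH, 2 * PATCH)
    out[:, 0::2] = mean_patch
    out[:, 1::2] = flow_patch
    return out

class FlowAutoencoder(nn.Module):
    """Plain fully connected autoencoder over the flattened patch."""
    def __init__(self, dim=PATCH * 2 * PATCH, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def is_anomalous(model, mean_patch, flow_patch, train_error, margin=3.0):
    """Declare a patch anomalous when its reconstruction error clearly
    exceeds the error level observed on normal training patches."""
    x = interleave_columns(mean_patch, flow_patch).flatten()
    with torch.no_grad():
        err = nn.functional.mse_loss(model(x), x).item()
    return err > margin * train_error
```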
We present a deep network based hierarchical framework to recognize activities performed by people collectively from videos at various levels of granularity: individual person, group, and overall (or scene level). Individual person analysis, which includes person detection, tracking, pose estimation and individual activity recognition, has been studied extensively. Most of the existing work on collective activity recognition has focused on overall scene activity estimation. However, in scenarios where multiple groups perform different group activities, overall scene activity recognition in isolation paints an incomplete picture of the various activities. Identifying groups and recognizing their activities is therefore important for understanding a scene in its entirety. To this end, we add an extra layer to existing methods that finds the groups (or clusters) of people present in a scene and their activities. We then utilize these group activities along with the scene context to recognize the scene activity. To discover these groups, we propose a min-max criterion within the framework to train a sub-network that learns the pairwise similarity between any two individuals, which is used by a clustering algorithm to identify groups. The group activity is captured by an LSTM module, whereas the individual and scene activities are captured by CNN-LSTM based modules. These modules, along with the grouping layer, form the proposed network. We evaluate the network on a publicly available dataset to demonstrate the usefulness of our approach.
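The grouping layer can be illustrated as a pairwise similarity sub-network whose scores feed a standard clustering algorithm. The sketch below assumes an MLP over concatenated per-person features and agglomerative clustering on the induced distances; the feature dimension, network shape, number of groups, and clustering choice are illustrative, and the min-max training criterion is omitted.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import AgglomerativeClustering

class PairwiseSimilarity(nn.Module):
    """Scores how likely two individuals are to belong to one group."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())  # similarity in [0, 1]

    def forward(self, fi, fj):
        return self.mlp(torch.cat([fi, fj], dim=-1))

def find_groups(person_feats, model, n_groups=2):
    """person_feats: (P, D) tensor of per-person features; returns a
    group label for each person."""
    P = person_feats.shape[0]
    dist = np.zeros((P, P))
    with torch.no_grad():
        for i in range(P):
            for j in range(P):
                sim = model(person_feats[i], person_feats[j]).item()
                dist[i, j] = 1.0 - sim  # similarity -> distance
    dist = 0.5 * (dist + dist.T)  # enforce symmetry for clustering
    np.fill_diagonal(dist, 0.0)
    return AgglomerativeClustering(
        n_clusters=n_groups, metric="precomputed",
        linkage="average").fit_predict(dist)
```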
Human activities are predominantly spatio-temporal, involving spatial changes over time. Qualitative spatial relations between interacting entities are often used to describe spatial change. To derive such qualitative spatial relations, the interacting entities are approximated as a single bounding box or a set of bounding boxes. A set of bounding boxes abstracting a single entity has been termed an extended object, where each box bounds a component. Extended object abstraction of spatial entities has been shown to be more effective for representing human activities [10]. The temporal aspect of an activity is characterized through the changing spatial relations between components of interacting extended objects over time. In this paper, we propose a Temporal Activity Graph (TAG) based representation model to keep track of the sequences of relations between components of the extended objects. A kernel is designed for classification of spatio-temporal interactions in the TAG based model. The TAG kernel uses concepts of label sequence similarity and interestingness to compute the similarity of a pair of TAGs. The TAG kernel is a generic solution that can be used with any kernel based method; here, it is used within a Support Vector Machine classifier. TAG kernel based classification of activities is found to be on par with state-of-the-art approaches in experiments performed on the Mind's Eye, UT Interaction, and SBU Kinect Interaction datasets.
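The way a graph kernel such as the TAG kernel plugs into an SVM can be shown with a precomputed Gram matrix. In the toy sketch below, a normalized longest-common-subsequence score over relation-label sequences stands in for the actual TAG kernel (the interestingness weighting is omitted), so everything except the precomputed-kernel SVM mechanics is an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

def lcs_kernel(a, b):
    """Normalized longest-common-subsequence similarity between two
    sequences of spatial-relation labels."""
    m, n = len(a), len(b)
    dp = np.zeros((m + 1, n + 1), dtype=int)
    for i in range(m):
        for j in range(n):
            dp[i + 1, j + 1] = (dp[i, j] + 1 if a[i] == b[j]
                                else max(dp[i, j + 1], dp[i + 1, j]))
    return dp[m, n] / max(m, n)

def gram_matrix(rows, cols):
    return np.array([[lcs_kernel(a, b) for b in cols] for a in rows])

# Toy relation-label sequences standing in for TAGs of two activities.
train = [["left", "overlap", "right"], ["left", "left", "right"],
         ["above", "overlap", "below"], ["above", "below", "below"]]
labels = [0, 0, 1, 1]
test = [["left", "overlap", "overlap"]]

# SVC with kernel="precomputed" takes the train x train Gram matrix for
# fitting and the test x train matrix for prediction.
clf = SVC(kernel="precomputed").fit(gram_matrix(train, train), labels)
print(clf.predict(gram_matrix(test, train)))  # -> predicted activity class
```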
Facial expressions carry essential cues for inferring a human's state of mind, conveying adequate information to understand an individual's actual feelings. Thus, automatic facial expression recognition is an interesting and crucial task for interpreting the human cognitive state through a machine. In this paper, we propose an Exigent Features Preservative Network (EXPERTNet) to describe the features of facial expressions. EXPERTNet extracts only pertinent features and neglects others by using an exigent feature (ExFeat) block, which mainly comprises an elective layer. Specifically, the elective layer selects the desired edge-variation features from the previous layer's outcomes, which are generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7. The different filter sizes help elicit both micro- and high-level features, which enhances the learnability of the neurons. The ExFeat block preserves the spatial structural information of the facial expression, which allows discrimination between different classes of facial expressions. Visual representations of the proposed method over different facial expressions show the learning capability of neurons in different layers. Experimental and comparative analysis over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, demonstrates the better performance of the proposed network compared to existing networks.
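A multi-scale block in the spirit of ExFeat can be sketched as parallel 1 x 1, 3 x 3, 5 x 5 and 7 x 7 convolutions followed by a selection step. Treating the elective layer as an elementwise maximum across scales is an assumption for illustration, as are the channel counts and input size; the paper's elective layer may select differently.

```python
import torch
import torch.nn as nn

class MultiScaleElectiveBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Same-padding convolutions so all branches stay spatially aligned.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            for k in (1, 3, 5, 7)])
        self.act = nn.ReLU()

    def forward(self, x):
        outs = [self.act(branch(x)) for branch in self.branches]
        # "Elect" the dominant edge-variation response across scales.
        return torch.stack(outs, dim=0).max(dim=0).values

# Usage on a batch of 48 x 48 grayscale face crops (sizes assumed).
block = MultiScaleElectiveBlock(in_ch=1, out_ch=32)
feats = block(torch.randn(8, 1, 48, 48))  # -> (8, 32, 48, 48)
```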
Conventional convolutional neural networks (CNNs) are trained on large domain datasets and are hence typically over-represented and inefficient in limited-class applications. An efficient way to convert such a large many-class pre-trained network into a small few-class network is through a hierarchical decomposition of its feature maps. To this end, we propose an automated framework for such decomposition, the Hierarchically Self Decomposing CNN (HSD-CNN), built in four steps. The HSD-CNN is derived automatically using a class-specific filter sensitivity analysis that quantifies the impact of specific features on a class prediction. The decomposed hierarchical network can be deployed directly to obtain sub-networks for a subset of classes, and it is shown to perform better without requiring retraining of these sub-networks. Experimental results show that the HSD-CNN generally does not degrade accuracy when the full set of classes is used. Interestingly, when operating on known subsets of classes, the HSD-CNN improves accuracy with a much smaller model size requiring far fewer operations. The HSD-CNN flow is verified on the CIFAR10, CIFAR100 and CALTECH101 datasets. We report accuracies up to 85.6% (94.75%) on scenarios with 13 (4) classes of CIFAR100, using a VGG-16 network pre-trained on the full dataset. In this case, the proposed HSD-CNN requires 3.97x fewer parameters and achieves 71.22% savings in operations compared to the baseline VGG-16 containing features for all 100 classes.
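One way to picture the class-specific filter sensitivity analysis is an ablation test: zero out one filter's output and measure the drop in the class logit. The hook-based sketch below is an illustrative stand-in for the paper's analysis; the scoring rule and any layer or model specifics are assumptions.

```python
import torch

def filter_sensitivity(model, layer, images, class_idx):
    """Score each filter of `layer` (a Conv2d inside `model`) by how much
    zeroing its output lowers the mean logit of class `class_idx`."""
    with torch.no_grad():
        base = model(images)[:, class_idx].mean().item()
    scores = torch.zeros(layer.out_channels)
    for f in range(layer.out_channels):
        def zero_filter(module, inputs, output, f=f):
            output = output.clone()
            output[:, f] = 0.0  # ablate one filter's feature map
            return output
        handle = layer.register_forward_hook(zero_filter)
        with torch.no_grad():
            ablated = model(images)[:, class_idx].mean().item()
        handle.remove()
        scores[f] = base - ablated  # large drop => filter matters to class
    return scores
```

Scores computed per class over a held-out batch could then be thresholded to decide which filters a class-subset sub-network retains.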
In this paper, we propose a novel, real-time dynamic hand gesture recognition framework using convolutional neural networks with depth and RGB data fusion. Hand gestures are a natural form of communication between humans as well as between human and machine. They also find important applications in areas such as sign language recognition, man-machine interaction and behavior understanding. Natural hand gestures are complex hand movements in space and time and are challenging to recognize. In our proposed framework, we use both RGB and depth data to automatically recognize dynamic hand gestures. Initially, we process the RGB and depth data separately: we compute the motion history of the performed gesture from the RGB data and, independently, from the depth data, to store the motion information of the moving hands. The motion history of a performed gesture stores rich information about the movement. Then, we use transfer learning on two separate VGG16 networks, fine-tuning one with the RGB motion history and the other with the depth motion history, to configure them for the dynamic hand gesture recognition problem. Using the two fine-tuned VGG16 networks, we extract features from both motion history images, obtained from the RGB and depth data separately, for each dynamic hand gesture. We then integrate the features obtained from both networks using a weighted summation to accurately and robustly recognize the dynamic hand gesture. We perform experiments on standard, publicly available dynamic hand gesture datasets and show that our method outperforms state-of-the-art methods.
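Two of the building blocks above, the motion history image and the weighted feature summation, are easy to sketch. The decay duration, frame-difference threshold, and fusion weight below are assumed values, and the VGG16 feature extraction itself is omitted.

```python
import numpy as np

def motion_history_image(frames, tau=20, diff_thresh=30):
    """frames: list of (H, W) uint8 grayscale frames. Pixels that moved
    recently are bright; older motion decays to zero over `tau` frames."""
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(np.int16) - prev.astype(np.int16)) > diff_thresh
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return (255 * mhi / tau).astype(np.uint8)

def fuse_features(rgb_feat, depth_feat, w=0.5):
    """Weighted summation of the RGB- and depth-stream feature vectors."""
    return w * rgb_feat + (1.0 - w) * depth_feat
```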
This paper presents an online handwritten benchmark dataset (OHWR-Gurmukhi) for Gurmukhi script. TIET, Patiala released the unconstrained online handwriting databases, OHWR-GNumerals and OHWR-GScript, which contain is...