ISBN: 9781450366151 (print)
We present a deep network based hierarchical framework to recognize activities performed collectively by people in videos at various levels of granularity - individual person, group and overall (or scene level). Individual person analysis, which includes person detection, tracking, pose estimation and individual activity recognition, has been studied extensively. Most of the existing work on collective activity recognition has focused on overall scene activity estimation. However, in scenarios where multiple groups perform different group activities, overall scene activity recognition in isolation paints an incomplete picture of the various activities. Identifying groups and recognizing their activities is therefore important to understand a scene in its completeness. To this end, we add an extra layer to existing methods that finds the groups (or clusters) of people present in a scene and their activities. We then utilize these group activities along with the scene context to recognize the scene activity. To discover these groups, we propose a min-max criterion within the framework to train a sub-network which learns pairwise similarity between any two individuals; this similarity is then used by a clustering algorithm to identify groups. The group activity is captured by an LSTM module, whereas the individual and scene activities are captured by CNN-LSTM based modules. These modules, along with the grouping layer, form the proposed network. We evaluate the network on a publicly available dataset to indicate the usefulness of our approach.
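The grouping step described in this abstract (pairwise similarity between individuals, followed by clustering) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the clustering algorithm, so this example assumes a simple threshold on the similarity scores followed by connected components. The function name `group_people`, the `threshold` parameter and the example matrix are hypothetical, and in the paper the similarities would come from the learned sub-network rather than being given by hand.

```python
from typing import List


def group_people(similarity: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Cluster individuals from a symmetric pairwise similarity matrix.

    Assumption (not from the source): two people belong to the same group
    if they are linked by a chain of pairs with similarity >= threshold.
    """
    n = len(similarity)
    parent = list(range(n))  # union-find forest, one node per person

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarities for four people: 0 and 1 are close, 2 and 3 are close.
example_similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(group_people(example_similarity))
```

Each resulting group would then be passed to a group-level activity model; lowering `threshold` merges groups, raising it splits them.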
The prohibitive amounts of time required to review the large amounts of data captured by surveillance and other cameras have brought into question the very utility of large scale video logging. Yet, one recognizes that...
We present a modified Temporal Conditional Random Fields framework for modeling and predicting object motion. To facilitate such a powerful graphical model with prediction and come up with a CRF-based predictor, we pr...
ISBN: 9781538607336 (print)
In this paper, balanced two-stage residual networks (BTSRN) are proposed for single image super-resolution. The deep residual design with constrained depth achieves the optimal balance between accuracy and speed for super-resolving images. The experiments show that the balanced two-stage structure, together with our lightweight two-layer PConv residual block design, achieves very promising results when considering both accuracy and speed. We evaluated our models on the New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution (NTIRE SR 2017). Our final model, with only 10 residual blocks, ranked among the best in terms of not only accuracy (6th among 20 final teams) but also speed (2nd among the top 6 teams in terms of accuracy).
ISBN: 9781728113807 (print)
In this work, we compare the speed and accuracy of different neural network architectures using alternative feature extractors for object detection, in order to find the fastest and most accurate architecture among them. We combined three architectures and three feature extractors to build different models and computed mAP (mean average precision), the metric used to assess accuracy. Sample images were drawn from the COCO dataset, and the work is implemented with the TensorFlow library.
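To make the mAP metric mentioned in this abstract concrete, here is a minimal sketch of average precision (AP) for one class and its mean over classes. It is a simplified illustration, not the evaluation code used in the paper: it assumes detections have already been matched to ground truth (each detection is marked true or false positive) and integrates the precision envelope over all recall points, whereas the official COCO evaluation also averages over several IoU thresholds. The function names and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, is it a true positive?)


def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Area under the precision-recall curve for a single class."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        if is_hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Replace each precision with the best precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the stepwise curve.
    area, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of the per-class AP values; each value is (detections, ground-truth count)."""
    scores = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a class with two ground-truth objects and detections `[(0.9, True), (0.8, False), (0.7, True)]` has precision 1.0 up to recall 0.5 and 2/3 up to recall 1.0, giving an AP of 5/6.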
Transmitting face image data through wireless fading channels has been widely used for face recognition and automatic surveillance applications, and many techniques can be used to do so. However, due to the nois...
ISBN: 9780769537894 (print)
The proceedings contain 92 papers. The topics discussed include: GPU supported patch-based tessellation for dual subdivision; ball robot and the graphics generation and its image calculation based on CAD geometric model; particle importance based fluid simulation; facial expression representation using a quadratic deformation model; implementation of ant colony algorithm based on GPU; a fast method for real-time computation of approximated global illumination; expansion of communication in media art through the intelligent interaction; a motion retargeting method for topologically different characters; real-time simulation of large area nearshore wave for marine simulator; a preliminary study of human motion based on actor physiques using motion capture; providing novel and useful data for game development using usability expert evaluation and testing; digital video watermarking in the discrete wavelet transform domain; and style-sheets extraction from existing digital contents by image processing for web-based BML contents management system.
Segmenting a foreground object from a video is a challenging task because of large deformations of objects, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient ap...
ISBN: 9781450366151 (print)
As much as good representation and theory are needed to explain human actions, so are the action videos used for learning good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting the action performed by human experts are required. A lack of experts in any domain leads to a reduced number of videos and hence to inadequate learning. In this work we attempt to utilize imperfect amateur performances to get more confident representations of human action sequences. We introduce a novel Community Detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses which form dense communities in graph structures. Anomalous poses performed for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from these imperfect videos, where the inter-community links help reduce the search space of the many possible pose sequences. Our framework is seen to improve the segmentation performance of complex human actions with the help of some imperfect performances. The efficacy of our approach has been illustrated on two complex action datasets - Sun Salutation and Warm-up exercise - that have been developed using random executions from amateur performers.
ISBN: 9781467389105 (print)
In this paper we propose a publicly available static hand pose database called OUHANDS and protocols for training and evaluating hand pose classification and hand detection methods. A comparison between the OUHANDS database and existing databases is given. Baseline results for both of the protocols are presented.