This paper proposes a real-time multi-view human activity recognition model that uses an RGB-D (Red Green Blue-Depth) sensor. The method receives RGB-D data streams in real time from a Kinect for Windows V2 sensor. First, a skeleton-tracking algorithm is applied, which yields 3D positions for 25 unique joints. The approach then uses a weighted version of Fast Dynamic Time Warping in which each skeleton joint's contribution to the Dynamic Time Warping (DTW) similarity cost is weighted by its importance. To recognize multi-view human activities, the weighted DTW warps a time sequence of joint positions to reference time sequences and produces a similarity value. Experimental results demonstrate that the proposed method is robust, flexible and efficient with respect to multi-view activity recognition and to scale and phase variations of activities in different realistic scenes.
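The weighted matching step can be illustrated with a minimal sketch. It uses plain (quadratic-time) DTW rather than the Fast DTW approximation named in the abstract, and the function names, the Euclidean per-joint distance, and the weight values are assumptions made for illustration, not the authors' implementation.

```python
import math


def weighted_dtw(seq_a, seq_b, joint_weights):
    """Weighted DTW cost between two sequences of skeleton frames.

    Each frame is a list of (x, y, z) joint positions, and joint_weights
    holds one non-negative weight per joint (hypothetical values).
    """
    def frame_cost(frame_a, frame_b):
        # Weighted sum of per-joint Euclidean distances (assumed metric).
        return sum(
            w * math.dist(pa, pb)
            for w, pa, pb in zip(joint_weights, frame_a, frame_b)
        )

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(seq_a[i - 1], seq_b[j - 1])
            # Standard DTW recurrence: extend the cheapest of three paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, references, joint_weights):
    """Return the label of the reference sequence with the lowest cost."""
    return min(
        references,
        key=lambda label: weighted_dtw(query, references[label], joint_weights),
    )
```

Because DTW aligns frames non-linearly in time, a slower performance of the same activity still yields a low cost against its reference, which is the phase-variation robustness the abstract refers to.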
Attribute-based facial image retrieval has a wide range of applications, such as law enforcement and online social networks. The problem becomes more challenging if the images are from different modalities. For example, the input may be a sketch or a composite image, and the task is to retrieve photo images that have the same facial attributes as the input. In this work, we propose a learning-based approach in which two transformations are learnt, one per modality, from training images with associated attribute annotations, such that images with similar attributes move closer to each other and images with very different attributes move farther apart in the transformed space. A query image is first transformed to the learnt space, where images with similar attributes are retrieved. The same framework works seamlessly whether the images to be retrieved are of the same modality as the query or a different one. The attributes of the query image are also obtained automatically as a byproduct of the algorithm. Extensive experimental evaluation on three datasets shows the effectiveness of the proposed approach.
Motion Estimation and Motion Compensation are the popular techniques for eliminating temporal redundancy in video sequences. These techniques are used in the popular H.264, MPEG-2 and MPEG-4 video coding standards. Conventional fast Block Matching Algorithms (BMA) perform exhaustive search between the current and the reference frame. Although the BMA technique gives the exact result, it is computationally very expensive. Another drawback of this method is that it easily gets trapped in local minima, which eventually leads to degradation of the video quality. The proposed Motion Estimation technique exploits the fact that the human eye cannot distinguish individual frames when they are played at a particular frame rate. The experimental results on various video sequences demonstrate that the proposed technique outperforms the existing conventional motion estimation techniques.
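For context, the exhaustive block-matching baseline mentioned above can be sketched as follows. This is the conventional full search, not the proposed technique; the sum of absolute differences (SAD) criterion, the function names, and the frame representation as lists of pixel rows are assumptions chosen for illustration.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def full_search(cur, ref, top, left, size, radius):
    """Exhaustively search ref for the best match to a block of cur.

    The block has its top-left corner at (top, left) in the current frame.
    Returns ((dy, dx), cost) for the displacement with the lowest SAD
    within +/- radius pixels; candidates outside the frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, top, left, ry, rx, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best
```

The search evaluates up to (2 * radius + 1)^2 candidate positions per block, each costing size^2 pixel comparisons, which is the computational expense the abstract points to.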
This paper presents an optimized and efficient video stabilization technique based on projection curve warping. In most recorded videos, the relative displacement between two consecutive frames ranges from 3-4 pixels for hand-held applications to 25-30 pixels for moving-platform applications. Based on this experimental data, a Sakoe-Chiba band with a fixed window size is proposed for constraining distance matrix estimation in the dynamic time warping algorithm. Existing projection-based stabilization techniques match intensity values for motion estimation. Any change in the local intensity values, whether induced by intensity variation, moving objects or scene variation, causes errors in the estimated motion. To overcome this problem, a higher-level feature, the shape of the projection curve, is incorporated by matching the local derivative of the curve instead of the intensity values themselves. Robustness and time efficiency of the proposed technique are measured in terms of interframe transformation fidelity and processing time, respectively.
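The two ideas in this abstract, band-constrained DTW and matching curve derivatives rather than raw intensities, can be sketched as below. The function names, the row-sum projection, the first-difference derivative, and the absolute-difference local cost are illustrative assumptions; the paper's exact formulation is not given in the abstract.

```python
def row_projection(frame):
    """Project a grayscale frame (list of pixel rows) onto the vertical axis."""
    return [sum(row) for row in frame]


def derivative(curve):
    """First difference of a projection curve (assumed derivative estimate)."""
    return [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]


def banded_dtw(a, b, window):
    """DTW cost with a Sakoe-Chiba band: cell (i, j) is used only if |i - j| <= window."""
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # the band must reach the final cell
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay at infinity.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Restricting the band to the expected inter-frame displacement reduces the number of distance-matrix cells from n * m to roughly n * (2 * window + 1). Matching derivatives also removes any constant intensity offset between frames, since adding a constant to a curve leaves its first difference unchanged.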
This paper helps to explore the intricate task of natural and precise image grouping for effective organization and retrieval. High precision in image classification is challenging due to the complexity of images and ...
A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as minor modifications of an existing printed form, it is desirable to reproduce the font size and characteristics such as bold and italics in the OCR-recognized document. In this morphological opening based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images with uniform font sizes, using the word height information. A rough estimate of the stroke widths of characters in each sub-image is obtained from the density. Each sub-image is then opened with a square structuring element whose size is determined by the respective stroke width. The union of all the opened sub-images is used to determine the locations of the bold words. Extracting all such words from the binarized image gives the final image. A minimum of 98% of bold words were detected from a total of 65 Tamil, Kannada and English pages, and the false alarm rate is less than 0.4%.
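The opening step can be illustrated with a minimal binary-morphology sketch. The function names and the structuring element size `k` are hypothetical; how the paper derives the size from the estimated stroke width is not specified in the abstract.

```python
def erode(img, k):
    """Binary erosion with a k x k square anchored at its top-left corner."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            # Keep the anchor only if the whole square fits in the foreground.
            if all(img[y + dy][x + dx] for dy in range(k) for dx in range(k)):
                out[y][x] = 1
    return out


def dilate(img, k):
    """Binary dilation that paints a k x k square at every set anchor."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(k):
                    for dx in range(k):
                        if y + dy < h and x + dx < w:
                            out[y + dy][x + dx] = 1
    return out


def opening(img, k):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img, k), k)
```

Opening removes every stroke thinner than the square while leaving thicker strokes intact, so with a suitably chosen size only the bold strokes survive.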
The video coding standard H.264 uses Context-based Adaptive Variable Length Coding (CAVLC) as one of its entropy encoding techniques. This paper proposes a VLSI architecture for the CAVLC algorithm. The designed hardware meets the speed required by H.264 without compromising the hardware cost. The CAVLC encoder works at a maximum clock frequency of 126 MHz when implemented in Xilinx 10.1i, Virtex-5 technology. The speed is appreciable compared to other existing works. The implemented architecture meets the required rate for processing HD-1080 format video sequences.
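As background on what a CAVLC encoder computes, the sketch below derives three of the per-block quantities that CAVLC codes for a block of zigzag-ordered coefficients: the number of non-zero coefficients, the number of trailing +/-1 coefficients (capped at three), and the number of zeros before the last non-zero coefficient. It is a software illustration only; the function name is hypothetical, and it does not model the table lookups or the hardware architecture described in the paper.

```python
def cavlc_block_stats(zigzag_coeffs):
    """Return (total_coeff, trailing_ones, total_zeros) for one block.

    zigzag_coeffs is the list of quantized coefficients in zigzag scan order.
    """
    nonzero_positions = [i for i, c in enumerate(zigzag_coeffs) if c != 0]
    total_coeff = len(nonzero_positions)

    # Count +/-1 coefficients from the high-frequency end, at most three.
    trailing_ones = 0
    for i in reversed(nonzero_positions):
        if abs(zigzag_coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Zeros located before the last non-zero coefficient in scan order.
    if nonzero_positions:
        total_zeros = nonzero_positions[-1] + 1 - total_coeff
    else:
        total_zeros = 0
    return total_coeff, trailing_ones, total_zeros
```

For the block `[0, 3, 0, 1, -1, -1, 0, 1, 0, ...]` the function returns five non-zero coefficients, three trailing ones, and three zeros before the last coefficient; the fourth +/-1 value is not counted as a trailing one because of the cap.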
Human authentication can now be seen as a crucial social problem. This paper presents a highly reliable multimodal authentication system that fuses iris, finger-knuckle-print and palmprint image matching scores. Segmented ROIs are preprocessed using DCP (Differential Code Pattern) to obtain robust corner features, which are then matched using the GOF (Global Optical Flow) based dissimilarity measure. The proposed system has been tested on public databases: the CASIA Interval and Lamp iris databases, the PolyU finger-knuckle-print database, and the PolyU and CASIA palmprint databases. The proposed system shows good performance on all unimodal databases, while on multimodal databases (fusion of all three) it shows perfect performance (i.e. CRR = 100% with EER = 0%).
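Score-level fusion of the three modalities can be sketched as follows. The abstract does not state which fusion rule or normalization is used, so the min-max normalization, the equal-weight sum rule, and all function names here are assumptions for illustration only.

```python
def min_max_normalize(scores):
    """Rescale a list of dissimilarity scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(iris, knuckle, palm, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse per-candidate dissimilarity scores by a weighted sum (assumed rule).

    Each argument lists one score per enrolled candidate, in the same order.
    """
    normalized = [min_max_normalize(m) for m in (iris, knuckle, palm)]
    return [
        sum(w * s for w, s in zip(weights, triple))
        for triple in zip(*normalized)
    ]


def identify(iris, knuckle, palm):
    """Return the index of the candidate with the lowest fused dissimilarity."""
    fused = fuse(iris, knuckle, palm)
    return min(range(len(fused)), key=fused.__getitem__)
```

In this sketch a modality that ranks the wrong candidate first can be outvoted by the other two, which is the usual motivation for fusing matching scores across traits.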
We use the RGB-D technology of Kinect to control an application with hand gestures, with PowerPoint as the test application. The system can start and end a presentation, navigate between slides, capture or release control of the cursor, and move the cursor through natural gestures. Such a system is useful and hygienic in kitchens, lavatories, hospital ICUs (for touch-less surgery), and the like. The challenge is to extract meaningful gestures from continuous hand motions. We propose a system that recognizes isolated gestures from continuous hand motions for multiple gestures in real time. Experimental results show that the system has 96.48% precision (at 96.00% recall) and performs better than the Microsoft Gesture Recognition library for swipe gestures.
This paper presents the design of STAR (Spatio-Temporal Analysis and Retrieval), an unsupervised Content Based Video Retrieval (CBVR) system. STAR's key insight and primary contribution is that it models video content using a joint spatio-temporal feature representation and retrieves videos from the database that have similar moving objects and motion trajectories. Foreground moving blobs are extracted from a moving-camera video shot, along with a trajectory for camera motion compensation, to form the space-time volume (STV). The STV is processed to obtain the EMST-CSS representation, which can discriminate between different categories of videos. The performance of STAR has been evaluated qualitatively and quantitatively, using the precision-recall metric, on benchmark video datasets containing unconstrained video shots.