ISBN: 9781510607125 (digital); 9781510607118, 9781510607125 (print)
Automated face detection is a pivotal step in computer-vision-aided facial medical diagnosis and biometrics. This paper presents an automatic, subject-adaptive framework for accurate face detection in the long infrared spectrum on our database for oral cancer detection, which consists of malignant, precancerous and normal subjects of varied age groups. Previous works on oral cancer detection using Digital Infrared Thermal Imaging (DITI) reveal that patients and normal subjects differ significantly in their facial thermal distribution. Therefore, it is a challenging task to formulate a completely adaptive framework that accurately localizes the face in such a subject-specific modality. Our model first extracts the most probable facial regions by minimum error thresholding, then applies adaptive methods that leverage the horizontal and vertical projections of the segmented thermal image. Additionally, the model incorporates domain knowledge by exploiting the temperature difference between strategic locations of the face. To the best of our knowledge, this is the first work on detecting faces in thermal facial images comprising both patients and normal subjects. Previous works on face detection have not specifically targeted automated medical diagnosis; the face bounding boxes returned by those algorithms are thus loose and not suited to further medical automation. Our algorithm significantly outperforms contemporary face detection algorithms in terms of commonly used metrics for evaluating face detection accuracy. Since our method has been tested on a challenging dataset consisting of both patients and normal subjects of diverse age groups, it can be readily adapted to any DITI-guided facial healthcare or biometric application.
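As a rough illustration of how horizontal and vertical projections of a segmented image can localize a region, the sketch below sums a binary mask along rows and columns and keeps the span where each profile stays above a fraction of its peak. The function name `bounding_box_from_projections` and the fixed `fraction` cut-off are assumptions made for this example; the abstract describes adaptive methods whose details are not given here, so this is not the authors' procedure.

```python
def bounding_box_from_projections(mask, fraction=0.5):
    """Return (top, bottom, left, right) of the dominant foreground region.

    mask: list of rows, each a list of 0/1 values (1 = candidate face pixel).
    fraction: hypothetical cut-off relative to the peak of each projection.
    """
    row_sums = [sum(row) for row in mask]        # projection onto the vertical axis
    col_sums = [sum(col) for col in zip(*mask)]  # projection onto the horizontal axis

    def extent(profile):
        peak = max(profile)
        if peak == 0:
            raise ValueError("mask contains no foreground pixels")
        cutoff = fraction * peak
        keep = [i for i, value in enumerate(profile) if value >= cutoff]
        return keep[0], keep[-1]

    top, bottom = extent(row_sums)
    left, right = extent(col_sums)
    return top, bottom, left, right


# Toy segmented image: a blob with a thin one-pixel "neck" below it.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = bounding_box_from_projections(toy_mask)
```

In this toy mask the thin row and the single stray column fall below the cut-off, so the box tightens around the main blob. That is the kind of tight localization the abstract argues loose, general-purpose detectors do not provide.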
ISBN: 9781450347532 (print)
Bishnupur is an attractive tourist destination in West Bengal, India, known for its terracotta temples. The place is a prospective candidate for inclusion in the list of UNESCO World Heritage sites. We intend to preserve this heritage site digitally and also to offer virtual interaction for tourists and researchers. In this paper, we present an image dataset of different temples (namely, Jor Bangla, Kalachand, Madan Mohan, Radha Madhav, Rasmancha, Shyamrai and Nandalal) in Bishnupur for evaluating different types of computer vision and image processing algorithms (such as 3D reconstruction, image inpainting, texture classification and content-specific image retrieval). The dataset is captured using four different cameras with different parameter settings. Some subsets are extracted and earmarked for certain applications such as texture classification, image inpainting and content-specific image retrieval. Example results of baseline methods are also shown for these applications, which lets us evaluate the usefulness of the dataset. To the best of our knowledge, this is the first attempt at a combined dataset for evaluating various types of problems for a heritage site in India.
ISBN: 9781450347532 (print)
Music transcription refers to the process of analyzing a piece of music to generate a sequence of constituent notes and their duration. Transcription of music from audio signals is fraught with problems due to auditory interference such as ambient noise, multiple instruments playing simultaneously, accompanying vocals or polyphonic sounds. For several instruments, there exists added information for music transcription which can be derived from a video sequence of the instrument as it is being played. This paper proposes a method to utilize this visual information for the case of keyboard-like instruments to generate a transcript automatically, by analyzing the video frames. We present encouraging results under varying lighting conditions on different song sequences played out on a keyboard.
ISBN: 9781450347532 (print)
Image hallucination has many applications in areas such as image processing, computational photography and image fusion. In this paper, we present an image hallucination technique based on template (patch) matching from a database of time-lapse images and a learned locally affine model. Template-based techniques suffer from blocky artifacts, so we propose two approaches for imposing consistency criteria across neighbouring patches in the form of regularization. We validate our color transfer technique by hallucinating a variety of natural images at different times of the day. We compare the proposed approach with other state-of-the-art techniques for example-image-based color transfer and show that the images obtained using our approach look more plausible and natural.
ISBN: 9781450347532 (print)
We present a novel algorithm to remove near-regular, fence- or wire-like foreground patterns from an image. Existing fence detection and fence removal algorithms perform poorly at detecting the fence. We use signal demixing to exploit the sparsity and regularity of fences in order to detect them. Results demonstrate the effectiveness of our technique compared to other state-of-the-art techniques.
ISBN: 9781450347532 (print)
Rotation invariance has been studied in the computer vision community primarily in the context of small in-plane rotations. This is usually achieved by building invariant image features. However, the problem of achieving invariance for large rotation angles remains largely unexplored. In this work, we tackle this problem by directly compensating for large rotations, as opposed to building invariant features. This is inspired by the neuroscientific concept of mental rotation, which humans use to compare pairs of rotated objects. Our contributions are three-fold. First, we train a Convolutional Neural Network (CNN) to detect image rotations. We find that generic CNN architectures are not suitable for this purpose, so we introduce a convolutional template layer, which learns representations for canonical 'unrotated' images. Second, we use Bayesian Optimization to quickly sift through a large number of candidate images to find the canonical 'unrotated' image. Third, we use this method to achieve robustness to large angles in an image retrieval scenario. Our method is task-agnostic and can be used as a pre-processing step in any computer vision system.
ISBN: 9781450347532 (print)
Skin colour detection under poor or varying illumination conditions is a major challenge for various image processing and human-computer interaction applications. In this paper, a novel skin detection method utilizing the image pixel distribution in a given colour space is proposed. The pixel distribution of an image can provide a better localization of the actual skin colour distribution of that image. Hence, a local skin distribution model (LSDM) is derived using the image pixel distribution model and its similarity with the global skin distribution model (GSDM). Finally, a fusion-based skin model is obtained using both the GSDM and the LSDM. Subsequently, a dynamic region growing method is employed to improve the overall detection rate. Experimental results show that the proposed skin detection method can significantly improve the detection accuracy in the presence of varying illumination conditions.
ISBN: 9781450347532 (print)
Understanding crowd dynamics is an interesting problem in computer vision owing to its various applications. We propose a dynamical system to model the collective motion of a crowd. The model learns the spatio-temporal interaction pattern of the crowd from track data captured over a time period. The model is trained under a least-squares formulation with spatial and temporal constraints. The spatial constraint allows the model to consider only the neighbors of a particular agent, and the temporal constraint enforces temporal smoothness in the model. We also propose an effective group detection algorithm that utilizes the eigenvectors of the interaction matrix of the model. The group detection is cast as a spectral clustering problem. Extensive experimentation demonstrates the superior performance of our group detection algorithm over state-of-the-art methods.
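To make the spectral clustering step more concrete, here is a generic two-way spectral split, not the group detection algorithm of the paper. It assumes a symmetric, non-negative affinity matrix between agents (how the learned interaction matrix would be turned into such affinities is not stated in the abstract). It then forms the graph Laplacian and separates agents by the sign of its second eigenvector. The function name `spectral_two_groups` and the toy affinities are hypothetical.

```python
def spectral_two_groups(affinity, iterations=500):
    """Split nodes into two groups by the sign of the Fiedler vector.

    affinity: symmetric list-of-lists of non-negative pairwise weights
    (assumed input; at least two nodes).
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    # Unnormalized graph Laplacian L = D - W.
    lap = [[(degree[i] if i == j else 0.0) - affinity[i][j] for j in range(n)]
           for i in range(n)]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(degree) + 1.0
    # Power iteration on (shift * I - L), kept orthogonal to the constant
    # vector, converges to the eigenvector of the second-smallest eigenvalue.
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant vector
        w = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]


# Toy affinities: agents 0-2 interact strongly, agents 3-4 interact strongly,
# and a single weak link joins the two groups.
toy_affinity = [
    [0.0, 1.0, 1.0, 0.01, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.01, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
labels = spectral_two_groups(toy_affinity)
```

Detecting more than two groups would normally use several eigenvectors followed by a clustering step such as k-means. The sign split is kept here only because it fits in a few lines.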
ISBN: 9781450347532 (print)
Dictionary learning has been used to solve inverse problems in imaging and as an unsupervised feature extraction tool in vision. The main disadvantage of dictionary learning for applications in vision is the relatively long feature extraction time during testing, owing to the requirement of solving an iterative optimization problem (l0-minimization). The newly developed analysis framework of transform learning does not suffer from this shortcoming; feature extraction only requires a matrix-vector multiplication. This work proposes an alternative formulation for transform learning that improves the accuracy even further. Experiments on benchmark databases show that our proposed transform learning yields better results than dictionary learning, autoencoders (AE) and restricted Boltzmann machines (RBM). Its feature extraction is as fast as that of AE and RBM.
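The speed argument can be seen in a minimal sketch: once a transform has been learned, extracting features from a test sample is a single matrix-vector product, with no iterative solver. The 2x2 transform below is a made-up placeholder rather than a learned one. The optional hard-thresholding step is an assumption borrowed from common transform-learning practice and is not something the abstract specifies.

```python
def analysis_features(transform, signal, threshold=0.0):
    """Compute z = T x, then zero every coefficient with |z_i| <= threshold.

    transform: list of rows of the (already learned) transform T.
    signal: input vector x as a list of numbers.
    threshold: hypothetical sparsification level; 0.0 only zeroes exact zeros.
    """
    coeffs = [sum(t * x for t, x in zip(row, signal)) for row in transform]
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


# Placeholder transform and sample, chosen only to keep the arithmetic visible.
toy_transform = [[1.0, 1.0],
                 [1.0, -1.0]]
toy_signal = [3.0, 1.0]

dense_features = analysis_features(toy_transform, toy_signal)
sparse_features = analysis_features(toy_transform, toy_signal, threshold=2.5)
```

By contrast, a synthesis dictionary would require solving a sparse-coding problem for every test sample. That is the per-sample cost the abstract identifies as the main drawback of dictionary learning.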
ISBN: 9781450347532 (print)
Compressed sensing magnetic resonance imaging (CSMRI) has demonstrated that it is possible to shorten MRI scan time by reducing the number of measurements in k-space without significant loss of anatomical detail. The number of k-space measurements required is roughly proportional to the sparsity of the MR signal under consideration. Recently, a few works on CSMRI have revealed that the sparsity of the MR signal can be enhanced by suitable weighting of different regularization priors. In this paper, we propose an efficient adaptive weighted reconstruction algorithm for enhancing the sparsity of the MR image. Experimental results show that the proposed algorithm gives better reconstructions with fewer measurements, without a significant increase in computational time, compared to existing algorithms of this kind.