ISBN: 9781450347532 (print)
This paper proposes a method for segmenting the nuclei of single/isolated and overlapping/touching immature white blood cells in microscopic images of B-lineage acute lymphoblastic leukemia (ALL) prepared from peripheral blood and bone marrow aspirate. We propose a deep belief network approach for the segmentation of these nuclei. Simulation results and comparison with some existing methods demonstrate the efficacy of the proposed method.
ISBN: 9781467385640 (print)
In speech training aids that provide visual feedback of articulatory effort, the time-varying vocal tract shape during speech production is generally obtained by linear prediction (LP) analysis of the speech signal, assuming a constant area at the glottis end as a reference. Variation of this area during speech production causes errors in the estimated vocal tract shape. The problem can be overcome by using the area of the mouth opening as the reference. This area can be estimated by detecting the inner lip contour in a video recording of the speaker's face during the utterance. A technique for detecting the inner lip contour, based on color transformation and template matching, is presented for reducing the errors caused by the presence of teeth and tongue. Face detection by the Viola-Jones algorithm, localization using a mouth detection technique, and outer lip contour detection are used to narrow down the search region for the inner mouth opening. The teeth are masked by separate color transformations for the upper and lower lip segments. To reduce errors due to visibility of the tongue, which may not be significantly separated from the lips in the color space, a template matching technique is employed. It is applied separately to the upper and lower lip segments to obtain the mouth opening area. The technique has been validated against graphically measured values of the mouth opening and found to estimate the mouth opening area successfully, unaffected by skin hue and the presence of teeth.
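The idea of switching the reference from the glottis end to the mouth opening can be illustrated with a minimal sketch. It assumes that LP analysis has already produced reflection coefficients, that tube sections are ordered from glottis to lips, and that the area ratio between adjacent sections follows the convention A[i+1] / A[i] = (1 - k[i]) / (1 + k[i]). Sign and ordering conventions differ between formulations, so these choices, along with the function names `relative_areas` and `rescale_to_mouth_opening`, are assumptions and not taken from the paper.

```python
def relative_areas(reflection_coeffs, glottis_area=1.0):
    """Build an area function from reflection coefficients.

    Assumed convention: sections run from glottis to lips and
    A[i+1] / A[i] = (1 - k[i]) / (1 + k[i]).
    """
    areas = [glottis_area]
    for k in reflection_coeffs:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


def rescale_to_mouth_opening(areas, mouth_area):
    """Rescale the whole area function so the lip-end section
    matches an externally measured mouth opening area."""
    scale = mouth_area / areas[-1]
    return [a * scale for a in areas]


# Hypothetical values: two reflection coefficients and a mouth
# opening area measured from video (arbitrary units).
shape = relative_areas([0.0, 0.5])
calibrated = rescale_to_mouth_opening(shape, mouth_area=2.0)
```

Only the area ratios come from the speech signal in this sketch; the absolute scale is fixed by the video-derived mouth opening, so a change in the glottal area no longer shifts the whole estimated shape.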
ISBN: 9781450347532 (print)
This paper presents a feature-preserving denoising scheme for fluorescence video microscopy. Fluorescence image sequences comprise edges and fine structures with fast-moving objects. Improving the signal-to-noise ratio (SNR) while preserving structural details is a difficult task for these image sequences. Some existing denoising techniques over-smooth these image sequences, while others fail due to inappropriate implementation of the motion estimation and compensation steps. In this paper we use the nonlocal means (NLM) video denoising algorithm to avoid the motion estimation and compensation steps. The proposed shot boundary detection technique pre-processes the sequence systematically and accurately to form different shots with content-wise similar frames. To preserve the edges and fine structural details in the image sequences, we modify the weighting term of the NLM filter. Further, to accelerate the denoising process, a separable non-local means filter is implemented for video sequences. We compare the results with existing fluorescence video denoising techniques and show that the proposed method not only preserves the edges and small structural details more effectively but also reduces the computational time. The efficacy of the proposed algorithm is evaluated quantitatively with PSNR and qualitatively by visual perception.
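For orientation, the following is a minimal sketch of the classic non-local means weighting on a 1D signal. It is not the modified weighting term proposed in the paper, whose form the abstract does not give. The function name `nlm_1d` and the parameters `patch_radius`, `search_radius`, and `h` are illustrative assumptions. A separable variant would typically apply such 1D passes along each axis in turn; whether the paper does exactly this is also an assumption.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Classic non-local means on a 1D list of floats.

    Each output sample is a weighted average of samples in a search
    window; the weight decays with the mean squared difference
    between the patches centred on the two samples.
    """
    n = len(signal)

    def patch(i):
        # Patch around index i, with indices clamped at the borders.
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        num = 0.0
        den = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            p_j = patch(j)
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
            w = math.exp(-d2 / (h * h))  # similar patches get weight near 1
            num += w * signal[j]
            den += w
        out.append(num / den)  # den >= 1 because j == i contributes weight 1
    return out
```

Because the weights depend on patch similarity and not on spatial distance alone, samples on opposite sides of a sharp edge contribute almost nothing to each other, which is why NLM tends to keep edges that a plain local average would blur.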
ISBN: 9781450347532 (print)
Learning image representations has been an interesting and challenging problem. When users upload images to photo sharing websites, they often provide multiple textual tags for ease of reference. These tags can reveal significant information about the content of the image, such as the objects present or the action taking place. Approaches have been proposed to extract additional information from these tags in order to augment the visual cues and build a multi-modal image representation. However, the existing approaches pay little attention to the semantic meaning of the tags when encoding them. In this work, we attempt to enrich the image representation with tag encodings that leverage their semantics. Our approach uses neural network based natural language descriptors to represent the tag information. By complementing the visual features learned by convnets, our approach yields an efficient multi-modal image representation. Experimental evaluation on benchmark datasets suggests that exploiting the two data modalities gives a better multi-modal image representation for classification.
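One simple way to combine a visual feature with a semantic tag encoding is sketched below. It assumes that each tag has a pretrained word vector, that the tag encoding is the mean of those vectors, and that the two modalities are fused by concatenating L2-normalized vectors. The abstract does not state the encoding or the fusion rule, so all of these choices, the toy embedding table, and the function names are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def encode_tags(tags, embeddings, dim):
    """Average the word vectors of all tags found in the embedding table."""
    known = [embeddings[t] for t in tags if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def multimodal_representation(visual_feature, tags, embeddings, dim):
    """Concatenate the normalized visual feature and tag encoding."""
    tag_vec = encode_tags(tags, embeddings, dim)
    return l2_normalize(visual_feature) + l2_normalize(tag_vec)


# Hypothetical 2-dimensional word vectors, for illustration only.
toy_embeddings = {"dog": [1.0, 0.0], "beach": [0.0, 1.0]}
rep = multimodal_representation([3.0, 4.0], ["dog", "beach", "sunset"],
                                toy_embeddings, dim=2)
```

Normalizing each modality before concatenation keeps one of them from dominating a downstream classifier purely because of its scale.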
ISBN: 9781450347532 (print)
Automatic recognition of important events in soccer broadcast videos plays a vital role in many applications, including video summarization, indexing, content-based search, and performance analysis of players and teams. This paper proposes an approach for soccer event recognition using deep convolutional features combined with domain-specific cues. For the deep representation, we use the recently proposed trajectory-based deep convolutional descriptor (TDD) [1], which samples and pools discriminatively trained convolutional features around improved trajectories. We further improve the performance by incorporating domain-specific knowledge based on the camera view type and its position. The camera position and view type capture the statistics of event occurrence across play-field regions and zoom levels, respectively. We conduct extensive experiments on 6 hours of soccer matches and show the effectiveness of the deep video representation for soccer and the improvements obtained using domain-specific cues.
We address the problem of novel view synthesis for scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras. Assuming that the correspondence of three vanishing points in general position is available, we propose two techniques. The first is a transfer-based scheme that synthesizes new views with only a translation of the virtual camera and computes z-buffer values for handling occlusions in the synthesized views. The second is a reconstruction-based scheme that synthesizes arbitrary new views in which the camera can undergo rotation as well as translation. We present experimental results to establish the validity of both formulations. (c) 2006 Published by Elsevier B.V.
ISBN: 9781450347532 (print)
In this work, we present a novel non-photorealistic rendering method that produces good-quality stylization results for color images. The procedure is driven by the saliency measure in the foreground and background regions. We start by generating a saliency map and applying simple thresholding-based segmentation to get a rough estimate of the foreground-background mask. We improve this mask using a scribble-based method in which the scribbles for the foreground and background regions are generated automatically from the rough estimate. Following mask generation, we proceed with an iterative abstraction process that involves edge-preserving blurring and edge detection. The number of iterations of the abstraction process to be performed in the foreground and background regions is decided by tracking the changes in the saliency measure in each region. Performing an unequal number of iterations helps to improve the average saliency measure in the more salient region (foreground) while decreasing it in the non-salient region (background). Results from our implementation show the merits of this approach compared with competing methods.
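The rough mask and the per-region saliency averages that drive the iteration count can be sketched as follows. The sketch assumes the saliency map is a 2D list of values in [0, 1] and that the threshold defaults to the mean saliency. The abstract does not specify the threshold, so that default and the function names `rough_mask` and `region_means` are assumptions.

```python
def rough_mask(saliency, threshold=None):
    """Threshold a saliency map into a boolean foreground mask.

    If no threshold is given, the mean saliency is used (an assumption).
    """
    values = [v for row in saliency for v in row]
    if threshold is None:
        threshold = sum(values) / len(values)
    return [[v > threshold for v in row] for row in saliency]


def region_means(saliency, mask):
    """Return (foreground mean, background mean) of the saliency map."""
    fg, bg = [], []
    for s_row, m_row in zip(saliency, mask):
        for v, is_fg in zip(s_row, m_row):
            (fg if is_fg else bg).append(v)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(fg), mean(bg)


# Hypothetical 2x3 saliency map.
saliency_map = [[0.9, 0.8, 0.1],
                [0.7, 0.2, 0.1]]
mask = rough_mask(saliency_map)
fg_mean, bg_mean = region_means(saliency_map, mask)
```

Recomputing `region_means` after each abstraction pass gives the quantities to track: iterations in a region would continue only while they move its average saliency in the desired direction.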
ISBN: 9781450366151 (print)
Face Recognition (FR) under adversarial conditions has been a major challenge for researchers in the computer vision and machine learning communities in the recent past. Most state-of-the-art face recognition systems have been designed to overcome degradations in a face due to variations in pose, illumination, contrast, and resolution, along with blur. However, none have addressed the issue of makeup as a spoof attack, which drastically changes the appearance of a face, making it difficult even for humans to detect and identify the impostor. In this paper, we propose a novel multi-component deep convolutional neural network (CNN) based architecture that performs the complex task of makeup removal from a disguised face to reveal the original mugshot image of the impostor (i.e., without makeup). The proposed network also performs the hard task of FR on a disguised face, in addition to recognizing the identity and generating the face of the spoofed target, by minimizing a novel multi-component objective function. Comparison of performance with a few recent state-of-the-art FR methods over three benchmark datasets reveals the superiority of our proposed method for both the synthesis and the recognition (FR) tasks.
ISBN: 9781450366151 (print)
Egocentric activity recognition (EAR) is an emerging area in the field of computer vision research. Motivated by the current success of Convolutional Neural Networks (CNNs), we propose a multi-stream CNN for multimodal egocentric activity recognition using visual (RGB video) and sensor (accelerometer, gyroscope, etc.) streams. To effectively capture the spatio-temporal information contained in RGB videos, two types of modalities are extracted from the visual data: the Approximate Dynamic Image (ADI) and the Stacked Difference Image (SDI). These image-based representations are generated at both the clip level and the entire-video level, and are then used to fine-tune a pretrained 2D CNN called MobileNet, which is specifically designed for mobile vision applications. Similarly, for sensor data, each training sample is divided into three segments, and a deep 1D CNN is trained from scratch for each type of sensor stream. During testing, the softmax scores of all the streams (visual + sensor) are combined by late fusion. Experiments performed on a multimodal egocentric activity dataset demonstrate that our proposed approach can achieve state-of-the-art results, outperforming the current best handcrafted and deep learning based techniques.
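The late fusion step can be sketched as a weighted average of per-stream softmax scores followed by an argmax. The abstract does not state the fusion rule, so averaging with equal weights is an assumption here, as are the function names and the example logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_scores, weights=None):
    """Weighted average of per-stream class-probability vectors.

    With weights=None every stream contributes equally.
    """
    n_streams = len(stream_scores)
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    fused = [0.0] * len(stream_scores[0])
    for w, scores in zip(weights, stream_scores):
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(stream_scores, weights=None):
    """Index of the class with the highest fused score."""
    fused = late_fusion(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits from one visual stream and two sensor streams.
streams = [softmax([2.0, 1.0, 0.1]),
           softmax([0.5, 2.5, 0.2]),
           softmax([0.1, 3.0, 0.3])]
label = predict(streams)
```

Because fusion happens on the output scores, each stream can be trained independently (fine-tuned 2D CNN for the image representations, 1D CNN from scratch for each sensor), and streams can be added or dropped without retraining the others.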
ISBN: 9781450347532 (print)
Fragile watermarking schemes are very common in practice for tamper detection and image authentication. In this paper, we propose a new fragile watermarking scheme to detect common image tampering operations, such as copy-and-paste and image splicing attacks, in color images. The proposed scheme performs the watermarking with an image-dependent authentication code during the color image demosaicking procedure. We propose a new method to generate an almost unique authentication code from an image by comparing the RGB color components of the pixels. In the proposed scheme, every pixel is watermarked with an encrypted authentication code derived from the same pixel. To embed the watermark, we propose a new method in which the color filter array sampled components are not considered during watermarking, and only the rebuilt color component values of every pixel are modified according to the authentication code. The experimental study shows that the proposed watermarking scheme produces images with better visual quality than the state of the art and detects both copy-and-paste and image splicing attacks more accurately than existing schemes.