ISBN: 9781450347532 (print)
We propose a novel technique for event geo-localization (i.e., the 2-D location of the event on the surface of the earth) from the sensor metadata of crowd-sourced videos collected from smartphone devices. With the help of sensors available in smartphones, such as the digital compass and GPS receiver, we collect metadata such as camera viewing direction and location along with the video. The event localization is then posed as a constrained optimization problem using the available sensor metadata. Our results on the collected experimental data show correct localization of events, which is particularly challenging for classical vision-based methods because of the nature of the visual data. Since we use only sensor metadata in our approach, the computational overhead is much lower than it would be if the video content were used. Finally, we illustrate the benefits of our work in analyzing video data from multiple sources through geo-localization.
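The idea of locating an event from camera positions and viewing directions can be illustrated with a small sketch. This is not the constrained formulation of the abstract above, whose details are not given here. It is a plain unconstrained least-squares fit that finds the point closest to all viewing lines. The function name `locate_event`, the local east/north coordinates in metres, and the convention that bearings are measured clockwise from north are all assumptions made for illustration.

```python
import math


def locate_event(cameras):
    """Least-squares intersection of camera viewing lines.

    cameras: list of (east_m, north_m, bearing_deg) tuples (hypothetical
    format). Returns the (east, north) point that minimizes the sum of
    squared perpendicular distances to every viewing line.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for east, north, bearing_deg in cameras:
        theta = math.radians(bearing_deg)
        # Unit viewing direction: bearing is clockwise from north.
        dx, dy = math.sin(theta), math.cos(theta)
        # Projector onto the normal of the viewing line: I - d d^T.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * east + m12 * north
        b2 += m12 * east + m22 * north
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("viewing directions are (nearly) parallel")
    # Solve the symmetric 2x2 normal equations by Cramer's rule.
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


# Three cameras that all look at the point (100, 100).
print(locate_event([(0.0, 0.0, 45.0), (200.0, 0.0, 315.0), (100.0, 0.0, 0.0)]))
```

The sketch treats each viewing direction as an infinite line, so it does not enforce that the event lies in front of each camera. A constrained formulation could add that requirement.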
ISBN: 9798400710759 (print)
Automated segmentation of medical image volumes promises to reduce costly medical experts' time for annotation. However, using machine learning for the task is challenging due to variations in imaging modalities and scarcity of patient data. While interactive image segmentation methods and foundational models incorporating user-provided prompts to refine segmentation masks have shown promise, they overlook crucial sequential information between the slices in 3D medical image volumes and videos, resulting in discontinuities in the segmentation results. This paper proposes a new framework that dynamically updates model parameters during inference, in a test-time training framework, using user-provided scribbles. Our framework preserves acquired knowledge from the previous slices of the current medical volume and the training dataset via student-teacher learning. We evaluate our method on diverse CT, MRI, and microscopic cell datasets. Our framework significantly reduces user annotation time, by a factor of 6.72. Compared to other interactive segmentation methods, we reduce the time by a factor of 2.64. Our method also outperforms prompting foundation models for segmentation by achieving a Dice score of 0.9 in 3-4 interactions compared to 5-8 user interactions for the foundation model, significantly reducing annotation time for the CT and MRI volumes.
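For readers unfamiliar with the Dice score quoted above, the following minimal sketch computes it for two binary masks. The flat 0/1 list representation and the function name `dice_score` are assumptions for illustration, as is the convention of returning 1.0 when both masks are empty.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two binary masks.

    pred, target: equal-length flat sequences of 0/1 values
    (hypothetical format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# One overlapping pixel, two predicted, one in the reference mask.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```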
ISBN: 9781424442195 (print)
In shape recognition, a multiscale description provides more information about the object and increases discrimination power and immunity to noise. In this paper, we develop a new multiscale Fourier-based object description in 2-D space using a low-pass Gaussian filter (LPGF) and a high-pass Gaussian filter (HPGF), separately. Using the LPGF at different scales represents the inner and central part of an object more than the boundary. On the other hand, using the HPGF at different scales represents the boundary and exterior parts of an object more than the central part. Our algorithms are also organized to achieve size, translation and rotation invariance. Evaluation indicates that the HPGF-based representation, which emphasizes the boundary and exterior parts over the central part, performs better under increasing noise than the LPGF-based multiscale representation, Zernike moments and elliptic Fourier descriptors.
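As a reminder of what the two filters look like, a common textbook form of the frequency-domain Gaussian filter pair is given below. The abstract does not state the parametrization actually used, so the scale parameter \sigma and the frequency coordinates (u, v) are assumptions for illustration.

```latex
H_{\mathrm{LP}}(u, v; \sigma) = \exp\!\left(-\frac{u^{2} + v^{2}}{2\sigma^{2}}\right),
\qquad
H_{\mathrm{HP}}(u, v; \sigma) = 1 - H_{\mathrm{LP}}(u, v; \sigma)
```

Varying \sigma gives the different scales: the low-pass filter keeps the low frequencies near the origin, while the high-pass filter suppresses them and keeps the rest.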
ISBN: 9781450347532 (print)
Matrix factorization has been widely used to learn a joint, compact latent subspace when multiple views or modalities of objects (belonging to a single domain or multiple domains) are available. Our work addresses the problem of learning an informative latent subspace by imparting supervision to matrix factorization for fusing multiple modalities of objects. We devise simpler supervised additive updates instead of multiplicative updates, which makes the method scalable to large datasets. To increase the classification accuracy, we integrate the label information of images into the process of learning a semantically enhanced subspace. We perform extensive experiments on two publicly available standard image datasets from NUS-WIDE and compare the results with state-of-the-art subspace learning and fusion techniques to evaluate the efficacy of our framework. The improvement obtained in classification accuracy confirms the effectiveness of our approach. In essence, we propose a novel method for supervised data fusion, leading to supervised subspace learning.
ISBN: 9781450347532 (print)
In Digital Subtraction Angiography (DSA), non-rigid registration of the mask and contrast images to reduce motion artifacts is a challenging problem. In this paper, we propose a novel stratified registration framework for DSA artifact reduction. We use quad-trees to generate a non-uniform grid of control points and obtain the sub-pixel displacement offsets using the Random Walker (RW). We also propose a sequencing logic for the control points and an incremental LU decomposition approach that enables reuse of the computations in the RW step. We have tested our approach on clinical data sets and found that our registration framework performs comparably to graph-cuts (at the same partition level) in regions where 95% artifact reduction was achieved. The optimization step achieves a speed improvement of 4.2 times with respect to graph-cuts.
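The way a quad-tree yields a non-uniform grid of control points can be sketched as follows. The abstract does not say which splitting criterion is used, so the intensity-range test, the `threshold` and `min_size` parameters, the function name `quadtree_control_points`, and the choice of block centres as control points are all assumptions made for illustration.

```python
def quadtree_control_points(image, threshold, min_size=1):
    """Place one control point at the centre of every quad-tree leaf.

    image: square list of lists whose side is a power of two
    (hypothetical format). A block is split into four quadrants while
    its intensity range exceeds `threshold` (assumed criterion) and its
    side is larger than `min_size`.
    """
    points = []

    def visit(row, col, size):
        block = [image[r][c]
                 for r in range(row, row + size)
                 for c in range(col, col + size)]
        if size > min_size and max(block) - min(block) > threshold:
            half = size // 2
            for d_row in (0, half):
                for d_col in (0, half):
                    visit(row + d_row, col + d_col, half)
        else:
            # Leaf block: record its centre as a control point.
            points.append((row + size / 2.0, col + size / 2.0))

    visit(0, 0, len(image))
    return points


# A 4x4 image with one bright pixel gets denser control points near it.
image = [[0] * 4 for _ in range(4)]
image[0][0] = 10
print(quadtree_control_points(image, threshold=5))
```

Homogeneous regions end up with a few large blocks and detailed regions with many small ones, which is what makes the resulting grid non-uniform.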
ISBN: 9781450347532 (print)
This paper proposes a method for segmentation of nuclei of single/isolated and overlapping/touching immature white blood cells from microscopic images of B-Lineage acute lymphoblastic leukemia (ALL) prepared from peripheral blood and bone marrow aspirate. We propose a deep belief network approach for the segmentation of these nuclei. Simulation results and comparison with some of the existing methods demonstrate the efficacy of the proposed method.
ISBN: 9781424442195 (print)
We propose a new method to compress the geometry component of a 3D animation sequence. It is based on Linear Discriminant Analysis (LDA) of the animation geometry data. The redundancy across the animation frames is exploited by applying the LDA in the temporal direction. Owing to the redundancy between the frames of a class, the covariance matrix of that class in the LDA computation may become singular. To overcome this drawback, we first transform the data into a new basis using Principal Component Analysis (PCA) and then apply the LDA to a few principal components. The reconstruction is simple and involves two stages: first for the LDA and then for the PCA. The experimental results show that the proposed method achieves lower reconstruction error at high compression ratios.
We present a novel dance posture based annotation model by combining features using Multiple Kernel Learning (MKL). We have proposed a novel feature representation which represents the local texture properties of the ...
ISBN: 9781450347532 (print)
In this paper, a feature-preserving denoising scheme for fluorescence video microscopy is presented. Fluorescence image sequences comprise edges and fine structures with fast-moving objects. Improving the signal-to-noise ratio (SNR) while preserving structural details is a difficult task for these image sequences. Some existing denoising techniques over-smooth these image sequences, while others fail due to inappropriate implementation of the motion estimation and compensation steps. In this paper, we use the nonlocal means (NLM) video denoising algorithm so as to avoid the motion estimation and compensation steps. The proposed shot boundary detection technique pre-processes the sequence systematically and accurately to form different shots with content-wise similar frames. To preserve the edges and fine structural details in the image sequences, we modify the weighting term of the NLM filter. Further, to accelerate the denoising process, a separable non-local means filter is implemented for video sequences. We compare the results with existing fluorescence video denoising techniques and show that the proposed method not only preserves the edges and small structural details more effectively but also reduces the computational time. The efficacy of the proposed algorithm is evaluated quantitatively with PSNR and qualitatively by visual perception.
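To make the role of the NLM weighting term concrete, here is a minimal sketch of the standard non-local means filter on a 1-D signal. It is not the modified weighting term or the separable video filter of the abstract above, neither of which is specified there. The function name `nlm_denoise_1d` and the parameters `patch_radius`, `search_radius` and `h` are assumptions for illustration.

```python
import math


def nlm_denoise_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Standard non-local means on a 1-D list of floats.

    Each output sample is a weighted mean of the samples in its search
    window, with weight exp(-d2 / h^2), where d2 is the mean squared
    difference between the patches around the two samples.
    """
    n = len(signal)

    def sample(k):
        # Clamp indices so patches near the borders stay defined.
        return signal[min(max(k, 0), n - 1)]

    out = []
    for i in range(n):
        weight_sum = 0.0
        value_sum = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            d2 = sum((sample(i + k) - sample(j + k)) ** 2
                     for k in range(-patch_radius, patch_radius + 1))
            d2 /= (2 * patch_radius + 1)
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        # weight_sum >= 1 because the patch at j == i matches itself.
        out.append(value_sum / weight_sum)
    return out


# A step edge: samples are averaged only with similar-looking neighbours.
step = [0.0] * 10 + [1.0] * 10
print(nlm_denoise_1d(step, h=0.1))
```

Because the weights depend on patch similarity rather than on estimated motion, samples on opposite sides of an edge contribute almost nothing to each other, which is why the filter can smooth noise without blurring edges.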
ISBN: 9781467385640 (print)
In speech training aids that provide visual feedback of articulatory efforts, the time-varying vocal tract shape during speech production is generally obtained by linear prediction (LP) analysis of the speech signal, assuming a constant area at the glottis end as a reference. Variation of this area during speech production causes errors in the estimated vocal tract shape. The problem can be overcome by using the area of the mouth opening as the reference. This area can be estimated by detecting the inner lip contour from a video recording of the speaker's face during the speech utterance. A technique for detection of the inner lip contour, based on color transformation and template matching, is presented for reducing the errors caused by the presence of teeth and tongue. Face detection by the Viola-Jones algorithm, localization using a mouth detection technique, and outer lip contour detection are used to narrow down the search region for the inner mouth opening. The presence of teeth is masked by separate color transformations for the upper and lower lip segments. To reduce the errors due to visibility of the tongue, which may not have any significant separation from the lips in the color space, a template matching technique is employed. It is used separately for the upper and lower lip segments to obtain the mouth opening area. The technique has been validated against graphically measured values of the mouth opening and found to be successful in estimating the mouth opening area, and it is not affected by skin hue or the presence of teeth.
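Once an inner lip contour has been detected, the enclosed mouth opening area can be computed from the contour points. The sketch below uses the shoelace formula for a closed polygon. The abstract does not state how the area is computed, so this choice, the function name `contour_area`, and the list-of-(x, y)-vertices representation are assumptions for illustration.

```python
def contour_area(points):
    """Area enclosed by a closed polygonal contour (shoelace formula).

    points: sequence of (x, y) vertices in order around the contour
    (hypothetical format); the last vertex is joined back to the first.
    """
    vertices = list(points)
    if len(vertices) < 3:
        return 0.0
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x0 * y1 - x1 * y0
    # The sign depends on the traversal direction, so take the magnitude.
    return abs(twice_area) / 2.0


# A right triangle with legs 4 and 3, in pixel units.
print(contour_area([(0, 0), (4, 0), (0, 3)]))
```

The result is in squared pixel units, so a calibration factor would be needed to compare it with physically measured areas.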