ISBN: 9781467385640 (print)
Automatic image annotation is the computer vision task of assigning a set of appropriate textual tags to a novel image. The aim is to eventually bridge the semantic gap between visual and textual representations with the help of these tags. This also has applications in designing scalable image retrieval systems and providing multilingual interfaces. Though a wide variety of powerful machine learning algorithms have been explored for the image annotation problem in the recent past, nearest neighbor techniques still yield superior results. A challenge facing present-day annotation schemes is the lack of sufficient training data. In this paper, an active learning based image annotation model is proposed. We leverage image-to-image and image-to-tag similarities to decide the best set of tags describing the semantics of an image. The advantages of the proposed model include: (a) it can output a variable number of tags per image, which improves accuracy; (b) it can effectively choose the difficult samples that need to be manually annotated, thereby reducing human annotation effort. Studies on the Corel and IAPR TC-12 datasets validate the effectiveness of this model.
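The nearest-neighbor tag transfer and sample selection described above can be illustrated with a minimal sketch. It is not the paper's model: it uses only image-to-image cosine similarity, and the function names, the `accept` threshold, and the uncertainty score (a standard uncertainty-sampling heuristic) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def tag_scores(query, labelled, k):
    """Similarity-weighted tag votes from the k most similar labelled images.

    `labelled` is a list of (feature_vector, set_of_tags) pairs (hypothetical layout).
    """
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    scores = defaultdict(float)
    total = 0.0
    for features, tags in ranked[:k]:
        weight = max(cosine(query, features), 0.0)
        total += weight
        for tag in tags:
            scores[tag] += weight
    if total == 0.0:
        return {}
    return {tag: s / total for tag, s in scores.items()}

def annotate(query, labelled, k=3, accept=0.6):
    """Keep every tag whose score passes `accept`, so the tag count varies per image."""
    scores = tag_scores(query, labelled, k)
    return sorted(tag for tag, s in scores.items() if s >= accept)

def uncertainty(query, labelled, k=3):
    """Assumed selection rule: highest when some tag score sits near 0.5."""
    scores = tag_scores(query, labelled, k)
    if not scores:
        return 1.0
    return max(1.0 - abs(2.0 * s - 1.0) for s in scores.values())
```

Under these assumptions, images with the highest `uncertainty` value would be the ones sent to a human annotator, while confidently scored images keep their automatically transferred tags.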
TABS is a software framework designed to support research, development and evaluation of components and systems in image processing, computer vision and pattern recognition. It utilises a novel integration of the well-known Tcl/Tk scripting language with a further software package, ET, to provide a software environment where systems can be developed in a seamless mixture of C/C++ and Tcl. In addition, it integrates a user interface prototyping capability and a novel database abstraction, which enables systems to gain efficient access to any data generated during application execution. TABS runs under Unix/X-Windows and Windows 95/NT, and is available in the public domain via the Internet.
ISBN: 9781467385640 (print)
This paper describes a sparse representation based approach to learning a classifier for assessing video quality without a reference. First, we calculate the natural scene statistics (NSS) based spatial features of each frame/image and then learn a dictionary with the K-SVD algorithm from the NSS features of correct frames. In this work we identify the fact that a correct frame can be represented precisely in terms of dictionary atoms, whereas when representing a distorted frame the error increases drastically with increasing distortion. We can therefore classify frames as correct or distorted based on the error score calculated by the sparse representation framework. This framework has been validated on two datasets, and we observe improved accuracies compared to state-of-the-art algorithms.
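The error-score idea can be sketched as follows, assuming a dictionary of unit-norm atoms has already been learned. The sketch substitutes plain matching pursuit for the sparse coder and leaves out K-SVD and NSS feature extraction entirely; the function names and the `threshold` parameter are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def reconstruction_error(features, atoms, n_steps):
    """Residual norm after `n_steps` of greedy matching pursuit.

    `atoms` must be unit-norm vectors (assumed to come from a dictionary
    learned on features of correct frames).
    """
    residual = list(features)
    for _ in range(n_steps):
        # Pick the atom most correlated with what is still unexplained.
        best = max(atoms, key=lambda atom: abs(dot(residual, atom)))
        coeff = dot(residual, best)
        residual = [r - coeff * a for r, a in zip(residual, best)]
    return math.sqrt(dot(residual, residual))

def is_distorted(features, atoms, n_steps, threshold):
    """Flag a frame as distorted when the dictionary cannot represent it well."""
    return reconstruction_error(features, atoms, n_steps) > threshold
```

A feature vector lying in the span of a few atoms leaves a residual near zero, while a vector with a component outside that span keeps a large residual, which is the separation the abstract relies on.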
ISBN: 9781479915880 (print)
We present an improved mesh denoising method based on 3D geometric bilateral filtering. Its novelty is that it preserves the details of the object while reducing noise effectively. The previous approach of geometric bilateral filtering for 3D-scan points has the limitation that it reduces the point density, thereby losing details present in the object. Our approach, by contrast, works on the surface mesh obtained after triangulating the 3D-scan points, without any data downsampling. Each vertex of the mesh is repositioned based on the estimated centroid of the vertices in its local neighborhood and a Gaussian weight function. Experimental results demonstrate its strength, efficiency, and robustness.
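The vertex repositioning step can be illustrated with a simplified sketch. It assumes each vertex moves to the Gaussian-weighted centroid of itself and its one-ring neighbors; the abstract does not give the exact update rule, so the weighting, the `sigma` parameter, and the data layout are assumptions, and this is not the full bilateral filter.

```python
import math

def smooth_vertices(vertices, neighbours, sigma):
    """Return new vertex positions; the input mesh is not modified.

    vertices:   list of (x, y, z) tuples
    neighbours: neighbours[i] lists the indices adjacent to vertex i
    sigma:      width of the Gaussian weight on Euclidean distance (assumed)
    """
    smoothed = []
    for i, p in enumerate(vertices):
        total_weight = 1.0          # the vertex itself has weight exp(0) = 1
        acc = list(p)
        for j in neighbours[i]:
            q = vertices[j]
            dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
            weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
            total_weight += weight
            acc = [a + weight * c for a, c in zip(acc, q)]
        smoothed.append(tuple(a / total_weight for a in acc))
    return smoothed
```

Because the vertex count is unchanged, this kind of update does not reduce sampling density, which is the property the abstract contrasts with point-based filtering.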
ISBN: 9781479915880 (print)
In many common applications of Microsoft Kinect (TM), including navigation, surveillance, 3D reconstruction, and the like, it is necessary to estimate the geometry of mirrors or other reflecting surfaces in the field of view. This is often difficult because, in most positions, a mirror does not support diffuse reflection of speckles and hence cannot be seen in the Kinect depth map. A mirror shows up as unknown depth. However, suitably placed objects reflecting in the mirror can provide important clues about the orientation and distance of the mirror. In this paper we present a method that uses a ball and its mirror image to set up point-to-point correspondences between object and image points and solve for the geometry of the mirror. With this, simple estimators are designed for the orientation and distance of a plane vertical mirror with respect to the Kinect camera. In addition, an estimator is presented for the diameter of the ball. The estimators are validated through a set of experiments.
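The geometric principle behind the point-to-point correspondence can be sketched briefly: a plane mirror is the perpendicular bisector of the segment joining a point and its mirror image. The sketch below applies only that fact to a single ideal correspondence in camera coordinates; it is not the paper's estimators, and the function name and return convention are assumptions.

```python
import math

def mirror_plane(object_point, image_point):
    """Return (normal, offset) of the plane with normal . x = offset.

    object_point: 3D position of a point on the real object (e.g. ball centre)
    image_point:  3D position of its virtual image behind the mirror
    Both are expressed in the camera frame, so |offset| is the distance
    from the camera origin to the mirror plane.
    """
    diff = [b - a for a, b in zip(object_point, image_point)]
    length = math.sqrt(sum(d * d for d in diff))
    if length == 0.0:
        raise ValueError("object and image points coincide")
    normal = [d / length for d in diff]
    midpoint = [(a + b) / 2.0 for a, b in zip(object_point, image_point)]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    return normal, offset
```

For example, a ball centre at depth 1 m on the optical axis whose image appears at depth 3 m implies a mirror facing the camera at 2 m. With noisy measurements, several correspondences would need to be combined, which this sketch does not attempt.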
ISBN: 9781479915880 (print)
Text recognition in natural scenes and video is challenging compared to recognition in scanned document images. This is due to text appearing on different sources in various styles, with variations in font, font size, background, and so on. There are approaches that segment words from video and scene images and feed the word images into OCR engines. Nevertheless, such methods often fail to yield satisfactory recognition results. Therefore, in this paper, we propose to combine a Hidden Markov Model (HMM) and a Convolutional Neural Network (CNN) to achieve a good recognition rate. Sequential gradient features with the HMM help to find the character alignment of a word. The character alignments are then verified by the CNN. The approach is tested on both video and scene data to show its effectiveness, and the results are encouraging.
ISBN: 9781467385640 (print)
Segmentation of cell nuclei in PAP-smear cervical images is of preeminent importance in computer-aided diagnostic screening for cervical cancer. This paper proposes a novel nuclei segmentation approach which builds upon the mean-shift method. The mean-shift method is applied to the cell images after they undergo a decorrelation-stretch contrast enhancement. The results of the mean-shift based approach are refined further using morphological operations. We have validated the segmentation results on a dataset of 900 images with given ground truth. We demonstrate that our simple and efficient approach yields a high validation rate on a large image dataset. In addition, we show encouraging visual results on another set of more complex real images.
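The mean-shift step can be illustrated on one-dimensional intensity values. This is a minimal sketch with a flat kernel; the decorrelation stretch and the morphological refinement are not shown, and the `bandwidth` value and function names are assumptions rather than settings from the paper.

```python
def mean_shift_mode(start, samples, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` uphill to a density mode of `samples` (flat kernel)."""
    x = float(start)
    for _ in range(max_iter):
        window = [s for s in samples if abs(s - x) <= bandwidth]
        if not window:
            break
        new_x = sum(window) / len(window)   # mean of samples inside the window
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def cluster_intensities(samples, bandwidth):
    """Map each sample to the (rounded) mode it converges to."""
    return [round(mean_shift_mode(s, samples, bandwidth), 3) for s in samples]
```

Pixels that converge to the same mode form one segment; in a segmentation pipeline, the segment whose mode matches the expected nucleus intensity would then be kept and cleaned up.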
ISBN: 9781467385640 (print)
Accurate detection of the optic disk and macula is of interest in automated analysis of retinal images, as they are landmarks in the retina and their detection aids in assessing the severity of diseases based on the locations of abnormalities relative to these landmarks. The general strategy is to design a different method for each of these landmarks. In contrast, we propose a novel, unified approach for optic disk and macula detection using the Generalized Motion Pattern (GMP) [10] [19], which is derived by inducing motion in an image to smooth out unwanted information. The proposed method is unsupervised, parallelizable, and handles illumination differences efficiently, but assumes a fixed protocol in image acquisition. The proposed method has been tested on five public datasets, and the results indicate performance comparable to supervised approaches for the same problem.
ISBN: 9781467385640 (print)
Analysing a very long video and semantically describing its contents is a challenging task in computer vision. Present approaches such as video shot detection and summarization address this problem partially while maintaining temporal coherency. To spare the user from watching the whole video, we introduce a new technique which combines similar content irrespective of when it occurs in the video. In this approach, we automatically identify only the representative frames corresponding to similar scenes captured at different instants of time. We also provide the labels of the objects present in the representative frames, along with the compact representation of the video. We achieve the semantic labelling of frames in a unified deep learning framework that uses pre-trained features from a convolutional neural network. We show that the proposed approach addresses semantic labelling effectively, as justified by the results obtained for videos of different scenes captured through different modalities.
ISBN: 9781467385640 (print)
We use the RGB-D technology of Kinect to control an application with hand gestures. We use PowerPoint as the test application. The system can start and end a PowerPoint presentation, navigate between slides, capture or release control of the cursor, and control it through natural gestures. Such a system is useful and hygienic in kitchens, lavatories, hospital ICUs for touch-less surgery, and the like. The challenge is to extract meaningful gestures from continuous hand motions. We propose a system that recognizes isolated gestures from continuous hand motions for multiple gestures in real time. Experimental results show that the system has 96.48% precision (at 96.00% recall) and performs better than the Microsoft Gesture Recognition library for swipe gestures.
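For readers who want a single figure, the reported precision and recall can be combined into an F1 score. The short sketch below shows the standard definitions; the F1 value is derived here from the two reported numbers and is not stated in the abstract, and the function names are illustrative.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard definitions from raw detection counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Derived from the reported 96.48% precision and 96.00% recall.
reported_f1 = f1_score(0.9648, 0.9600)
```

Under these definitions, `reported_f1` comes out at roughly 0.962.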