ISBN: 9781509009824 (print)
The presence of both caption (graphics/superimposed) text and scene text in video frames is a major cause of the poor accuracy of text recognition methods. This paper proposes an approach that identifies tampered information by analyzing the spatial distribution of DCT coefficients in a new way in order to classify caption and scene text. Caption text is edited or superimposed and is therefore artificially created, whereas scene text exists naturally in the frame. We exploit this fact to identify caption and scene text in video frames using DCT coefficients. The proposed method analyzes the distributions of both zero and non-zero coefficients (positive values only) locally by moving a window, and applies histogram operations over each input text line image. This generates line graphs for the respective zero and non-zero coefficient coordinates. We further study the behavior of text lines, namely linearity and smoothness, based on centroid location analysis and the principal axis direction of each text line for classification. Experimental results on standard datasets, namely ICDAR 2013 video, ICDAR 2015 video, YVT video and our own data, show that the performance of text recognition methods improves significantly after classification compared with before classification.
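The coefficient analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: it computes an orthonormal 2D DCT-II on a single square window, then separates the coordinates of near-zero and positive coefficients and takes their centroids. The function names, the tolerance `eps`, and the 4x4 window size are assumptions made for illustration.

```python
import math

def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def zero_and_positive_coords(coeffs, eps=1e-6):
    """Split coefficient coordinates into near-zero and positive sets.

    Negative coefficients are ignored, mirroring the "positive values only"
    restriction in the abstract; eps is an assumed tolerance.
    """
    zeros, positives = [], []
    for u, row in enumerate(coeffs):
        for v, c in enumerate(row):
            if abs(c) <= eps:
                zeros.append((u, v))
            elif c > 0:
                positives.append((u, v))
    return zeros, positives

def centroid(coords):
    """Mean (row, column) position of a non-empty coordinate list."""
    n = len(coords)
    return (sum(u for u, _ in coords) / n, sum(v for _, v in coords) / n)

# One window of a perfectly flat region: all energy sits in the DC term.
window = [[10.0] * 4 for _ in range(4)]
zeros, positives = zero_and_positive_coords(dct2(window))
print(positives, centroid(zeros))
```

In a full pipeline the window would slide over each text line image, and the per-window coordinate sets would feed the histogram and line-graph steps that the abstract mentions.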
Multiple researchers recently proposed the use of the digital compass embedded in mobile devices for touchless interaction in the 3D space around them. These methods overcome several limits imposed by other interactio...
Achieving good recognition results with a single method for text lines in video or natural scene images captured by high-resolution cameras or low-resolution mobile cameras, and for images in web pages, is often hard. In this paper, we propose new sharpness-based features of the textual portion of each input text line image, computed in HSI color space, to classify an input image into one of four classes (video, scene, mobile or born-digital). This helps in choosing an appropriate recognition method based on the class of the input text, which improves the recognition rate. For a given input text line image, the proposed method obtains the H, S and I images. Canny edge images are then obtained for the H, S and I spaces, which yields text candidates. We perform a sliding-window operation over the text candidate image of each text line in each color space to estimate a new sharpness measure from stroke width and gradient information. The sharpness values of the text lines in the three color spaces are then fed to k-means clustering initialized with maximum, minimum and average guesses, which results in three respective clusters. The means of the clusters for the respective color spaces form a feature vector of nine values, which is classified with an SVM classifier. Experimental results on standard datasets, namely ICDAR 2013, ICDAR 2015 video, ICDAR 2015 natural scene data, ICDAR 2013 born-digital data and images captured by a mobile camera (our own data), show that the proposed classification method helps improve recognition results.
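The clustering step can be sketched as follows, assuming that the "maximum, minimum and average guesses" are the initial cluster centers of a one-dimensional k-means run per color channel. The sharpness values, function names and iteration cap are hypothetical; the sharpness estimation itself (stroke width and gradient) and the SVM are not reproduced here.

```python
def kmeans_1d(values, seeds, iters=50):
    """Lloyd's k-means on scalars, starting from the given seed centers."""
    centers = list(seeds)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def channel_features(sharpness):
    """Three cluster means seeded with the max, min and average value."""
    seeds = [max(sharpness), min(sharpness), sum(sharpness) / len(sharpness)]
    return kmeans_1d(sharpness, seeds)

def feature_vector(h, s, i):
    """Concatenate the three cluster means of H, S and I: nine values."""
    return channel_features(h) + channel_features(s) + channel_features(i)

# Hypothetical per-window sharpness values for one text line.
h_sharp = [0.9, 1.0, 0.1, 0.2, 0.5, 0.6]
s_sharp = [0.3, 0.4, 0.8, 0.7, 0.1, 0.2]
i_sharp = [0.6, 0.6, 0.2, 0.9, 0.1, 0.5]
features = feature_vector(h_sharp, s_sharp, i_sharp)
print(len(features), features[:3])
```

The nine-element list plays the role of the feature vector that the abstract passes to the classifier.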
ISBN: 9781509009824 (print)
As new digital technologies emerge to improve living standards, they also lead to an increase in crime. Unlike existing approaches that use the content of handwriting for fraudulent or forged document identification, in this paper we propose a novel approach that explores the quality of handwritten documents, considering both foreground and background information, to identify whether a document is old or new. The approach rests on the observation that a fraudulent document is created some time after the original, so in this work the fraudulent document is treated as new and the original as old. To identify whether a given handwritten document is old or new, we propose to divide the Fourier coefficients of the input image into positive and negative coefficient images (divide), and then to reconstruct an image from each, which yields two reconstructed images (conquer). The contrast of the reconstructed images obtained before and after divide-conquer is studied to analyze the age of the document based on image quality. The proposed approach finds a unique relationship between the reconstructed images obtained before and after divide-conquer to identify the input image as old or new. To evaluate the proposed approach, we conduct experiments on our own handwritten dataset and a standard database, namely Google-LIFE magazine. Comparative studies show that the proposed approach outperforms existing approaches in terms of classification rate.
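A rough sketch of the divide-and-reconstruct idea follows. The abstract does not say how coefficients are classed as positive or negative, so this example assumes the split is made on the sign of the real part. It also works on a single 1D row of pixel values rather than a 2D image, and the `contrast` measure (range of magnitudes) is a placeholder, not the paper's definition.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def split_by_sign(X):
    """Assumed split: coefficients with real part >= 0 versus real part < 0."""
    pos = [c if c.real >= 0 else 0j for c in X]
    neg = [c if c.real < 0 else 0j for c in X]
    return pos, neg

def contrast(signal):
    """Placeholder contrast: range of magnitudes in the reconstruction."""
    mags = [abs(v) for v in signal]
    return max(mags) - min(mags)

# Hypothetical row of gray levels crossing two ink strokes.
row = [12.0, 15.0, 40.0, 42.0, 13.0, 11.0, 38.0, 41.0]
pos, neg = split_by_sign(dft(row))
rec_pos, rec_neg = idft(pos), idft(neg)
print(contrast(row), contrast(rec_pos), contrast(rec_neg))
```

Because the transform is linear, the two reconstructions sum back to the original row, so any old/new decision has to come from how contrast is distributed between them rather than from lost information.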
Computer-aided diagnostic and segmentation tools have become increasingly important in reducing the workload of medical experts performing diagnosis, monitoring and documentation of eye diseases such as age-related macular degeneration (AMD), diabetic retinopathy (DR) and glaucoma. Supervised methods have been developed for the segmentation and detection of lesions, and the reported performance has been good. Supervised methods, however, need representative data to train the classifier properly. Inaccuracies in the ground truth may have a significant impact on the performance of a supervised method because the training data are then not representative. In this study, we present a quantitative evaluation of how sensitive different image features, including colour, texture, edge and higher-level features, are to inaccuracy in the ground truth on exudates. A mean decrease of approximately 20% in sensitivity and 13% in specificity was observed when using the most inaccurate training data.
ISBN: 9781467372596 (print)
Most existing discriminative tracking methods model a target object as a whole and train a tracker on holistic templates, which cannot deal effectively with partial occlusions. In this paper, we instead treat the target as a collection of local patches and propose a novel tracking approach based on boosted local classifiers. Initially, a set of local patches is sampled to train a set of local classifiers, and each classifier is weighted according to its estimated error. In addition, positive and negative examples are sampled for model update under two constraints during tracking, which helps obtain more negatives for updating the appearance model and improves the updating efficiency. By updating the weights of the local classifiers based on temporal stability, the tracker can handle partial occlusions effectively. Extensive experiments on various challenging image sequences demonstrate the superiority of the proposed approach over several state-of-the-art methods.
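The abstract states only that each local classifier is weighted by its estimated error. One common choice, shown here purely as an assumption, is the AdaBoost weight 0.5 * ln((1 - e) / e); the paper's actual weighting and its temporal-stability update may differ. The votes and error values below are hypothetical.

```python
import math

def classifier_weight(error, eps=1e-10):
    """AdaBoost-style weight (an assumed choice): low error, high weight."""
    error = min(max(error, eps), 1.0 - eps)  # keep the logarithm finite
    return 0.5 * math.log((1.0 - error) / error)

def combined_score(patch_votes, errors):
    """Weighted sum of per-patch votes, each vote being +1 or -1."""
    weights = [classifier_weight(e) for e in errors]
    return sum(w * v for w, v in zip(weights, patch_votes))

# Three local patches; the third is unreliable (for example, occluded),
# so its near-chance error gives it almost no influence on the decision.
votes = [+1, +1, -1]
errors = [0.1, 0.2, 0.45]
print(combined_score(votes, errors) > 0)
```

With a weighting of this kind, a patch whose classifier performs near chance contributes little, which is one way a part-based tracker can tolerate partial occlusion.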
Label propagation is an approach that iteratively spreads the prior label confidence associated with each sample to its neighbors until a global convergence state is reached. This process has been shown to be closely connected with a general graph-based regularization framework. Within this framework, a closed-form linear system can be built to carry out label propagation. In this paper, to address several issues inherent in previous graph-based label propagation frameworks, we propose a reformulated one, namely local structure sensitive label propagation (LSSLP). By associating each graph vertex with a local structure sensitive tuning factor, the empirical loss on each vertex can be controlled so that it stays consistent with the commonly assumed 'cluster assumption' of data structure. For the sake of information conservation, we relax the state conservation constraint on the label confidence of labeled samples proposed by Belkin et al. (2004) to a more general form. Meanwhile, an inverse-warping procedure is incorporated into the proposed framework to maintain a sufficiently large and stable classification margin. Based on the inversion technique for block matrices, we extend LSSLP to incremental and inductive versions and also present a computationally efficient implementation. Experimental results demonstrate that the reformulated regularization framework for label propagation is highly competitive.
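For orientation, here is a minimal sketch of the classic iterative label propagation that the abstract takes as its starting point. It is not LSSLP: it uses the standard update F <- alpha * S * F + (1 - alpha) * Y with a symmetrically normalized affinity matrix, and it omits the tuning factors, relaxed constraint and inverse-warping step. The graph, `alpha` and the iteration count are assumptions.

```python
def propagate(weights, labels, alpha=0.9, iters=200):
    """Iterative label propagation on a graph with non-negative weights.

    weights: symmetric adjacency matrix (list of rows).
    labels:  one row per node, one-hot for labeled nodes, zeros otherwise.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    n_classes = len(labels[0])
    degree = [sum(row) for row in weights]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    s = [[weights[i][j] / (degree[i] * degree[j]) ** 0.5 if weights[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = [row[:] for row in labels]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1 - alpha) * labels[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: row[c]) for row in f]

# Chain graph 0-1-2-3-4-5 with only the two end nodes labeled.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(6)] for i in range(6)]
seed = [[0.0, 0.0] for _ in range(6)]
seed[0][0] = 1.0  # node 0 belongs to class 0
seed[5][1] = 1.0  # node 5 belongs to class 1
print(propagate(chain, seed))
```

The same fixed point can be written as the closed-form linear system F = (1 - alpha) * (I - alpha * S)^{-1} * Y, which is the kind of system the abstract refers to.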
ISBN: 9781509029181 (print)
In this work, a new glass classification method is proposed. First, images are enhanced by preprocessing. Second, a series of glass features, including shape and texture features, is proposed. Finally, we employ a simple minimum distance classifier to classify the input glass images. The experimental results show that the proposed method achieves high classification efficiency and accuracy.
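A minimum distance classifier assigns a sample to the class whose mean feature vector is nearest. The sketch below assumes Euclidean distance and uses made-up two-dimensional features (one shape value, one texture value) with hypothetical class names; the paper's actual features and classes are not given in the abstract.

```python
import math

def class_means(samples):
    """samples maps a class name to a list of equal-length feature vectors."""
    means = {}
    for name, vectors in samples.items():
        dim = len(vectors[0])
        means[name] = [sum(v[d] for v in vectors) / len(vectors)
                       for d in range(dim)]
    return means

def classify(x, means):
    """Return the class whose mean is closest to x in Euclidean distance."""
    return min(means, key=lambda name: math.dist(x, means[name]))

# Hypothetical training features: [shape descriptor, texture descriptor].
training = {
    "class_a": [[0.9, 0.1], [0.8, 0.2], [1.0, 0.3]],
    "class_b": [[0.2, 0.9], [0.1, 0.8], [0.3, 0.7]],
}
means = class_means(training)
print(classify([0.85, 0.25], means), classify([0.15, 0.75], means))
```

Training reduces to averaging and prediction to one distance per class, which is consistent with the efficiency the abstract claims for the approach.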
Keyword spotting in video document images is challenging due to low resolution and complex background of video images. We propose the combination of Texture-Spatial-Features (TSF) for keyword spotting in video images ...