Nowadays, text detection and localization have gained much popularity in the field of text analysis systems as they pave the way for the number of real-time based applications like mobile transliteration technologies,...
详细信息
Nowadays, text detection and localization have gained much popularity in the field of text analysis systems as they pave the way for the number of real-time based applications like mobile transliteration technologies, assistive methods for visually impaired persons, etc. text detection and localization techniques are used to find the position of text area in the *** paper intends to present a broad review in this field as five-fold: (1) comparison of document images with scene images and applications of natural scene images, (2) significant and up-to-date traditional machine learning and deep learning-based approaches for the text detection and localization for different languages, (3) various publicly available benchmarked datasets, (4) comparative analysis for other benchmarked datasets and, (5) related challenges and future scope on the field. The paper summarises some of the potential ways in this field, which can serve as a useful reference for the researchers for future exploration of the area.
This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factor...
详细信息
This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated and sub-problems are highlighted including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text, multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.
text detection is important in the retrieval of texts from digital pictures, video databases and webpages. However, it can be very challenging since the text is often embedded in a complex background. In this paper, w...
详细信息
text detection is important in the retrieval of texts from digital pictures, video databases and webpages. However, it can be very challenging since the text is often embedded in a complex background. In this paper, we propose a classification-based algorithm for text detection using a sparse representation with discriminative dictionaries. First, the edges are detected by the wavelet transform and scanned into patches by a sliding window. Then, candidate text areas are obtained by applying a simple classification procedure using two learned discriminative dictionaries. Finally, the adaptive run-length smoothing algorithm and projection profile analysis are used to further refine the candidate text areas. The proposed method is evaluated on the Microsoft common test set, the ICDAR 2003 text locating set, and an image set collected from the web. Extensive experiments show that the proposed method can effectively detect texts of various sizes, fonts and colors from images and videos. (c) 2010 Elsevier B.V. All rights reserved.
Video text information plays an important role in semantic-based video analysis, indexing and retrieval. Video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video an...
详细信息
Video text information plays an important role in semantic-based video analysis, indexing and retrieval. Video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing and retrieval consist of video text detection, localization, tracking, segmentation and recognition. Video sequences are commonly stored in compressed formats where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using the discrete cosines transform (DCT) coefficients is proposed. A coarse to fine text detection method is used to find text blocks in terms of the block DCT texture intensity information. The DCT texture intensity of an 8 x 8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. The text block region localization and tracking are carried out by virtue of the horizontal and vertical block texture intensity projection profiles. The appearing and disappearing frames of each text line are determined by the text tracking. The final experimental results show the effectiveness of the proposed methods. (c) 2007 Elsevier B.V. All rights reserved.
This paper proposes a three-level framework to detect texts in a single image. First, a salient feature map of text is extracted using a Fully Convolutional Network (FCN) that achieves good performance in semantic seg...
详细信息
This paper proposes a three-level framework to detect texts in a single image. First, a salient feature map of text is extracted using a Fully Convolutional Network (FCN) that achieves good performance in semantic segmentation. Label combination using both boxes of word and characters level is proposed to improve the detection of uneven boundaries of text regions. Second, in the feature map of FCN, the text region has a higher probability value than the background region, and the coordinates in the character area are very close to each other. We segment the text area and the background area by using the characteristics of text feature map with Hierarchical Cluster Analysis (HCA). Finally, we applied a Convolutional Neural Networks (CNN) to classify the candidate text area into text and non-text. In this paper, we used CNN which can classify 4 classes in total by separating the background area and three text classes (one character, two characters, three characters or more). The text detection framework proposed in this paper have shown good performance with ICDAR 2015, and high performance especially in Recall criterion, finding more texts than other algorithms.
textual information that resides in natural images is an important knowledge for indexing and retrieval purposes. In this paper, a new approach is proposed using Mnemonic Cellular Automata (m-CA) which strives towards...
详细信息
textual information that resides in natural images is an important knowledge for indexing and retrieval purposes. In this paper, a new approach is proposed using Mnemonic Cellular Automata (m-CA) which strives towards detecting scene text on natural images. Initially, an edge map is calculated and consequently binarized. Then, taking advantage of the Hybrid Cellular Automata (CA) flexibility, the transition rules are changed and are applied in different consecutive steps. Initially, its rules partially depend on Coordinating Logic Filters (CLF) and the majority state. Moreover, in the final steps of the m-CA evolution the update rules are modified as the history of past evolution steps is incorporated into each cell. Experimental work on the ICDAR 2011 Robust Reading Competition dataset shows improved performance.
In this paper, we focus on text detection in natural scene images which is conducive to content-based wild image analysis and understanding. This task is still an open problem and usually includes two key issues: text...
详细信息
In this paper, we focus on text detection in natural scene images which is conducive to content-based wild image analysis and understanding. This task is still an open problem and usually includes two key issues: text candidate extraction and verification. For text candidate extraction, we introduce a color prior to guide the character candidate extraction by Maximally Stable Extremal Region (MSER). The principle of color prior acquirement is to obtain stroke-like textures with modified Stroke Width Transform (SWT), which is based on segmented edges. For text verification, the ideology of deep learning is adopted to distinguish text/non-text candidates. To improve classification accuracy, the results of specific task CNNs are fused. The proposed framework is evaluated on the ICDAR 2013 Robust Reading Competition database. It achieves F-score at 85.87%, which are superior over several state-of-the-art text detection methods. (C) 2018 Elsevier B.V. All rights reserved.
text in natural scene images plays a vital role in scene understanding. It contains a rich and abundant amount of valuable semantic information useful in many applications such as analysis of products' labels, aut...
详细信息
text in natural scene images plays a vital role in scene understanding. It contains a rich and abundant amount of valuable semantic information useful in many applications such as analysis of products' labels, autonomous driving, and blind navigation. Consequently, detection, recognition, and identification of scripts of texts present in scene images have recently received massive attention. This paper intends to walk through the advances on the mentioned topics, mainly focusing on the approaches proposed in the last 8-10 years. As per our knowledge, this paper is the first to provide a review on the scene text script identification. We also provide a clear and precise classification between conventional-, deep learning-, and hybrid-based methods, including their advantages and disadvantages. State-of-the-art evaluation metrics, benchmark datasets' characteristics, and performances of the existing methods are also analyzed and discussed. Lastly, we present an insight into potential research directions to complete the review. We hope this review will provide a brief insight for the researchers into scene text understanding.
text detection system for natural images is a very challenging task in Computer Vision. Image acquisition introduces distortion in terms of perspective, blurring, illumination, and characters which may have very diffe...
详细信息
text detection system for natural images is a very challenging task in Computer Vision. Image acquisition introduces distortion in terms of perspective, blurring, illumination, and characters which may have very different shape, size, and color. We introduce in this article a full text detection scheme. Our architecture is based on a new process to combine a hypothesis generation step to get potential boxes of text and a hypothesis validation step to filter false detections. The hypothesis generation process relies on a new efficient segmentation method based on a morphological operator. Regions are then filtered and classified using shape descriptors based on Fourier, Pseudo Zernike moments and an original polar descriptor, which is invariant to rotation. Classification process relies on three SVM classifiers combined in a late fusion scheme. Detected characters are finally grouped to generate our text box hypotheses. Validation step is based on a global SVM classification of the box content using dedicated descriptors adapted from the HOG approach. Results on the well-known ICDAR database are reported showing that our method is competitive. Evaluation protocol and metrics are deeply discussed and results on a very challenging street-level database are also proposed.
text in images is one of the most important cues for understanding a scene. In this paper, we propose a novel approach based on interest points to localize text in natural scene images. The main ideas of this approach...
详细信息
text in images is one of the most important cues for understanding a scene. In this paper, we propose a novel approach based on interest points to localize text in natural scene images. The main ideas of this approach are as follows: first we used interest point detection techniques, which extract the corner points of characters and center points of edge connected components, to select candidate regions. Second, these candidate regions were verified by using tensor voting, which is capable of extracting perceptual structures from noisy data. Finally, area, orientation, and aspect ratio were used to filter out non-text regions. The proposed method was tested on the ICDAR 2003 dataset and images of wine labels. The experiment results show the validity of this approach.
暂无评论