ISBN (print): 9781467321808; 9781467321792
In this paper we present a robust method for detecting handwritten text in unconstrained drawings on ordinary whiteboards. Unlike printed text in documents, free-form handwritten text follows no pattern in size, orientation, or font, and it is often mixed with other drawings such as lines and shapes. Unlike handwriting on paper, handwriting on a whiteboard cannot be scanned, so detection has to be based on photos. Our work traces straight edges in photos of the whiteboard and builds a graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio, and neighborhood similarity to differentiate handwritten text from other drawings. Experimental results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed on a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios.
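The abstract names its geometric properties without defining them, so the following is only a minimal sketch under stated assumptions. It treats one connected component as a list of straight segments, uses the standard undirected graph density 2E / (V(V - 1)), and assumes that "edge density" means total stroke length per unit of bounding-box area. The function name `component_features` and these definitions are illustrative, not the authors' formulation.

```python
import math


def component_features(edges):
    """Compute simple geometric features for one connected component.

    edges: list of ((x1, y1), (x2, y2)) straight segments.
    All definitions here are assumptions made for illustration.
    """
    nodes = {point for edge in edges for point in edge}
    xs = [p[0] for p in nodes]
    ys = [p[1] for p in nodes]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    # Bounding-box aspect ratio (width over height).
    aspect_ratio = width / height if height else math.inf

    # Standard density of an undirected graph: 2E / (V * (V - 1)).
    v, e = len(nodes), len(edges)
    graph_density = 2 * e / (v * (v - 1)) if v > 1 else 0.0

    # Assumed definition: total segment length per bounding-box area.
    total_length = sum(math.dist(a, b) for a, b in edges)
    area = width * height
    edge_density = total_length / area if area else math.inf

    return {
        "aspect_ratio": aspect_ratio,
        "graph_density": graph_density,
        "edge_density": edge_density,
    }
```

A classifier over such features could then separate text-like components from lines and shapes; the thresholds or model used for that step are not given in the abstract.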
A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of the previous steps. Classification is accomplished by an ad hoc exploitation of traditional association mining algorithms, and is supported by a data transformation scheme which takes advantage of domain-specific dictionaries/ontologies. A key feature is the capability of progressively enriching the available ontology with the results of the previous stages of classification, thus significantly improving the overall classification accuracy. An extensive experimental evaluation shows the effectiveness of our approach.
We propose a new method for achieving robust text segmentation in images by using a stroke filter. Segmenting text accurately and robustly from a complex background is known to be a very difficult task. Most existing methods are sensitive to text color, size, font, and background clutter, because they use simple segmentation methods or require prior knowledge about text shape. In this paper, we attempt to consider the intrinsic characteristics of text by using the stroke filter and design a new and robust algorithm for text segmentation. First, we briefly describe the stroke filter, which is based on local region analysis. Second, the determination of text color polarity and the local region growing procedure are performed successively, based on the response of the stroke filter. Finally, a feedback procedure driven by the recognition score from an optical character recognition (OCR) module is used to improve the performance of text segmentation. Experiments on a large database demonstrate that our method performs well in terms of both accuracy and robustness. (c) 2008 Elsevier B.V. All rights reserved.
In this paper we introduce a dynamic programming algorithm that performs linear text segmentation by global minimization of a segmentation cost function incorporating two factors: (a) within-segment word similarity and (b) prior information about segment length. We evaluate the segmentation accuracy of the algorithm by precision, recall, and Beeferman's segmentation metric. On a segmentation task involving Choi's text collection, the algorithm achieves the best segmentation accuracy so far reported in the literature. The algorithm also achieves high accuracy on a second task involving previously unused texts.
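The idea of globally minimizing a cost that trades within-segment similarity against a segment-length prior can be sketched as follows. This is an illustrative sketch, not the authors' exact cost function: the binary shared-word similarity, the Gaussian-style length penalty, and the parameters `mu`, `sigma`, `gamma`, and `r` are all assumptions chosen for the example.

```python
def segment(sentences, mu=3.0, sigma=1.0, gamma=0.3, r=1.5):
    """Return exclusive end indices of segments minimizing the total cost.

    Assumed cost per segment [i, j):
      gamma * (len - mu)^2 / (2 * sigma^2)         (length prior)
      - (1 - gamma) * (similarity sum) / len ** r  (within-segment similarity)
    """
    n = len(sentences)
    words = [set(s.lower().split()) for s in sentences]

    # sim[a][b] = 1 if sentences a and b share at least one word.
    sim = [[1 if words[a] & words[b] else 0 for b in range(n)] for a in range(n)]

    # 2D prefix sums so any square block of sim can be summed in O(1).
    pre = [[0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            pre[a + 1][b + 1] = sim[a][b] + pre[a][b + 1] + pre[a + 1][b] - pre[a][b]

    def block_sum(i, j):
        return pre[j][j] - pre[i][j] - pre[j][i] + pre[i][i]

    def cost(i, j):
        length = j - i
        length_term = (length - mu) ** 2 / (2 * sigma ** 2)
        similarity_term = block_sum(i, j) / length ** r
        return gamma * length_term - (1 - gamma) * similarity_term

    # best[j] = minimal cost of segmenting the first j sentences.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i

    # Follow back-pointers from the end to recover the boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append(j)
        j = back[j]
    return bounds[::-1]
```

Because every split point is considered for every prefix, the result is a global minimum of the assumed cost, at O(n^2) time after the prefix sums are built.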
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features, which use adaptive language models in a novel way to detect broad changes of topic, and cue-word features, which detect occurrences of specific words (possibly domain-specific) that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains: Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
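A probabilistic segmentation error metric of this kind is commonly computed in a fixed-window form, often written P_k: slide a window of width k over the text and count how often the reference and the hypothesis disagree about whether the two window ends lie in the same segment. The sketch below assumes that fixed-window variant and the usual default of k equal to half the mean reference segment length. The function names and the segment-length input format are illustrative choices, not taken from the paper.

```python
def labels_from_lengths(lengths):
    """Expand segment lengths, e.g. [2, 1], into per-unit labels [0, 0, 1]."""
    return [seg for seg, length in enumerate(lengths) for _ in range(length)]


def pk(ref_lengths, hyp_lengths, k=None):
    """Fixed-window segmentation error; 0.0 means perfect agreement."""
    ref = labels_from_lengths(ref_lengths)
    hyp = labels_from_lengths(hyp_lengths)
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same units")
    n = len(ref)
    if k is None:
        # Assumed default: half the mean reference segment length.
        k = max(1, round(n / len(ref_lengths) / 2))
    if k >= n:
        raise ValueError("window k must be smaller than the text length")

    # Count windows where the two segmentations disagree on "same segment".
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)
```

Unlike boundary-level precision and recall, this score gives partial credit to near misses, since a boundary placed one unit off disagrees with the reference in only a few windows.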
ISBN (print): 9780769545202
Text segmentation is usually the first step taken towards the reuse and repurposing of PDF documents. Through experimental evaluation, we found that the leading text segmentation algorithms have limitations for contemporary consumer magazines. We propose a new local homogeneity measure based on line space, and incorporate this new feature into a region growing algorithm. Using a fixed set of parameters, our algorithm achieved robust performance on PDF magazines with wide-ranging layouts and styles.
Automatic Chinese text summarization for dialogue style is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document. All question paragraphs are then identified, and an automatic text segmentation approach analogous to TextTiling is exploited to improve the precision of correlating question paragraphs with answer paragraphs. Finally, some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results showed that our approach is highly efficient and significantly improves the coherence of the summary without compromising informativeness.
ISBN (print): 9783540772545
This paper presents a robust approach to segmenting text embedded in a complex background. Our approach consists of four steps: smart sampling, unsupervised clustering, Bayesian decision, and post-processing. The experimental results show that it works effectively and removes complex background residues more efficiently than the popular K-means method.
ISBN (digital): 9783319529417
ISBN (print): 9783319529417; 9783319529400
This paper describes a system that uses entity and topic coherence for improved text segmentation (TS) accuracy. First, the Latent Dirichlet Allocation (LDA) algorithm was used to obtain topics for sentences in the document. We then performed entity mapping across a window in order to discover the transition of entities within sentences. We used the information obtained to support our LDA-based boundary detection for proper boundary adjustment. We report the significance of the entity coherence approach as well as the superiority of our algorithm over existing works.
Text contained in images and video frames provides important clues for information indexing and retrieval. However, it is difficult to segment text from images, especially images with complex backgrounds. This paper presents a new conditional random field approach in which contextual features are introduced into text segmentation. Local visual information and contextual label information are integrated into a conditional random field by several components. Some components focus on visual image information to predict the category at each image site, while others focus on contextual label information to determine the patterns within the label field. Integrating contextual label information into the conditional random field can effectively resolve local ambiguities and improve text segmentation performance on complex backgrounds. Comparative results demonstrate that the proposed method outperforms other methods for text segmentation from complex backgrounds. (C) 2010 Elsevier B.V. All rights reserved.