Search Results - Inner Mongolia University Library

Text Segmentation by Integrating Hybrid Strategy and Non-Text Filtering

MULTIMEDIA TOOLS AND APPLICATIONS, 2022, Vol. 81, No. 30, pp. 44505-44522

Authors: Li, Minhua; Bai, Meng; Lv, Yingjun (Shandong Univ Sci & Technol Dept Elect Engn & Informat Technol 17 Sheng Lizhuang Rd Jinan 250031 Shandong Peoples R China)

The text embedded in images provides important information for image understanding. Text segmentation is an essential step for text recognition. It is often difficult to segment text from images with low resolution or complex backgrounds. In this paper, a novel text segmentation framework is proposed to solve this problem. The proposed framework adopts a hybrid strategy integrating two different text segmentation methods to produce text candidates. One segmentation method is based on the intensity uniformity of text regions, while the other integrates the intensity and stroke-width features of text. To separate text pixels from the text candidates, a new non-text pixel filtering method is proposed, in which an effective classifier is designed based on the number of breaking elements and the k-means clustering algorithm. The performance of the proposed framework is tested with pixel-based and recognition-based evaluation methods. Experimental results show that the F-scores of the proposed framework on the video caption dataset and the born-digital dataset of ICDAR2013 are 95.29% and 89.09% respectively, while the correctly recognized character rate and word rate on the German TV public dataset are 91.00% and 72.33%. The experimental results indicate that the proposed framework has excellent performance and high robustness in text segmentation and recognition.

Keywords: Text segmentation; Intensity; Stroke width; Integration; Non-text pixel filtering
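The filtering step above relies in part on k-means clustering. As a rough illustration only, and not the authors' classifier (which also uses the number of breaking elements), the sketch below assumes each candidate pixel has been reduced to a single grayscale intensity and splits the values into two clusters with Lloyd's k-means iterations, k = 2. The function name `two_means` is hypothetical.

```python
from typing import List, Sequence, Tuple


def two_means(values: Sequence[float], iters: int = 100) -> Tuple[List[int], Tuple[float, float]]:
    """Split scalar values into a low (0) and a high (1) cluster with Lloyd's k-means, k = 2."""
    if not values:
        raise ValueError("values must be non-empty")
    # Start the two centres at the extremes of the data (a deterministic choice).
    lo, hi = float(min(values)), float(max(values))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each value joins the nearer centre (ties go to the low cluster).
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: move each centre to the mean of its members.
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged: the centres no longer move
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)
```

For intensities `[10, 12, 11, 200, 210, 205]` this returns labels `[0, 0, 0, 1, 1, 1]` with centres `(11.0, 205.0)`. Initialising at the minimum and maximum avoids randomness, which keeps a two-cluster split reproducible.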

TEXT SEGMENTATION USING ROGET-BASED WEIGHTED LEXICAL CHAINS

COMPUTING AND INFORMATICS, 2013, Vol. 32, No. 2, pp. 393-410

Authors: Tatar, Doina; Inkpen, Diana; Czibula, Gabriela (Univ Babes Bolyai Cluj Napoca 400084 Romania; Univ Ottawa Ottawa ON Canada)

In this article we present a new method for text segmentation. The method relies on the number of lexical chains (LCs) which end in a sentence, which begin in the following sentence and which traverse the two successive sentences. The lexical chains are based on Roget's thesaurus (the 1987 and the 1911 version). We evaluate the method on ten texts from the DUC 2002 conference and on twenty texts from the CAST project corpus, using a manual segmentation as gold standard.

Keywords: Lexical chains; Text segmentation; Topic boundaries; Roget's thesaurus; Segmentation evaluation
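The boundary criterion in this abstract counts, for each pair of adjacent sentences, the lexical chains that end in the first sentence, begin in the second, or traverse both. The sketch below computes those three counts from chains given as hypothetical `(first, last)` sentence-index pairs. How the counts are combined is an assumption here (chains ending or beginning at a gap raise its score, chains crossing it lower the score); the article's actual scoring, including its Roget-based chain weights, is not reproduced.

```python
from typing import List, Tuple

# Hypothetical representation: a chain spans sentences first..last, inclusive.
Chain = Tuple[int, int]


def gap_counts(chains: List[Chain], i: int) -> Tuple[int, int, int]:
    """Counts for the gap between sentence i and sentence i + 1."""
    ending = sum(1 for first, last in chains if last == i)
    beginning = sum(1 for first, last in chains if first == i + 1)
    traversing = sum(1 for first, last in chains if first <= i and last >= i + 1)
    return ending, beginning, traversing


def boundary_scores(chains: List[Chain], n_sentences: int) -> List[int]:
    """Assumed combination: ending + beginning - traversing, one score per gap."""
    scores = []
    for i in range(n_sentences - 1):
        ending, beginning, traversing = gap_counts(chains, i)
        scores.append(ending + beginning - traversing)
    return scores
```

With chains `[(0, 2), (1, 2), (3, 5), (3, 4), (2, 4)]` over six sentences, the scores are `[0, -1, 3, -3, 1]`, so the strongest candidate boundary falls between sentences 2 and 3.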

Text segmentation: A topic modeling perspective

INFORMATION PROCESSING & MANAGEMENT, 2011, Vol. 47, No. 4, pp. 528-544

Authors: Misra, Hemant; Yvon, Francois; Cappe, Olivier; Jose, Joemon (Univ Glasgow Dept Comp Sci Glasgow G12 8QQ Lanark Scotland; Xerox Res Ctr Europe Meylan France; Univ Paris 11 Orsay France; CNRS LIMSI F-91405 Orsay France; TELECOM ParisTech Paris France; CNRS LTCI Paris France)

In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic model based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA based approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The proposed modification is not specific to topic models; it is applicable to all algorithms that use DP for text segmentation. (C) 2010 Elsevier Ltd. All rights reserved.

Keywords: Text segmentation; Topic modeling; Latent Dirichlet allocation; Semantic information; Dynamic programming
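Two abstracts on this page (this one and the product-partition paper) cast segmentation as choosing boundaries that minimise a sum of per-segment costs, solved by dynamic programming. A minimal sketch of the standard O(n^2) recurrence follows. The toy cost `sse_cost` (squared deviation from the segment mean plus a fixed per-segment penalty) is a hypothetical stand-in for a topic-model likelihood, and the speed-up proposed in the paper is not shown.

```python
from typing import Callable, List, Sequence


def dp_segment(n: int, cost: Callable[[int, int], float]) -> List[int]:
    """Return segment start indices minimising the sum of cost(i, j) over segments [i, j)."""
    INF = float("inf")
    best = [0.0] + [INF] * n  # best[j]: minimal cost of segmenting units[0:j]
    back = [0] * (n + 1)      # back[j]: start of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i
    # Follow the back-pointers from n to 0 to recover the segment starts.
    starts, j = [], n
    while j > 0:
        starts.append(back[j])
        j = back[j]
    return sorted(starts)


def sse_cost(xs: Sequence[float], penalty: float) -> Callable[[int, int], float]:
    """Hypothetical cost: squared deviation from the segment mean plus a per-segment penalty."""
    def cost(i: int, j: int) -> float:
        seg = xs[i:j]
        mu = sum(seg) / len(seg)
        return sum((x - mu) ** 2 for x in seg) + penalty
    return cost
```

With `xs = [1, 1, 1, 5, 5, 5, 9, 9]` and a penalty of 1.0, the optimum is three segments starting at indices 0, 3 and 6; raising the penalty to 100.0 makes a single segment cheaper. The penalty plays the role of a prior on the number of segments.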

Text Segmentation via Hierarchical Document Attention Model

IEEE ACCESS, 2023, Vol. 11, pp. 130296-130305

Authors: Wang, Yanhua; Min, Chunfang (Lanzhou Univ Technol Fac Arts Lanzhou 730000 Peoples R China)

With the rapid development of natural language processing technology, text segmentation has become an important task in text processing. However, existing text segmentation methods often perform poorly when faced with long texts and complex structures, requiring a more efficient and accurate approach. In this paper, we propose a new text segmentation method based on the Hierarchical Document Attention (HDA), which automatically identifies and segments different paragraphs in the text by analyzing and weighting the hierarchical structure of the text sequence data. Compared with existing methods, the model has higher accuracy and efficiency, and better supports tasks such as text analysis and information extraction. The main contribution of this paper is the proposal of a text segmentation method based on the HDA, which effectively models text sequences through multi-level attention mechanisms. Experimental verification on public datasets shows that this model exhibits good performance in text segmentation tasks.

Keywords: Hidden Markov models; Task analysis; Text processing; Probability distribution; Machine learning; Computational modeling; Bayes methods; Natural language processing; Text categorization; Hierarchical systems; Document handling; Text segmentation; Hierarchical document attention; Attention mechanism

Text segmentation in color images using tensor voting

IMAGE AND VISION COMPUTING, 2007, Vol. 25, No. 5, pp. 671-685

Authors: Lim, Jaeguyn; Park, Jonghyun; Medioni, Gerard G. (Univ So Calif Inst Robot & Intelligent Syst Los Angeles CA 90089 USA)

In natural scenes, text elements are corrupted by many types of noise, such as streaks, highlights, or cracks. These effects make clean, automatic segmentation very difficult and can reduce the accuracy of further analysis such as optical character recognition. We propose a method to drastically improve segmentation using tensor voting as the main filtering step. We first decompose an image into chromatic and achromatic regions. We then identify text layers using tensor voting and remove noise by iteratively applying an adaptive median filter. Finally, density estimation for center-mode detection and the K-means clustering algorithm are applied to segment the values of the improved image according to their hue or intensity component. Excellent results are achieved in experiments on real images. (c) 2006 Elsevier B.V. All rights reserved.

Keywords: Tensor voting; Text segmentation; Mean shift-based density estimation; Adaptive median filter; Color component analysis
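The noise-removal step in this abstract uses an adaptive median filter. The sketch below is a common single-pass formulation for a grayscale image stored as a list of lists. Stage A enlarges the window until its median is not an extreme value, and Stage B keeps the pixel unless it is itself an extreme. Border replication and the maximum window size of 7 are assumptions. The paper applies its filter iteratively alongside tensor voting, which is not shown.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def _window(img: Image, r: int, c: int, half: int) -> List[int]:
    """Values in the (2*half+1) x (2*half+1) window centred on (r, c), replicating border pixels."""
    h, w = len(img), len(img[0])
    return [img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
            for y in range(r - half, r + half + 1)
            for x in range(c - half, c + half + 1)]


def adaptive_median(img: Image, max_size: int = 7) -> Image:
    """One pass of an adaptive median filter; returns a new image."""
    out = [row[:] for row in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            z = img[r][c]
            size = 3
            while True:
                win = _window(img, r, c, size // 2)
                zmin, zmax, zmed = min(win), max(win), median(win)
                if zmin < zmed < zmax:
                    # Stage A passed: the median is not an impulse.
                    # Stage B: keep the pixel unless it is itself an extreme value.
                    out[r][c] = z if zmin < z < zmax else zmed
                    break
                size += 2  # Stage A failed: enlarge the window
                if size > max_size:
                    out[r][c] = zmed  # window limit reached: fall back to the median
                    break
    return out
```

On a 5 x 5 image of value 100 with a single 255 "salt" pixel in the centre, the filter returns a uniform image of 100. Growing the window only when needed lets the filter remove dense impulse noise while leaving detail untouched elsewhere.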

Text segmentation using superpixel clustering

IET IMAGE PROCESSING, 2017, Vol. 11, No. 7, pp. 455-464

Authors: Zhu, Yuanping; Zhang, Kuang (Tianjin Normal Univ Dept Comp Sci 393 Binshuixi Rd Tianjin Peoples R China)

Text segmentation is important for text image analysis and recognition; however, it is challenging due to noise and complex backgrounds in natural scenes. Superpixel-based image representation can enhance robustness to noise and local disturbances, but it is difficult for conventional superpixel algorithms to obtain complete stroke regions and accurate boundaries in text images. In this study, a text segmentation method based on superpixel clustering is proposed. First, to generate accurate superpixels for text images, an adaptive simple linear iterative clustering-based text superpixel generation algorithm is proposed. The adaptive superpixel size and compactness are calculated to enhance boundary adherence. Second, to make superpixels cover strokes more completely, superpixel clustering merges homogeneous superpixels into larger regions for both strokes and the background. A modified density-based spatial clustering of applications with noise (DBSCAN) algorithm is proposed for this purpose. Finally, stroke superpixel verification assigns each region to a stroke or to the background, and the text segmentation result is obtained. The proposed method shows promising robustness to noise and complex background textures. Experimental results on the Korea Advanced Institute of Science and Technology (KAIST) scene text dataset, the International Conference on Document Analysis and Recognition (ICDAR) 2003 natural scene text image dataset and the Street View text dataset verify that this method is effective and significantly outperforms existing methods.

Keywords: Text detection; Document image processing; Image representation; Image enhancement; Image resolution; Image segmentation; Pattern clustering; Natural scenes; Iterative methods; Image texture; Text segmentation; Text image analysis; Text image recognition; Superpixel-based image representation; Local disturbances; Text image superpixels; Adaptive linear iterative clustering-based text superpixel generation; Adaptive superpixel size; Adaptive superpixel compactness; Boundary adherence; Homogeneous superpixels; Modified density-based spatial clustering; Stroke superpixel verification; KAIST scene text dataset; ICDAR2003 natural scene text image dataset; Street View text dataset
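The superpixel clustering step builds on density-based spatial clustering of applications with noise (DBSCAN). The sketch below is plain DBSCAN, not the modified variant the authors propose. It assumes each superpixel has already been summarised as a feature vector (for example its mean colour); `eps` and `min_pts` are illustrative parameters, and the neighbour count includes the point itself.

```python
from math import dist
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels  # type: ignore[return-value]
```

Two tight groups of three points plus one distant outlier yield labels `[0, 0, 0, 1, 1, 1, -1]` with `eps=1.5` and `min_pts=3`. Because DBSCAN does not need the number of clusters in advance, it suits merging an unknown number of homogeneous regions.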

Text segmentation by product partition models and dynamic programming

MATHEMATICAL AND COMPUTER MODELLING, 2004, Vol. 39, No. 2-3, pp. 209-217

Authors: Kehagias, A; Nicolaou, A; Petridis, V; Fragkou, P (Aristotle Univ Thessaloniki Fac Engn Dept Math Phys & Comp Sci GR-54006 Thessaloniki Greece; Univ Macedonia Dept Business Adm Thessaloniki Greece; Aristotle Univ Thessaloniki Fac Engn Dept Elect & Comp Engn GR-54006 Thessaloniki Greece)

In this paper, we use Barry and Hartigan's Product Partition Models to formulate text segmentation as an optimization problem, which we solve by a fast dynamic programming algorithm. We test the algorithm on Choi's segmentation benchmark and achieve the best segmentation results so far reported in the literature. (C) 2004 Elsevier Ltd. All rights reserved.

Keywords: Text segmentation; Dynamic programming; Product partition models

Text segmentation of spoken meeting transcripts

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2008, Vol. 11, No. 3-4, pp. 157-165

Authors: Sharp, Bernadette; Chibelushi, Caroline (Staffordshire Univ FCET Beaconside Stafford ST18 0AD England)

Text segmentation has played an important role in information retrieval as well as natural language processing. Current segmentation methods are well suited to written, structured texts, making use of their distinctive macro-level structures; however, segmenting transcribed multi-party conversation presents a different challenge, given its ill-formed sentences and the lack of macro-level text units. This paper describes an algorithm suitable for segmenting spoken meeting transcripts that combines semantically complex lexical relations with speech cue phrases to build lexical chains for determining topic boundaries.

Keywords: Text segmentation; Lexical chaining; Multi-party transcript analysis

Text Segmentation in Ancient Topographic Maps and Floor Plans with Support Vector Data Description

International Joint Conference on Neural Networks (IJCNN)

Authors: Machado, S. C. S.; Mello, C. A. B. (Univ Fed Pernambuco Ctr Informat Recife PE Brazil)

ISBN: (print) 9781479919598

Images of ancient maps and floor plans can present a great challenge for common character recognition tools. Besides the damage caused by time and handling, these documents have an important part of their information described graphically. In most examples, drawings of rivers or walls occupy most of the document. Usually, the text varies in style, size and orientation and may overlap with graphics. This paper presents a new method for text segmentation in images of ancient topographic maps and floor plans that uses a machine learning algorithm specialized in novelty detection to decide which components of the image are textual. Despite using artificial text examples for training, the method is able to outperform other state-of-the-art methods when applied to real images.

Keywords: Text segmentation; Topographic maps; Floor plans; Support vector data descriptor

Text Segmentation Based on PLSA-TextTiling Model

International Conference on Mechatronics Engineering and Computing Technology (ICMECT)

Authors: Zheng, YuChao (East China Jiaotong Univ Nanchang 330013 Peoples R China)

ISBN: (print) 9783038351153

Text segmentation is very important in many fields, such as information retrieval, summarization, language modeling and anaphora resolution. Text segmentation based on PLSA-TextTiling associates different latent topics with observable pairs of words and sentences. In the experiments, whole sentences are taken as elementary blocks. The PLSA model is used to calculate the similarity metric based on the idea of TextTiling, and several approaches to discovering boundaries are tried. The results show a Pμ value of 0.87, which is better than that of other text segmentation algorithms.

Keywords: Text segmentation; Probabilistic latent semantic analysis (PLSA); Similarity metric; Boundary discovering
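TextTiling places boundaries at gaps where the similarity between adjacent blocks dips relative to its neighbours. The sketch below assumes each sentence (the elementary block in this abstract) already has a topic-distribution vector, such as one inferred by PLSA. It computes the cosine similarity across each gap and returns Hearst-style depth scores. PLSA training and the paper's boundary-selection rules are not shown, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def depth_scores(vectors: Sequence[Sequence[float]]) -> List[float]:
    """TextTiling-style depth score for each gap between adjacent sentences."""
    sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    depths = []
    for g, s in enumerate(sims):
        # Climb left while similarity keeps rising to find the left peak.
        left, k = s, g - 1
        while k >= 0 and sims[k] >= left:
            left, k = sims[k], k - 1
        # Climb right in the same way to find the right peak.
        right, k = s, g + 1
        while k < len(sims) and sims[k] >= right:
            right, k = sims[k], k + 1
        depths.append((left - s) + (right - s))
    return depths
```

For six sentences whose two-topic vectors switch from topic 0 to topic 1 after the third sentence, only the gap between sentences 2 and 3 gets a clearly positive depth (about 1.99); a boundary would then be chosen by thresholding these scores.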
