A new Chinese word segmentation method based on character scope in the field of geoinformation

Authors: Wang, Hongling; Liu, Gang

Affiliations: Geoinformation Engineering, China University of Geosciences, Wuhan, China; Software Engineering, East China University of Technology, Nanchang, China; Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology, Nanchang, China

Publication: IPPTA: Quarterly Journal of Indian Pulp and Paper Technical Association (IPPTA)

Year, volume, issue: 2018, Vol. 30, No. 1

Pages: 448-457

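Subject classification: 1205 [Management - Library, Information and Archives Management]; 0810 [Engineering - Information and Communication Engineering]; 0709 [Science - Geology]; 082903 [Engineering - Chemical Processing Engineering of Forest Products]; 08 [Engineering]; 0829 [Engineering - Forestry Engineering]; 0714 [Science - Statistics (degrees in science or economics may be conferred)]; 082201 [Engineering - Pulp and Paper Engineering]; 0701 [Science - Mathematics]; 0822 [Engineering - Light Industry Technology and Engineering]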

Funding: This work was supported by the Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (Grants JELRGBDT201705 and JELRGBDT201708) and the Science and Technology Foundation of Jiangxi Province Education Department (Grant GJJ160555).

Subject: Hidden Markov models

Abstract: Chinese word segmentation plays an important role in the geoinformation field, supporting specialized search of geological information, abstract extraction from geological literature, and automatic classification of geological documents. It is a key step in geoinformation processing. However, most traditional Chinese word segmentation methods cannot achieve satisfactory results or meet the needs of practical applications when applied to the geoinformation field. This paper presents a new Chinese word segmentation algorithm for geoinformation based on a divide-and-conquer strategy, which aims to improve the segmentation accuracy that suffers from the low adaptability of existing Chinese word segmentation methods. After a coarse segmentation of the Chinese text, the algorithm first segments specialized geological terminology and then adds specific tags to the segmentation to obtain an implicit temporary result for subsequent processing. To further recognize intersectional ambiguity fields, a Reverse Minimum Matching segmentation algorithm based on character scope is designed, and a second segmentation pass is conducted on the implicit temporary result. Finally, ambiguity resolution is applied to some of the ambiguous words using a Hidden Markov Model (HMM). For the HMM processing, we propose a class factor matrix to record the most impossible state level. Testing and comparison of results show that this segmentation algorithm has a distinct advantage in segmentation accuracy on geoinformation literature. © 2018 Indian Pulp and Paper Technical Association. All rights reserved.
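
As a rough illustration of the reverse matching step mentioned in the abstract, the following is a minimal sketch of generic dictionary-based reverse minimum matching in Python. It does not reproduce the paper's character-scope design, which the abstract does not specify. The function name `reverse_minimum_match`, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def reverse_minimum_match(text, lexicon, max_len=4):
    """Generic reverse minimum matching (illustrative, not the paper's variant).

    Scans the text from right to left. At each position it takes the
    shortest lexicon word of at least two characters that ends at the
    cursor, falling back to a single character when nothing matches.
    """
    tokens = []
    end = len(text)
    while end > 0:
        match = text[end - 1]  # fallback: single character
        for size in range(2, min(max_len, end) + 1):
            candidate = text[end - size:end]
            if candidate in lexicon:
                match = candidate
                break  # shortest match wins
        tokens.append(match)
        end -= len(match)
    tokens.reverse()  # tokens were collected right to left
    return tokens


# Hypothetical toy lexicon; a real system would load a geological term list.
toy_lexicon = {"地质", "信息", "地质信息", "处理"}
print(reverse_minimum_match("地质信息处理", toy_lexicon))
```

Because the shortest match is preferred, the longer entry "地质信息" is split into "地质" and "信息" here. That tendency to over-split is one reason a pipeline like the one described would handle specialized terminology in an earlier pass and resolve the remaining ambiguities afterwards.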
