Author affiliations: Geoinformation Engineering, China University of Geosciences, Wuhan, China; Software Engineering, East China University of Technology, Nanchang, China; Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology, Nanchang, China
Publication: IPPTA: Quarterly Journal of Indian Pulp and Paper Technical Association (IPPTA)
Year, volume, issue: 2018, Vol. 30, No. 1
Pages: 448-457
Core indexing:
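Subject classification: 1205 [Management - Library, Information and Archives Management]; 0810 [Engineering - Information and Communication Engineering]; 0709 [Science - Geology]; 082903 [Engineering - Chemical Processing of Forest Products]; 08 [Engineering]; 0829 [Engineering - Forestry Engineering]; 0714 [Science - Statistics (degrees may be awarded in science or economics)]; 082201 [Engineering - Pulp and Paper Engineering]; 0701 [Science - Mathematics]; 0822 [Engineering - Light Industry Technology and Engineering]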
Funding: This work was supported by the Jiangxi Engineering Laboratory on Radioactive Geoscience and Big Data Technology (Grants JELRGBDT201705 and JELRGBDT201708) and the Science and Technology Foundation of Jiangxi Province Education Department (Grant GJJ160555).
Abstract: Chinese word segmentation plays an important role in the geoinformation field, supporting specialized search of geological information, abstract extraction from geological literature, and automatic classification of geological documents. It is a key step in geoinformation processing. However, most traditional Chinese word segmentation methods cannot achieve satisfactory results or meet the needs of practical applications when applied to the geoinformation field. This paper presents a new Chinese word segmentation algorithm for geoinformation based on a divide-and-conquer strategy, which aims to improve the segmentation accuracy that is limited by the low adaptability of existing Chinese word segmentation methods. After a coarse segmentation of the Chinese text, the algorithm first segments specialized geological terminology and then adds specific tags to the segmentation to obtain an implicit temporary result for subsequent processing. To further recognize intersectional ambiguity fields, a Reverse Minimum Matching segmentation algorithm based on character scope is designed, and a second segmentation pass is conducted on the implicit temporary result. Finally, ambiguity resolution is applied to the partially ambiguous words using a Hidden Markov Model (HMM). For the HMM processing, we propose a class factor matrix to record the most impossible state level. Testing and comparison of results show that this segmentation algorithm has a distinct advantage in segmentation accuracy on geoinformation literature. © 2018 Indian Pulp and Paper Technical Association. All rights reserved.
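The two-pass idea in the abstract (domain terms first, then a reverse minimum matching pass over what remains) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `tag_terms`, `reverse_min_match`, and `segment`, the `min_len`/`max_len` window, and the toy term list and vocabulary are all assumptions made for illustration. The paper's "character scope" rule and the HMM disambiguation step are not described in enough detail in the abstract to reproduce, so they are omitted.

```python
def tag_terms(text, terms):
    """Pass 1 (assumed form): extract domain terms first, longest match wins.

    Returns (chunk, is_term) pairs; is_term marks the tagged terminology.
    """
    max_len = max(map(len, terms), default=0)
    out, buf, i = [], "", 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in terms:
                if buf:  # flush the untagged text collected so far
                    out.append((buf, False))
                    buf = ""
                out.append((text[i:i + n], True))
                i += n
                break
        else:
            buf += text[i]
            i += 1
    if buf:
        out.append((buf, False))
    return out


def reverse_min_match(span, vocab, min_len=2, max_len=4):
    """Pass 2 (assumed form): scan right to left and take the SHORTEST
    vocabulary word ending at the cursor; fall back to a single character."""
    words, end = [], len(span)
    while end > 0:
        for n in range(min_len, min(max_len, end) + 1):
            if span[end - n:end] in vocab:
                words.append(span[end - n:end])
                end -= n
                break
        else:
            words.append(span[end - 1:end])
            end -= 1
    return words[::-1]


def segment(text, terms, vocab):
    """Divide and conquer: tagged terms are kept whole, the rest is re-segmented."""
    result = []
    for chunk, is_term in tag_terms(text, terms):
        result.extend([chunk] if is_term else reverse_min_match(chunk, vocab))
    return result


# Toy data, hypothetical and not taken from the paper.
terms = {"花岗岩"}
vocab = {"分布", "研究", "研究区"}
print(segment("花岗岩分布在研究区", terms, vocab))
```

Keeping the tagged terms out of the second pass is what prevents a general-purpose matcher from splitting specialized terminology, which is the motivation the abstract gives for handling geological terms first.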