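Author affiliation: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
Publication: International Journal of Computer Processing of Languages
Year, volume, issue: 2001, Vol. 14, No. 1
Pages: 47-69
Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]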
Abstract: Noun phrases are commonly used to generate index terms for information retrieval systems, so an effective noun phrase extraction method is needed. In this paper, we propose an approach to extracting maximal noun phrases from Chinese text. Although several noun phrase extraction methods have been proposed, most of them apply only to Western languages; to the best of our knowledge, very few have handled Chinese text. Many existing approaches for Western languages rely on statistical methods. However, because maximal Chinese noun phrases have a complicated structure, purely statistical approaches are not effective. We attempt to improve the performance of a statistical method by integrating it with the transformation-based error-driven learning (TEL) technique. Our methodology consists of two modules. The first module applies a statistical method to extract Chinese noun phrases, and its performance is investigated in terms of precision and recall. The second module applies the TEL algorithm to refine the output of the first module. The TEL algorithm automatically learns a set of transformation rules that correct the errors identified by comparing the output of the first module with a correctly annotated corpus. The learned rules can then be applied one by one to sentences in any corpus to correct errors. The TEL algorithm is shown to be effective in improving precision.
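The error-correcting second module can be illustrated with a small Brill-style learner. This record does not give the paper's rule templates, tag set, or stopping criterion, so the sketch below rests on assumptions: noun phrase boundaries are encoded as hypothetical B/I/O chunk tags over POS-tagged tokens, the "baseline" tags stand in for the output of the statistical first module, and the four context templates (`cur_pos`, `prev_pos`, `next_pos`, `prev_tag`) are illustrative choices, not the authors'. Learning greedily picks the candidate rule with the largest net gain (errors fixed minus correct tags broken) on the annotated corpus, applies it, and repeats until no rule reaches `min_gain`.

```python
from dataclasses import dataclass

# Hypothetical context templates (assumptions, not taken from the paper).
FEATURES = ("cur_pos", "prev_pos", "next_pos", "prev_tag")


@dataclass(frozen=True, order=True)
class Rule:
    """Change chunk tag `from_tag` to `to_tag` when `feature` has `value`."""
    from_tag: str
    to_tag: str
    feature: str
    value: str


def context(pos, tags, i, feature):
    """Value of a context feature at position i (None at sentence edges)."""
    if feature == "cur_pos":
        return pos[i]
    if feature == "prev_pos":
        return pos[i - 1] if i > 0 else None
    if feature == "next_pos":
        return pos[i + 1] if i + 1 < len(pos) else None
    if feature == "prev_tag":
        return tags[i - 1] if i > 0 else None
    raise ValueError(f"unknown feature: {feature}")


def apply_rule(rule, pos, tags):
    """Apply one rule to a sentence; contexts are read from the pre-update tags."""
    return [
        rule.to_tag
        if t == rule.from_tag and context(pos, tags, i, rule.feature) == rule.value
        else t
        for i, t in enumerate(tags)
    ]


def score(rule, corpus):
    """Net gain of a rule: tags it fixes minus correct tags it breaks."""
    gain = 0
    for pos, tags, gold in corpus:
        new = apply_rule(rule, pos, tags)
        for old_t, new_t, g in zip(tags, new, gold):
            if old_t != new_t:
                gain += (new_t == g) - (old_t == g)
    return gain


def candidates(corpus):
    """Propose rules only at positions where the current tag is wrong."""
    cands = set()
    for pos, tags, gold in corpus:
        for i, (t, g) in enumerate(zip(tags, gold)):
            if t != g:
                for f in FEATURES:
                    v = context(pos, tags, i, f)
                    if v is not None:
                        cands.add(Rule(t, g, f, v))
    return cands


def learn(corpus, min_gain=1, max_rules=50):
    """Greedily learn an ordered list of transformation rules."""
    corpus = [(pos, list(tags), gold) for pos, tags, gold in corpus]
    rules = []
    while len(rules) < max_rules:
        cands = candidates(corpus)
        if not cands:
            break  # no remaining errors on the training corpus
        # Highest gain wins; ties are broken by rule order for determinism.
        gain, best = min(((score(r, corpus), r) for r in cands),
                         key=lambda p: (-p[0], p[1]))
        if gain < min_gain:
            break
        rules.append(best)
        corpus = [(pos, apply_rule(best, pos, tags), gold)
                  for pos, tags, gold in corpus]
    return rules


def correct(rules, pos, tags):
    """Apply learned rules, in learned order, to a new sentence's baseline tags."""
    for r in rules:
        tags = apply_rule(r, pos, tags)
    return tags


# Toy training data: (POS tags, baseline chunk tags, gold chunk tags).
# "DEG" stands for the particle 的; all tags here are hypothetical.
train = [
    (["N", "DEG", "N", "V", "N"], ["B", "O", "B", "O", "B"], ["B", "I", "I", "O", "B"]),
    (["A", "N", "DEG", "N"], ["B", "I", "O", "B"], ["B", "I", "I", "I"]),
]
rules = learn(train)
print(rules)
print(correct(rules, ["A", "N", "DEG", "N", "V"], ["B", "I", "O", "B", "O"]))
# → ['B', 'I', 'I', 'I', 'O']
```

On this toy data the learner picks up two rules that extend a noun phrase across 的, which mirrors the abstract's point that the rules are learned from errors and then applied one at a time to new sentences. Because every accepted rule must reduce training errors by at least `min_gain`, the loop always terminates.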