
Statistical models for text segmentation

Authors: Beeferman, D.; Berger, A.; Lafferty, J.

Affiliation: Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213, USA

Published in: Machine Learning

Year/Volume/Issue: 1999, Vol. 34, No. 1-3

Pages: 177-210

Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]

Funding: ATR Interpreting Telecommunications Research Laboratories; National Science Foundation (NSF), IRI-9314969; Defense Advanced Research Projects Agency (DARPA), DAAH04-95-1-075; International Business Machines Corporation (IBM)

Keywords: exponential models; text segmentation; maximum entropy; inductive learning; natural language processing; decision trees; language modeling

Abstract: This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features that use adaptive language models in a novel way to detect broad changes of topic, and cue-word features that detect occurrences of specific words, which may be domain-specific, that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains: Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
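The probabilistically motivated error metric mentioned in the abstract is widely known as P_k: the probability that two positions a fixed distance k apart are classified inconsistently (same segment vs. different segments) by the reference and hypothesized segmentations. A minimal sketch of that idea follows; the function name `pk` and the default choice of k (half the mean reference segment length, a common convention) are illustrative assumptions, not details taken from this record.

```python
def pk(reference, hypothesis, k=None):
    """Sketch of a P_k-style segmentation error metric.

    `reference` and `hypothesis` are sequences of segment labels, one per
    position, e.g. [0, 0, 0, 1, 1, 2].  The score is the fraction of
    position pairs exactly k apart that the two segmentations classify
    inconsistently (same-segment vs. different-segment).  0.0 is perfect.
    """
    if k is None:
        # Common convention (an assumption here): half the mean
        # reference segment length, and at least 1.
        n_segments = len(set(reference))
        k = max(1, round(len(reference) / (2 * n_segments)))
    n = len(reference)
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]  # same segment in reference?
        same_hyp = hypothesis[i] == hypothesis[i + k]  # same segment in hypothesis?
        if same_ref != same_hyp:
            errors += 1
    return errors / (n - k)
```

Because it penalizes both near-misses and spurious boundaries smoothly through the window size k, this style of metric combines precision- and recall-like behavior in one number, which is the property the abstract highlights.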
