Graph-Based Lexicon Regularization for PCFG With Latent Annotations

Authors: Zeng, Xiaodong; Wong, Derek F.; Chao, Lidia S.; Trancoso, Isabel

Affiliations: Univ Macau, Dept Comp & Informat Sci, Lab NLP2CT, Macau, Peoples R China; Univ Lisbon, Inst Super Tecn, INESC ID, P-1000029 Lisbon, Portugal

Publication: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (IEEE Trans. Audio Speech Lang. Process.)

Year, volume, issue: 2015, Vol. 23, No. 3

Pages: 441-450

Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0702 [Science - Physics]

Funding: Science and Technology Development Fund of Macau and the Research Committee of the University of Macau [057/2014/A, MYRG076 (Y1-L2)-FST13-WF, MYRG070 (Y1-L2)-FST12-CS]; national funds through Fundacao para a Ciencia e a Tecnologia (FCT) [UID/CEC/50021/2013]

Keywords: Graph propagation; natural language processing; neural word representation; syntax parsing

Abstract: This paper aims to learn a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words with identical pre-terminal (part-of-speech) tags, in order to propagate the probabilities of latent annotations given the words. The graphs capture the relationships between words at the syntactic and semantic levels, estimated using a neural word representation method based on the recursive autoencoder (RAE). We modify the conventional PCFG-LA parameter estimation algorithm, expectation maximization (EM), by incorporating a GP process after the M-step. GP encourages smoothness among the graph vertices, so that different words in similar syntactic and semantic environments have similar posterior distributions over nonterminal subcategories. The proposed PCFG-LA learning approach was evaluated together with a hierarchical split-and-merge training strategy on parsing tasks for English, Chinese and Portuguese. The empirical results reveal two crucial findings: 1) regularizing the lexicons with GP has a positive effect on parsing accuracy; and 2) learning with unlabeled data can also expand the PCFG-LA lexicons.
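
The smoothing idea in the abstract can be illustrated with a small sketch. It is not the paper's objective or update rule, which this record does not give. It assumes cosine similarity over toy word vectors (standing in for the RAE representations), a generic iterative propagation update, and hypothetical names throughout (`knn_graph`, `propagate`, `mu`, `iters`). In the setting described above, something like `propagate` would be applied to the per-word latent-annotation distributions after each M-step, separately for each part-of-speech tag.

```python
import numpy as np

def knn_graph(vectors, k):
    """Build a k-NN graph: W[i, j] is the cosine similarity between words i and j
    if j is one of i's k nearest neighbours, and 0 otherwise (assumed weighting)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # a word is never its own neighbour
    W = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]      # indices of the k most similar words
        W[i, nbrs] = np.maximum(sim[i, nbrs], 0.0)
    return W

def propagate(P, W, mu=0.5, iters=10):
    """Smooth row-wise distributions P over the graph W.
    Each row is pulled towards its neighbours' rows while staying close to
    its initial estimate; mu (assumed) sets the strength of the pull."""
    Q = P.copy()
    denom = 1.0 + mu * W.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Q = (P + mu * (W @ Q)) / denom      # rows keep summing to 1
    return Q

# Toy data: four words with the same tag, two latent subcategories.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
P = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5]])
Q = propagate(P, knn_graph(vectors, k=1))
```

In this toy run, words 1 and 3 start with uninformative distributions and drift towards those of their nearest neighbours (words 0 and 2), which is the kind of smoothness over graph vertices that the abstract describes.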
