IEEE Transactions on Audio, Speech and Language Processing

Addressing Domain Mismatch in Unsupervised Neural Machine Translation

Authors: Youyuan Lin, Rui Wang, Chenhui Chu

Affiliations: Graduate School of Informatics, Kyoto University, Kyoto, Japan; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

Publication: IEEE Transactions on Audio, Speech and Language Processing

Year/Volume: 2025, Vol. 33

Pages: 472-482

Funding: JSPS KAKENHI

Subjects: Training; Adaptation models; Translation; Neural machine translation; Data models; Speech processing

Abstract: Pretrained models have taken full advantage of monolingual corpora and achieved impressive results in training Unsupervised Neural Machine Translation (UNMT) models. However, when UNMT models are adapted with in-domain monolingual corpora for domain-specific translation tasks, one of the languages may lack in-domain corpora, resulting in an unequal amount and proportion of in-domain monolingual corpora across the languages. This situation is known as Domain Mismatch (DM). This study investigates the impact of DM on UNMT. We find that DM causes a disparity in translation quality: while in-domain monolingual corpora in one language enhance in-domain translation quality into that language, the enhancement does not generalize to the other language, and translation quality into the other language remains deficient. To address this problem, we propose Domain-Aware Adaptation (DAA), which can be embedded in the vanilla UNMT training process. By passing sentence-level domain information to the model during training and inference, DAA gives higher weight to the data in open-domain corpora that is related to the specific domain, alleviating domain mismatch. Experimental results on German-English and Romanian-English translation tasks in the IT, Koran, medical, and TED2020 domains demonstrate that DAA can efficiently exploit open-domain corpora to mitigate the translation quality disparity caused by DM.
