Author affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
Publication: Neural Computing and Applications (Neural Comput. Appl.)
Year/Volume/Issue: 2024, Vol. 36, No. 24
Pages: 15091-15102
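Subject classification: 1205[Management - Library, Information and Archival Management] 08[Engineering] 0835[Engineering - Software Engineering] 0701[Science - Mathematics] 0812[Engineering - Computer Science and Technology (Engineering or Science degrees may be awarded)]
Subject: Embeddings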
Abstract: Most bilingual lexicon induction (BLI) models learn a mapping function that transfers word embedding (WE) spaces from one language to another. This usually relies on the isomorphism hypothesis, which posits that words in different languages share the same structures and relationships (i.e., their embedding spaces are similar in geometric structure). However, the isomorphism of WE spaces weakens substantially for distant language pairs, resulting in low BLI accuracy. To address this problem, we propose a novel BLI method that incorporates synonymous knowledge. The main idea is to stabilize the distances between words so as to optimize the monolingual WE space and yield higher isomorphism. Specifically, we first induce monolingual synonym pairs from WordNet and construct monolingual synonym lexicons. We then generate pseudo-sentences by substituting words in the training corpus with their synonyms. Finally, the original sentences and the pseudo-sentences are used jointly to learn monolingual WEs, so that the word vectors of synonyms naturally move closer together. Comprehensive experiments on standard BLI datasets covering diverse distant languages demonstrate that our method significantly outperforms strong BLI systems in word translation. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
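The corpus-augmentation step summarized in the abstract (synonym pairs, then a synonym lexicon, then pseudo-sentences built by substitution, then joint use with the original sentences) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synonym pairs are hand-written stand-ins for pairs induced from WordNet, the helper names `build_lexicon`, `make_pseudo_sentences`, and `augment_corpus` are hypothetical, and replacing exactly one word per pseudo-sentence is an assumed policy, since the abstract does not say how many substitutions are made.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical monolingual synonym lexicon: word -> list of its synonyms.
SynonymLexicon = Dict[str, List[str]]


def build_lexicon(pairs: List[Tuple[str, str]]) -> SynonymLexicon:
    """Build a symmetric synonym lexicon from (word, synonym) pairs.

    Assumption: synonymy is treated as symmetric, so each pair is stored
    in both directions, without duplicates.
    """
    lexicon: Dict[str, List[str]] = defaultdict(list)
    for a, b in pairs:
        if b not in lexicon[a]:
            lexicon[a].append(b)
        if a not in lexicon[b]:
            lexicon[b].append(a)
    return dict(lexicon)


def make_pseudo_sentences(sentence: List[str], lexicon: SynonymLexicon) -> List[List[str]]:
    """Return one pseudo-sentence per (position, synonym) substitution.

    Assumed policy: exactly one word is replaced in each pseudo-sentence.
    """
    pseudo: List[List[str]] = []
    for i, word in enumerate(sentence):
        for syn in lexicon.get(word, []):
            if syn == word:
                continue
            pseudo.append(sentence[:i] + [syn] + sentence[i + 1:])
    return pseudo


def augment_corpus(corpus: List[List[str]], lexicon: SynonymLexicon) -> List[List[str]]:
    """Keep every original sentence, each followed by its pseudo-sentences."""
    augmented: List[List[str]] = []
    for sentence in corpus:
        augmented.append(sentence)
        augmented.extend(make_pseudo_sentences(sentence, lexicon))
    return augmented


if __name__ == "__main__":
    # Hand-written pairs standing in for WordNet-derived synonym pairs.
    lex = build_lexicon([("big", "large"), ("car", "automobile"), ("car", "auto")])
    corpus = [["a", "big", "car"], ["the", "road"]]
    for s in augment_corpus(corpus, lex):
        print(" ".join(s))
```

With the toy lexicon above, "a big car" is kept and followed by "a large car", "a big automobile", and "a big auto", while "the road" has no synonyms and passes through unchanged. The augmented corpus would then be fed to an ordinary monolingual embedding trainer (for example, a skip-gram model); because synonyms now occur in identical contexts, their vectors tend to be pulled together, which is the effect the abstract describes.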