Author affiliation: Univ Hassan 2 Data Sci & Artificial Intelligence Casablanca 20000 Morocco Languages & Cultures Lab Mohammadia Morocco
Publication: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (ACM Trans. Asian Low Res. Lang. Inf. Process.)
Year, volume, issue: 2023, Vol. 22, No. 12
Pages: 1-17
Core indexing:
Funding: Centre National pour la Recherche Scientifique et Technique (CNRST) Morocco
Subject: Dimensionality Reduction Techniques; post-processing algorithm; Arabic machine translation; Transformer
Abstract: Word embeddings are widely deployed in a broad range of fundamental natural language processing applications and are also useful for generating representations of paragraphs, sentences, and documents. Because they are a core component of many natural language processing tasks, reducing their size can be beneficial in memory-constrained settings. Lower-dimensional word embeddings are considerably more usable on memory-limited devices, which benefits many real-world applications. This article presents a comparative study of dimensionality reduction techniques for generating efficient lower-dimensional word vectors. In empirical experiments on the Arabic machine translation task, the post-processing algorithm combined with independent component analysis gave the best performance among the dimensionality reduction techniques considered. This combination of the post-processing algorithm with a dimensionality reduction technique (independent component analysis) has not been investigated before. It was applied to both contextual and non-contextual word embeddings, reducing the size of the vectors while achieving better translation quality than the original embeddings.
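Illustrative sketch: the record does not give implementation details, so the following is only a minimal example of what a "post-processing, then independent component analysis" pipeline might look like. It assumes that the post-processing step centers the embeddings and removes their top principal directions, and it uses scikit-learn's `PCA` and `FastICA`. The function names, the `n_remove` and `target_dim` parameters, and the random toy matrix are hypothetical and are not taken from the article.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA


def post_process(embeddings, n_remove=2):
    """Center the vectors and project out their top principal directions.

    Assumption: this is one common form of embedding post-processing;
    the article's exact procedure is not described in this record.
    """
    centered = embeddings - embeddings.mean(axis=0)
    top = PCA(n_components=n_remove).fit(centered).components_  # (n_remove, dim)
    return centered - centered @ top.T @ top


def reduce_embeddings(embeddings, target_dim, n_remove=2, seed=0):
    """Post-process the embeddings, then reduce them with FastICA."""
    cleaned = post_process(embeddings, n_remove)
    ica = FastICA(n_components=target_dim, random_state=seed, max_iter=1000)
    return ica.fit_transform(cleaned)


# Toy stand-in for a real embedding matrix: 500 "words", 64 dimensions.
rng = np.random.default_rng(0)
toy = rng.laplace(size=(500, 64))

reduced = reduce_embeddings(toy, target_dim=16)
print(reduced.shape)  # (500, 16)
```

With real embeddings, `toy` would be replaced by the vocabulary-by-dimension matrix, and `target_dim` and `n_remove` would need to be tuned against the downstream translation quality.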