ISBN: (print) 9789819622917; 9789819622924
This paper presents a human-in-the-loop approach to address the challenge of low-resource neural machine translation (NMT), focusing on the Tibetan-Chinese language pair. We emphasize the crucial role of human feedback in both data augmentation and model optimization. First, we construct a large-scale Tibetan-Chinese parallel corpus by iteratively leveraging back-translation and incorporating human evaluation to guide the generation of high-quality synthetic data. Then, we train a multilingual NMT system using a curriculum learning strategy, progressively incorporating the augmented data. Finally, we fine-tune our model with GaLore and SimPO algorithms, directly optimizing it towards human preferences as assessed by professional translators. Experimental results on the CCMT 2024 Tibetan-Chinese translation task demonstrate that our approach significantly improves translation quality, achieving state-of-the-art performance. We provide further analysis and case studies to illustrate the effectiveness of our human-in-the-loop methodology.
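As a rough illustration of the preference-optimization step, the sketch below computes the SimPO objective for a single preference pair as it is commonly formulated: a length-normalized log-probability reward for the preferred and rejected translations, a target margin, and a logistic loss. The abstract does not give hyperparameters or implementation details, so the function name, the `beta` and `gamma` defaults, and the input values are assumptions chosen only for illustration; GaLore (the memory-efficient gradient projection used during fine-tuning) is not shown.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss for one (chosen, rejected) pair.

    logp_*: summed token log-probabilities of each translation under the
            model being fine-tuned.
    len_*:  translation lengths in tokens.
    beta, gamma: reward scale and target margin (illustrative defaults,
            not values taken from the paper).
    """
    # Length-normalized implicit rewards; no reference model is needed.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical pair: the translator-preferred output has the higher
    # average log-probability, so the loss is small.
    print(simpo_loss(-12.0, 10, -30.0, 12))
```

The loss shrinks as the model assigns a higher average log-probability to the translation that professional translators preferred, which is the sense in which the fine-tuning optimizes towards human preferences.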
ISBN: (print) 9783031804373; 9783031804380
There is a broad range of languages around the globe, each of which has a variety of spoken dialects. A dialect is a variety of a language that differs in vocabulary, idioms, grammar, and pronunciation. Dialect Identification (DID) is the automatic process of detecting which dialect a speech utterance belongs to. Arabic Dialect Identification (ADI) is considered one of the most challenging DID tasks, since some dialects share much of their vocabulary and there is a massive overlap in linguistic and acoustic features between countries that belong to the same region. Furthermore, while research on DID systems for other languages is saturated, work on Arabic still lags behind. We therefore present an approach that utilizes transfer learning techniques to classify both regions and countries. We used a subset of the MGB-5 dataset and employed Mel-Frequency Cepstral Coefficients (MFCC) as a feature extraction method, then fed the features into a ResNet-LSTM model. In this method, we proposed a multi-output approach that combines the classification of regions and countries into a single model, which achieved better results than the single-output approach. After that, we utilized Wav2Vec2, a pre-trained audio model that has proved highly effective for language-speech tasks. We used the XLSR variant fine-tuned on the MGB-3 dataset to train the model on the same small subset, and it achieved accuracies of 93.49% and 93.20% on regions and countries, respectively, which shows the potential of applying state-of-the-art pre-trained audio models to the ADI task.
ISBN: (print) 9789819622917; 9789819622924
This paper describes Lan-Bridge's participation in the CCMT 2024 machine translation evaluation. In this evaluation, we participate in the bilingual translation tasks for three minority languages: Mongolian to Chinese, Tibetan to Chinese, and Uygur to Chinese. We adopt the Transformer model, which is based on the self-attention network, as our foundation and train one machine translation model for each of the three language pairs. The paper discusses the specific methods and experimental details employed and provides an in-depth analysis of model performance on the three translation tasks.
ISBN: (print) 9783031804748; 9783031804755
This study explores the impact of learner perceptions across the cognitive, affective, and psychomotor domains on educational outcomes using extended reality-based nursing simulation. A cohort of 113 nursing students participated in this research. These students engaged in simulation exercises and concurrently evaluated their experiences from cognitive, affective, and psychomotor perspectives. This research assessed how these perceptions influenced vital learning outcomes, including self-efficacy, focus of attention, learning satisfaction, and SIM-TLX scores. Notably, our results indicate that perceptions within the psychomotor domain significantly enhanced the focus of attention. Conversely, perceptions in the cognitive and affective domains did not significantly impact other learning outcomes. These findings suggest that recognizing and incorporating learners' self-assessed competencies in designing and implementing extended reality-based simulations in nursing education can significantly enhance the educational impact. This approach may lead to improved targeted outcomes by tailoring educational experiences that reflect learners' strengths and needs, thus optimizing overall educational effectiveness.
ISBN: (print) 9783031804748; 9783031804755
Virtual Reality (VR) presents a promising avenue for enhancing student engagement and learning outcomes in education. This work aimed to identify key design principles for effective VR learning environments and evaluate VR's impact on student engagement, motivation, and knowledge retention. The research revealed considerable potential for VR-based learning in the STEM field, particularly in the area of human-computer interaction (HCI). This was achieved through a comprehensive approach involving online surveys, expert and student interviews, and mixed-methods VR studies. The results show that memory, attention, relevance, confidence, and satisfaction improve compared to traditional methods described in the Attention, Relevance, Confidence, and Satisfaction (ARCS) model. The Technology Acceptance Model further highlighted VR's high acceptance and usability among students, suggesting its strategic incorporation into educational frameworks. Leveraging VR's immersive capabilities can invigorate student learning experiences, creating dynamic environments that resonate with modern learners through experiential learning and real-world applications. The VR system developed here effectively engages students with a "learning by doing" approach. However, it also highlights ongoing issues, including time-consuming setups, limited user control, and accessibility challenges in VR learning.
ISBN: (print) 9783031804748; 9783031804755
Research on the effectiveness of immersive VR for science learning has found mixed results when VR learning is compared to traditional learning. While media comparison studies have been criticised for their methodological problems, they provide important information for policy makers as well as for educators. We believe the mixed results were found because many VR learning environments do not take advantage of the unique affordances of VR. We therefore designed Looking Inside Cells (LIC), a set of VR simulations that are interactive, immersive, use spatial features of VR, and take advantage of emotional design. We compared LIC to a slide show of the same content in a quasi-experimental between-subject control group design (N = 63). Outcome variables were recall, comprehension, interest, motivation, and experienced emotions. Results indicate that learners in the VR group scored significantly higher than learners in the control group on some dimensions of the post assessment. The treatment group reported higher levels of positive affect and interest. The results endorse the idea that adopting an affordances approach can enhance the effectiveness of VR in a learning context.
ISBN: (print) 9783031720406; 9783031720413
This paper examines the impact of Authenticator Assurance Levels (AALs) on user efficiency in authentication processes, aiming to balance security and usability. The central research question investigates how AALs influence user efficiency, particularly concerning the number of required "keys". Using S-BPM notation, the study offers insights into the authentication landscape through a subject- and process-oriented comparison. Considering how subjects handle authenticators, it is evident that an authenticator app may be preferable to a hardware key in certain contexts, as users are typically vigilant about their smartphones. The analysis concludes that higher levels, such as AAL3, offer a more streamlined authentication experience with fewer required "keys", aligning with users' daily interaction patterns and enhancing overall usability. Integrating these findings into cybersecurity practices can enhance multi-factor authentication methods, fostering a more secure and user-centric environment.
ISBN: (print) 9789819610440; 9789819610457
Under the framework of Language Transfer and the Prosodic Learning Interference Hypothesis, this study explores the influence of the mother tongue on Chinese EFL learners' L2 English prosodic disambiguation, on the basis of the same learners' disambiguation in their L1 Chinese. Four parameters are extracted: duration, pause (if present), pitch (converted to semitones), and amplitude integral. The results indicate the following. (1) Duration is the cue that functions efficiently. Native English speakers produce C2 longer in NP than in VP, and this difference is significant. (2) In their mother tongue, Chinese EFL learners can disambiguate VP/NP ambiguity effectively. Specifically, participants produce both C1 and C2 in VP with longer duration, larger pitch range, and greater amplitude integral, and these differences reach significance. (3) In their L2 English, Chinese EFL learners, like native English speakers, can effectively use duration to differentiate the two readings, but their specific performance differs from that of native speakers: native English speakers show significant differences in duration ratio and in the duration of C2, while L2 learners show the difference in C1, which is partly influenced by their L1 Chinese and can be explained by the Prosodic Learning Interference Hypothesis.
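The pitch measure above is reported in semitones rather than hertz. A minimal sketch of the standard conversion, 12 * log2(f / f_ref), is given below. The abstract does not state which reference frequency was used, so the 100 Hz default is an assumption, and the function names and sample F0 values are hypothetical. The pitch range in semitones does not depend on the reference frequency.

```python
import math


def hz_to_semitones(f_hz: float, ref_hz: float = 100.0) -> float:
    """Convert a frequency in Hz to semitones relative to ref_hz.

    ref_hz = 100 Hz is an illustrative assumption, not a value from the study.
    """
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)


def pitch_range_semitones(f0_values: list[float]) -> float:
    """Pitch range (max minus min) in semitones over the voiced frames.

    Frames with F0 <= 0 are treated as unvoiced and ignored.
    """
    voiced = [f for f in f0_values if f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    return 12.0 * math.log2(max(voiced) / min(voiced))


if __name__ == "__main__":
    # Hypothetical F0 track for one constituent, in Hz (0 marks unvoiced frames).
    f0_track = [0.0, 110.0, 165.0, 220.0, 0.0]
    print(hz_to_semitones(200.0))           # one octave above 100 Hz
    print(pitch_range_semitones(f0_track))  # 110 Hz to 220 Hz spans one octave
```

Working in semitones makes pitch ranges comparable across speakers with different baseline F0, which is why the conversion is common in cross-speaker prosody comparisons.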
ISBN: (print) 9783031804373; 9783031804380
This study aims to extract meaningful information and explore some well-known Arabic-based transformer models in masked language modeling, while investigating the interplay between the minimal sentence length and the masking percentage. To this end, we have used, for the first time, the opinions expressed in the comments of some trending YouTube channels in Morocco. By controlling the granularity of the input sequences during training, we find that a minimal length of 15 tokens can serve as an effective parameter when fine-tuning Arabic masked language models at both the 15% and 40% masking rates.
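The two parameters studied above, minimal sentence length and masking rate, can be illustrated with a small data-preparation sketch. It is a simplification under stated assumptions: tokens are obtained by whitespace splitting (the transformer models in the study would use their own subword tokenizers), every selected position is replaced by a `[MASK]` placeholder, and all function names are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate, rng):
    """Replace a fraction of tokens with MASK.

    Returns (masked_tokens, labels), where labels holds the original token
    at masked positions and None elsewhere.
    """
    if not tokens:
        return [], []
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels


def prepare_examples(comments, min_tokens=15, mask_rate=0.15, seed=0):
    """Drop comments shorter than min_tokens, then mask the remaining ones."""
    rng = random.Random(seed)
    examples = []
    for comment in comments:
        tokens = comment.split()  # whitespace tokens: an assumption
        if len(tokens) < min_tokens:
            continue
        examples.append(mask_tokens(tokens, mask_rate, rng))
    return examples


if __name__ == "__main__":
    long_comment = " ".join(f"t{i}" for i in range(20))
    for rate in (0.15, 0.40):
        masked, _ = prepare_examples([long_comment, "too short"], mask_rate=rate)[0]
        print(rate, masked.count(MASK))
```

With a 20-token comment, the 15% rate masks 3 tokens and the 40% rate masks 8, while the two-token comment is filtered out by the 15-token minimum.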
ISBN: (print) 9783031804373; 9783031804380
The Moroccan Arabic (MA) dialect is a low-resource language. To perform any NLP task, we have to develop the necessary resources from scratch. This paper introduces our work on MAOffens, the first MA dataset for offensive language detection. The dataset will serve to build predictive models that detect the offensive content widely present on social media and hence help ensure online safety. We built the dataset from a mixture of comments in Arabic and Latin scripts to cover offensiveness in both cases. The resulting dataset consists of 23k comments and is fully balanced. The dataset is open to the public (https://***/datasets/randa/maoffens). We evaluated the annotation and classification power of the dataset through various classifier architectures. Our best-performing classifier was based on an MA transformer model.