Author:
Zhang, Yao (Hunan Univ, Coll Chinese Language & Literature, Changsha 410012, Peoples R China)
ISBN:
(Print) 9789819705825; 9789819705832
Yu Zhi Zeng Ding Qing Wen Jian was the first official Manchu-Chinese bilingual dictionary of the Qing Dynasty. Building on Yu Zhi Qing Wen Jian, it added Chinese translations to the entries, used Chinese characters to indicate the pronunciation of Manchu words, and used Manchu to mark the pronunciation of Chinese words. Yu Zhi Zeng Ding Qing Wen Jian added 6,904 new entries, accounting for 37.05% of all entries. The shift from a monolingual Manchu dictionary to a bilingual Manchu-Chinese one brought major changes to its classification scheme and word selection. The dictionary not only continued the tradition of ancient Chinese dictionaries but also had a profound impact on the compilation of later bilingual and multilingual dictionaries pairing minority languages with Chinese.
ISBN:
(Print) 9789819705825; 9789819705832
Based on the theory of sentiment analysis and semantic prosody, this paper investigates the semantic shift of uncivilized words in bullet-screen comments by analyzing the sentiment polarity of the emotion words that co-occur with them. Six kinds of words are selected for analysis according to the swear level and frequency of uncivilized words in the bullet-screen dataset. In both the bullet-screen dataset and the BCC corpus, the semantic shifts of the prototypes and their alternative forms are measured using a sentiment lexicon and the PMI-IR algorithm. The paper presents a statistical analysis of the semantic shifts observed in these uncivilized terms and then examines the motivations behind them.
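The PMI-IR step can be pictured with a minimal sketch of Turney-style semantic orientation, adapted here to corpus counts. This is an illustration under stated assumptions, not the authors' code: each comment is assumed to be a pre-tokenized list, "co-occurrence" is assumed to mean appearing in the same comment (standing in for the original NEAR operator), and the function name, seed words and the 0.01 smoothing constant are hypothetical choices.

```python
import math
from typing import List, Set


def semantic_orientation(target: str, docs: List[List[str]], pos: Set[str],
                         neg: Set[str], smoothing: float = 0.01) -> float:
    """Hypothetical SO-PMI score: PMI(target, positive words) - PMI(target, negative words).

    Co-occurrence is approximated as appearing in the same comment.
    """
    hits_pos = hits_neg = 0  # comments containing at least one positive / negative lexicon word
    near_pos = near_neg = 0  # the subset of those comments that also contain the target word
    for doc in docs:
        tokens = set(doc)
        has_pos = bool(tokens & pos)
        has_neg = bool(tokens & neg)
        hits_pos += has_pos
        hits_neg += has_neg
        if target in tokens:
            near_pos += has_pos
            near_neg += has_neg
    # The smoothing constant avoids log(0) and division by zero for rare words.
    ratio = ((near_pos + smoothing) * (hits_neg + smoothing)) / (
        (near_neg + smoothing) * (hits_pos + smoothing))
    return math.log2(ratio)
```

A positive score means the word keeps company mostly with positive lexicon entries and a negative score the opposite. Computing the score for the same form in the bullet-screen data and in a reference corpus such as BCC gives one rough signal of semantic shift.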
ISBN:
(Print) 9789819705856; 9789819705863
This paper describes a study of the unsupervised identification of Chinese VO idioms that examines the Verb-Object (VO) pairs derived from the dependency structure of sentences. We test several statistical measures, including Pointwise Mutual Information (PMI), P(o|v), P(v|o), Salience, and Selectional Association. The experiments show that PMI performs best at automatically identifying real VO idioms, which is consistent with previous studies on other languages. However, PMI tends to rank low-frequency items (very often noise) high. It achieved an F1 score of 36% in identifying real VO idioms among the top 100 ranked VO pairs. We thus suggest that syntactic features alone are not enough to identify VO idioms in an unsupervised framework, and that more sophisticated methods taking more semantic information into account are required.
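Three of the measures named above can be sketched from raw VO counts with maximum-likelihood estimates. This is a minimal illustration, assuming the (verb, object) pairs have already been extracted from dependency parses; the function name and toy data are hypothetical, and Salience and Selectional Association are omitted. The toy data also shows why PMI favors rare pairs: a pair whose verb and object each occur only once receives the largest possible PMI, log2(N).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def vo_association(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Hypothetical helper: PMI, P(o|v) and P(v|o) for every distinct (verb, object) pair."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)
    scores = {}
    for (v, o), c in pair_counts.items():
        scores[(v, o)] = {
            # PMI = log2(P(v,o) / (P(v) P(o))) = log2(c * N / (c(v) * c(o)))
            "pmi": math.log2(c * n / (verb_counts[v] * obj_counts[o])),
            "p_o_given_v": c / verb_counts[v],
            "p_v_given_o": c / obj_counts[o],
        }
    return scores


# Toy data: frequent, compositional pairs plus one singleton "noise" pair.
pairs = ([("eat", "apple")] * 4 + [("eat", "rice")] * 4
         + [("buy", "apple")] * 4 + [("buy", "rice")] * 4 + [("typo", "noise")])
scores = vo_association(pairs)
ranked = sorted(scores, key=lambda p: scores[p]["pmi"], reverse=True)
```

Here `ranked[0]` is the singleton pair, which mirrors the low-frequency bias the abstract reports.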
ISBN:
(Print) 9783031705625; 9783031705632
Word sense disambiguation (WSD) is a core task in computational linguistics that involves interpreting polysemous words in context by identifying their senses from a predefined sense inventory. Despite the dominance of BERT and its derivatives on WSD evaluation benchmarks, how effectively they encode and retrieve word senses, especially in languages other than English, remains relatively unexplored. This paper provides a detailed quantitative analysis comparing various BERT-based models for Russian and examines two primary WSD strategies: fine-tuning and feature-based nearest-neighbor classification. The best results are obtained with the ruBERT model combined with the feature-based nearest-neighbor strategy. This approach captures even fine-grained meanings despite limited data and diverse sense distributions.
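The feature-based nearest-neighbor strategy can be sketched as 1-NN classification over contextual embeddings. This is a minimal sketch, assuming the target-word embeddings have already been extracted from a model such as ruBERT (that step is not shown) and that cosine similarity is the distance; the function names and the three-dimensional toy vectors for the Russian homonym замок ("castle" vs. "lock") are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbor_sense(query: Sequence[float],
                           train: List[Tuple[Sequence[float], str]]) -> str:
    """Return the sense label of the most similar training example (1-NN, cosine)."""
    _, best_sense = max(train, key=lambda item: cosine(query, item[0]))
    return best_sense


# Toy sense-annotated embeddings for two senses of one word.
train = [
    ([1.0, 0.1, 0.0], "castle"),
    ([0.9, 0.2, 0.1], "castle"),
    ([0.0, 0.1, 1.0], "lock"),
    ([0.1, 0.0, 0.9], "lock"),
]
```

Because no weights are trained, a sense with only a handful of labeled examples can still be predicted, which fits the abstract's point about limited data.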
ISBN:
(Print) 9783031642982; 9783031642999
Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessment or automated methods that prioritize readability and often overlook deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive, automated quality evaluation of MCQs. By harnessing recent large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and human assessment of MCQ quality. We then evaluate SAQUET on a diverse dataset of MCQs from five domains (Chemistry, Statistics, Computer Science, Humanities, and Healthcare) and show that it effectively distinguishes between flawed and flawless questions, providing a level of analysis beyond what traditional metrics can achieve. With an accuracy of over 94% in detecting the presence of flaws identified by human evaluators, our findings highlight the limitations of existing evaluation methods and show the toolkit's potential for improving the quality of educational assessments.
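To make the idea of rubric-based flaw detection concrete, the sketch below checks four commonly cited item-writing flaws with surface rules only. It is a hypothetical illustration and not SAQUET's implementation: the toolkit described above combines large language models, word embeddings and complexity-analysis Transformers, whereas the function name, the exact phrases matched, and the choice of rules here are assumptions.

```python
from typing import List


def detect_simple_flaws(stem: str, options: List[str], answer_index: int) -> List[str]:
    """Hypothetical rule-based check for a few item-writing flaws in one MCQ."""
    flaws = []
    lowered = [opt.strip().lower() for opt in options]
    if any(opt in ("all of the above", "all of these") for opt in lowered):
        flaws.append("all of the above")
    if any(opt in ("none of the above", "none of these") for opt in lowered):
        flaws.append("none of the above")
    # Longest option correct: the key is strictly longer than every distractor.
    key_len = len(options[answer_index])
    if all(key_len > len(opt) for i, opt in enumerate(options) if i != answer_index):
        flaws.append("longest option correct")
    # Negatively worded stem, detected only when written in capitals (e.g. "NOT", "EXCEPT").
    tokens = [t.strip("?.,:;") for t in stem.split()]
    if any(word in tokens for word in ("NOT", "EXCEPT")):
        flaws.append("negative wording")
    return flaws
```

For example, a stem such as "Which of the following is NOT a noble gas?" with options "Helium", "Neon", "Nitrogen, a diatomic gas" (the key) and "All of the above" is flagged for three flaws, while a short, balanced item like "What is the chemical symbol for sodium?" with four two-letter options is not flagged.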
ISBN:
(Print) 9783031705656; 9783031705663
Alzheimer's dementia (AD) has significant negative psychological and economic impacts on patients, their families, and society as a whole. Recent research has explored combining the speech and transcript modalities to leverage both linguistic and acoustic features. However, many existing multimodal studies simply combine speech and text representations, use majority voting, or average the predictions of separately trained text and speech models. To overcome these limitations, our article focuses on explainability and investigates fusing the speech and text modalities with cross-attention. We convert audio to Log-Mel spectrograms and use text and image transformers (RoBERTa and ViT) to process the transcripts and spectrograms, respectively. We then add a cross-attention layer and analyze its impact on accuracy. Our multimodal fusion model achieves 90.01% accuracy on the ADReSS Challenge dataset. Additionally, we explore the explainability of both modalities through transformer visualization techniques and an analysis of the vocabulary used by the dementia and non-dementia classes.
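The fusion step can be illustrated with plain scaled dot-product cross-attention, softmax(QKᵀ/√d)V, where one modality supplies the queries and the other the keys and values. This is a minimal sketch under stated assumptions: the queries stand in for RoBERTa token embeddings and the keys/values for ViT patch embeddings of the spectrogram (the attention direction is chosen for illustration), the learned query/key/value projections and multiple heads of a real layer are replaced by identity projections, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query row (e.g. a text token) attends over all key/value rows (e.g. spectrogram patches)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output column at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


text_tokens = [[1.0, 0.0], [0.0, 1.0]]     # toy query vectors
audio_patches = [[10.0, 0.0], [0.0, 10.0]]  # toy key vectors
patch_values = [[1.0, 0.0], [0.0, 1.0]]     # toy value vectors
fused = cross_attention(text_tokens, audio_patches, patch_values)
```

Each output row is a text-conditioned summary of the audio patches; in a full model these rows would be pooled and passed to a classifier, and the attention weights are what visualization techniques inspect.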
ISBN:
(Print) 9783031705656; 9783031705663
Audiovisual speech synthesis is important from several points of view: the visual channel makes speech easier to understand in noisy environments, makes conversation more natural, and improves clarity for hearing-impaired and other users. At the same time, it is available for a much narrower selection of languages than speech-only synthesis, and language-independent methods of adding the visual part have not been thoroughly tested for most languages. This paper presents the development of two methods of adapting audiovisual speech synthesis to Estonian. We reuse an existing neural speech synthesis model and adapt both a speech-driven and a text-driven approach to adding the visual part. We compare the two solutions with audio alone at different noise levels and evaluate the clarity, naturalness, and pleasantness of the test samples via MOS scores. We also compare how computationally expensive these methods are. Our results show that while speech-driven generation of the visual counterpart is perceived as more natural, the text-driven approach is computationally less demanding and can be used for real-time audiovisual speech synthesis. The results also show that all the presented models improve the clarity of synthesized speech in noisy conditions.
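A mean opinion score (MOS) is the average of listener ratings, usually reported with a confidence interval. The sketch below assumes a 1 to 5 rating scale and a normal-approximation 95% interval (z = 1.96); the function name and ratings are hypothetical, and the abstract does not state which interval, if any, the authors used.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mos_with_ci(ratings: List[int], z: float = 1.96) -> Tuple[float, float]:
    """Return the MOS and the half-width of a normal-approximation confidence interval.

    Requires at least two ratings, because the sample standard deviation is used.
    """
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical clarity ratings for one system under one noise level.
mos, ci = mos_with_ci([4, 5, 3, 4, 4])
```

In a design like the one above, the function would be applied separately to each combination of system (audio only, speech-driven, text-driven), noise level, and criterion (clarity, naturalness, pleasantness).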
ISBN:
(Print) 9789819705856; 9789819705863
Group addressing terms are a linguistic phenomenon commonly used to refer to groups in everyday speech. These terms not only reflect the cultural nuances of a language but also serve as valuable keywords in natural language processing for examining instances of bias and discrimination against disadvantaged groups in artificial intelligence. This paper presents a comprehensive dataset of Chinese group addressing terms, constructed by collecting and annotating 2,483 such terms from diverse sources. The dataset covers 10 categories, including gender, race, and religion. The offensiveness of these terms is then annotated through a combination of expert evaluation and crowdsourcing. In general, factors such as the annotators' gender, age, educational background, and empathy do not show a significant correlation with the perceived offensiveness of group addressing terms. However, there are noticeable differences in perceived offensiveness when individuals evaluate terms that refer to their own groups. Offensiveness in group addressing terms shows both commonalities across categories and characteristics unique to specific categories. Various linguistic traits can either amplify or diminish perceived offensiveness. Beyond serving as a means of catharsis, offensive group addressing terms can also play a role in identity construction. When different group addressing terms are used as prompts, the text generated by language models reveals certain biases and stereotypes towards particular groups. In the future, this dataset can be used not only for sociolinguistic research but also for building fairness datasets in natural language processing.
ISBN:
(Print) 9783031642982; 9783031642999
Question Difficulty Estimation (QDE) is a crucial task in many educational settings. Previous research has used Natural Language Processing (NLP) to overcome the limitations of traditional QDE methods, but no prior work has experimented with Knowledge Graphs (KGs) to provide a taxonomy of the topics assessed in exams. We propose two ways of incorporating KG information into existing models for QDE from text and, through experiments on a publicly available dataset, show that they outperform models that use text information exclusively, reducing MAE by up to 8% with respect to the best-performing baseline (BERT-based QDE). We study how the models generalise to topics different from those used for training and observe that, while in most cases the KG-enhanced models still outperform the baselines, a simpler model such as DistilBERT is more robust to previously unseen topics.
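The reported improvement is expressed in mean absolute error (MAE). Assuming the "up to 8%" figure is a relative reduction with respect to the baseline's MAE (the abstract does not say so explicitly), the arithmetic is as follows; the function names and the numbers in the usage line are hypothetical.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted difficulty values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_decrease(baseline_mae: float, model_mae: float) -> float:
    """Relative MAE reduction of a model with respect to a baseline, as a fraction."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical values: a baseline MAE of 0.50 falling to 0.46 is an 8% decrease.
reduction = relative_decrease(0.50, 0.46)
```

Under this reading, an 8% decrease depends on the baseline's scale: the same absolute gain of 0.04 would be a 4% decrease against a baseline MAE of 1.0.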
ISBN:
(Print) 9783031643019; 9783031643026
Knowledge Tracing (KT) plays a pivotal role in Artificial Intelligence in Education (AIED) by modeling and predicting learners' mastery of skills over time. While AIED Unplugged aims to adapt AI solutions to resource-constrained environments, integrating KT into such scenarios is challenging because of limited digital interaction and doubts about the feasibility of running advanced algorithms. This paper introduces KT Unplugged, addressing a gap in prior research by exploring and experimenting with how to create, refine, and deploy state-of-the-art KT models in resource-limited contexts. The contributions of this paper are threefold. First, we present and carry out a procedure for simulating data collection in unplugged contexts, resulting in a dataset and a replicable methodology for future field studies. Second, we conduct an empirical study on developing and validating KT models for resource-constrained devices, using both sophisticated (deep learning) and classical (Bayesian) algorithms. This contribution provides empirical evidence on the performance of KT Unplugged, including a pre-trained model for numeracy education. Finally, a technical study assesses the deployment of the pre-trained model on disconnected, low-cost mobile devices, demonstrating the technical feasibility of KT Unplugged with acceptable inference times and preserved predictive power. By addressing the challenges of integrating KT into unplugged scenarios, this research opens new avenues for personalized learning, adaptive instruction, and targeted interventions in educational settings with limited infrastructure.
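Bayesian Knowledge Tracing (BKT) is the best-known classical Bayesian KT model. The abstract does not name the exact Bayesian algorithm used, so treat the following as a sketch of standard BKT rather than the paper's model. After each answer, the mastery estimate is updated with Bayes' rule using slip and guess probabilities and then advanced by the learning (transit) probability. The parameter values in the usage line are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BKTParams:
    p_init: float     # P(L0): probability the skill is mastered before practice
    p_transit: float  # P(T): probability of learning the skill at each opportunity
    p_slip: float     # P(S): probability of a wrong answer despite mastery
    p_guess: float    # P(G): probability of a right answer without mastery


def predict_correct(p_mastery: float, params: BKTParams) -> float:
    """Probability that the next answer is correct given the current mastery estimate."""
    return p_mastery * (1 - params.p_slip) + (1 - p_mastery) * params.p_guess


def update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior after one observed answer, followed by the learning transition."""
    if correct:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_transit


def trace(answers: List[bool], params: BKTParams) -> List[float]:
    """Mastery estimate after each answer in a learner's sequence."""
    p = params.p_init
    history = []
    for correct in answers:
        p = update(p, correct, params)
        history.append(p)
    return history


params = BKTParams(p_init=0.2, p_transit=0.1, p_slip=0.1, p_guess=0.2)
```

With `params` above, `trace([True, True, True, False], params)` rises after each correct answer and drops after the final error. Inference needs only four numbers per skill and a few arithmetic operations, which suggests why classical models are a natural baseline for disconnected, low-cost devices.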