ISBN (Print): 9783031779602; 9783031779619
The automatic speech recognition (ASR) domain has advanced considerably with the emergence of large transformer-based models, such as OpenAI's Whisper. This paper presents an experimental evaluation of the Whisper models, focusing on their performance under various acoustic conditions and input configurations. We specifically examine the effects of audio transformations such as white and Gaussian noise, reverberation, time stretching, and pitch shifting, as well as the impact of varying chunk lengths. The findings suggest that while Whisper models can cope with minimal background noise and perform commendably on clean audio, their performance degrades rapidly under more severe audio transformations and noise, particularly with shorter chunk lengths. This study contributes valuable insights into the Whisper models' capabilities and limitations, particularly for real-time speech recognition, offering guidance for future improvements in ASR technology.
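As a concrete illustration of the kind of perturbation pipeline the abstract describes, the sketch below applies pitch shift, time stretch, and additive white noise to an audio clip and transcribes both the clean and the perturbed versions with Whisper. It assumes the openai-whisper and librosa packages; the file name, SNR, and perturbation settings are illustrative placeholders, not the paper's actual configuration.

```python
# Illustrative perturbation pipeline for probing Whisper robustness
# (not the paper's exact setup; assumes openai-whisper and librosa).
import numpy as np
import librosa
import whisper

def add_white_noise(y, snr_db=20.0):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.randn(len(y)) * np.sqrt(noise_power)
    return (y + noise).astype(np.float32)

def perturb(y, sr, pitch_steps=0, stretch_rate=1.0, snr_db=None):
    """Apply pitch shift, time stretch, and optional white noise."""
    if pitch_steps:
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)
    if stretch_rate != 1.0:
        y = librosa.effects.time_stretch(y, rate=stretch_rate)
    if snr_db is not None:
        y = add_white_noise(y, snr_db=snr_db)
    return y.astype(np.float32)

model = whisper.load_model("base")
y, sr = librosa.load("sample.wav", sr=16000)   # Whisper expects 16 kHz audio
clean = model.transcribe(y)["text"]
noisy = model.transcribe(perturb(y, sr, pitch_steps=2, stretch_rate=0.9, snr_db=10))["text"]
print(clean)
print(noisy)
```

Comparing the two transcripts (for example with a word-error-rate tool) over a grid of perturbation strengths and chunk lengths is the type of experiment the study reports.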
ISBN (Print): 9789819794362; 9789819794379
With the recent rise of large language models (LLMs), in-context learning (ICL) has shown remarkable performance, eliminating the need for fine-tuning parameters and reducing the reliance on extensive labeled data. However, the intricacies of cross-lingual ICL remain underexplored. Prior studies on cross-lingual ICL overlooked the significance of language-specific nuances, neglecting both the intrinsic linguistic properties of sentences and the interlingual connections between sentences in different languages. In this paper, we propose a novel cross-lingual prompt structure: Language-Emphasized cross-lingual In-context learning (LEI). LEI teaches LLMs how to adapt to language conversion by adding explicit language-conversion examples to the demonstrations. Specifically, LEI introduces a third language (the example language) whose demonstrations make the conversion explicit, adapting LLMs to language conversion in cross-lingual tasks. In addition, language alignment of the demonstrations is achieved by adding language aligners and label aligners. Extensive experiments validate the state-of-the-art performance of LEI on 42 cross-lingual tasks.
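The following sketch shows one way a LEI-style prompt could be assembled: a demonstration in a third "example language" makes the language conversion explicit, source-language demonstrations are followed by simple language/label aligner lines, and the query is appended last. The template wording, field names, and the build_lei_prompt helper are assumptions made for illustration; the paper's exact prompt format is not reproduced here.

```python
# Schematic assembly of a cross-lingual ICL prompt with an "example language"
# demonstration, loosely following the idea described in the abstract.
# Template wording, aligner phrasing, and field names are assumptions.

def build_lei_prompt(example_demo, source_demos, query,
                     example_lang, source_lang, target_lang):
    parts = [f"The following examples convert {example_lang} and {source_lang} "
             f"inputs into {target_lang} labels."]
    # Third-language demonstration that makes the language conversion explicit.
    parts.append(f"[{example_lang}] {example_demo['text']}\n"
                 f"[{target_lang} label] {example_demo['label']}")
    # Source-language demonstrations with simple language/label "aligner" lines.
    for demo in source_demos:
        parts.append(f"[{source_lang}] {demo['text']}\n"
                     f"(aligner: {source_lang} -> {target_lang})\n"
                     f"[{target_lang} label] {demo['label']}")
    parts.append(f"[{source_lang}] {query}\n[{target_lang} label]")
    return "\n\n".join(parts)

prompt = build_lei_prompt(
    example_demo={"text": "Das Essen war ausgezeichnet.", "label": "positive"},
    source_demos=[{"text": "La película fue aburrida.", "label": "negative"}],
    query="El servicio fue rápido y amable.",
    example_lang="German", source_lang="Spanish", target_lang="English",
)
print(prompt)
```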
ISBN (Print): 9783031779602; 9783031779619
This study investigates the perception of speech rate in Russian. The goal is to examine the impact of pause duration, articulation rate, and the speaker's and listener's gender on speech rate perception. The relevance of this study lies in its potential to improve speech synthesis systems by understanding how speech rate perception varies, particularly between genders. The novelty of the research lies in its comprehensive analysis of the impact of gender, articulation speed, and pauses on speech rate perception, based on Russian-language material. The material consists of recordings of 2 texts read by 55 native Russian speakers. The series of auditory perceptual experiments contained both natural and modified stimuli. The results revealed an impact of the speaker's gender on speech rate perception, but no impact of the listener's gender. We also found an effect of pause duration on overall speech rate perception. The results show that speech rate perception can be described through intrinsic characteristics such as pausing and articulation rate. We also propose a model of speech rate perception in Russian. Although this study did not focus on the effects of speech style and other factors, the data obtained allow identifying important patterns in listeners' behavior that can be applied in natural language processing as well as in artificial intelligence systems.
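To make the distinction between the two measures concrete, the toy computation below contrasts articulation rate (syllables per second of phonation, excluding pauses) with overall speech rate (syllables per second of total time, pauses included). The figures are invented, and the definitions follow common phonetic usage rather than the paper's specific perception model.

```python
# Toy illustration of the two measures discussed above: articulation rate
# excludes pauses, while overall speech rate includes them. Numbers are invented.
syllables = 120
phonation_time_s = 30.0       # speech without pauses
pause_time_s = 10.0           # total pause duration

articulation_rate = syllables / phonation_time_s              # 4.0 syllables/s
speech_rate = syllables / (phonation_time_s + pause_time_s)   # 3.0 syllables/s

print(f"articulation rate: {articulation_rate:.2f} syl/s")
print(f"speech rate:       {speech_rate:.2f} syl/s")
```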
ISBN (Print): 9789819794331; 9789819794348
Relation classification (RC) is commonly the second step in a relation extraction pipeline, asserting the relation between two identified entities based on their context. The latest trend for the task resorts to pre-trained language models (PLMs): it transforms discriminative RC into a linguistic problem and fully exploits the language knowledge PLMs acquire during pre-training. Despite visible progress, existing approaches handle only one relation per entity pair and fail in real cases where multiple relations may be valid, i.e., entity pair overlap (EPO), which limits their applications. In this paper, we introduce ConFit, a novel contrastive-learning-based approach that fine-tunes a text-to-text PLM for relation classification. ConFit reformulates RC as a restoration problem over textualized relations: it learns to assign the sequential words of a candidate relation a probability mass above or below a threshold, corresponding to whether the relation truly holds. As a result, the learned model adaptively phrases diverse relations through a decoding, scoring, and selection workflow to fit EPO scenarios. Extensive experiments on four widely used datasets show that T5-large fine-tuned with ConFit significantly outperforms previous methods, whether single or multiple relations exist.
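The sketch below illustrates only the decoding-scoring-selection idea mentioned above: verbalized candidate relations are scored with a text-to-text PLM and kept when their score clears a threshold, so several relations can survive for one entity pair. It uses a plain t5-small log-likelihood score and an arbitrary threshold as stand-ins; ConFit's actual contrastive training objective and prompt format are not reproduced.

```python
# Sketch of a scoring-and-selection workflow over verbalized candidate relations.
# This illustrates the idea described in the abstract, not ConFit's actual
# training objective or prompt format.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

def relation_score(context, candidate):
    """Average per-token log-likelihood of the candidate relation text."""
    enc = tokenizer(context, return_tensors="pt")
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    return -out.loss.item()   # higher is better

context = ("relation between 'Barack Obama' and 'Hawaii': "
           "Barack Obama was born in Honolulu, Hawaii.")
candidates = ["place of birth", "employer", "spouse"]
threshold = -3.0              # would be tuned on validation data in practice
kept = [c for c in candidates if relation_score(context, c) > threshold]
print(kept)                   # every candidate above the threshold is accepted
```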
ISBN (Print): 9789819794423; 9789819794430
This paper presents an overview of the second Chinese Essay Discourse Logic Evaluation and Integration (CEDLEI) competition, which is organized as NLPCC 2024 Shared Task 4. As the second edition of the competition, this task expands and improves upon the success of the first edition. Based on the foundation of inter-sentence and inter-paragraph logical relation identification in the first edition, this year's competition introduces the addition of discourse coherence feedback generation. These improvements aim to reduce the burden of essay marking on teachers while providing students with more consistent and objective assessments. Through these tracks, participants will explore and apply the latest natural language processing techniques to enhance the accuracy and efficiency of Chinese composition assessment. Task information is available at https://***/cubenlp/NLPCC-2024-Shared-Task4.
ISBN (Print): 9783031780134; 9783031780141
As code-switching on social media becomes more common, the number of bilingual users and the volume of code-switched data are rapidly increasing. This presents unique challenges for sentiment analysis, which has traditionally focused on monolingual text. Sentiment analysis involves categorizing sentiments from comments, reviews, or tweets into positive, negative, or neutral. To address the scarcity of research in code-switched sentiment analysis, this paper contributes the first code-switched corpus for Egyptian Arabic-English sentiment analysis (EESA), featuring 4,100 annotated YouTube comments. Another contribution is the implementation of various sentiment analysis models, including traditional neural models (BiLSTM-Attention and Hybrid-Transformer) and the utilization of advanced language models (Gemini and GPT). Traditional models, using non-contextual, contextual, and character embeddings with additional word features, achieved an ensemble test F1-score of 92.54%. Advanced models, evaluated in zero-shot and fine-tuned configurations, showed significant potential. Gemini-1.5 and GPT-4o performed well as zero-shot models, and the fine-tuned GPT-3.5 model achieved the highest test F1-score of 92.67%. This work compares traditional and advanced models, highlighting advancements in code-switched sentiment analysis.
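For readers unfamiliar with the BiLSTM-Attention baseline named above, the following is a minimal PyTorch sketch of such a classifier for three-way sentiment; the hyperparameters, vocabulary handling, and the additional word features used in the paper are omitted.

```python
# Minimal BiLSTM-with-attention sentiment classifier (3 classes: positive /
# negative / neutral). A generic sketch, not the paper's exact architecture.
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))        # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)   # (B, T, 1) attention weights
        context = (weights * h).sum(dim=1)             # attention-pooled sentence vector
        return self.out(context)

model = BiLSTMAttention(vocab_size=20000)
logits = model(torch.randint(1, 20000, (4, 32)))       # batch of 4 toy sequences
print(logits.shape)                                    # torch.Size([4, 3])
```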
ISBN (Print): 9789819794300; 9789819794317
Dialogue Discourse Parsing aims to identify the discourse links and relations between utterances and has attracted growing interest in recent years. Previous studies either adopt local optimization to independently select one parent for each utterance or use global optimization to directly obtain the tree representing the dialogue structure. However, the influence of these two optimization methods remains underexplored. In this paper, we aim to systematically inspect their performance. Specifically, for local optimization, we use a local loss during the training stage and a greedy strategy during the inference stage. For global optimization, we optimize unlabeled and labeled trees through structured losses, including Max-Margin and TreeCRF, and exploit the Chu-Liu/Edmonds algorithm during the inference stage. Experiments show that the performance of these two optimization methods is closely related to the characteristics of the dataset, and that global optimization can reduce the burden of identifying long-range dependency relations.
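The sketch below shows the greedy (local) inference strategy in miniature: each utterance independently attaches to its highest-scoring earlier utterance. Global optimization would instead decode a maximum spanning tree with the Chu-Liu/Edmonds algorithm over the same arc scores. The score matrix here is a random placeholder, not the output of a trained parser.

```python
# Greedy (local) parent selection for dialogue discourse parsing.
# Global decoding would run Chu-Liu/Edmonds over the same arc-score matrix.
import numpy as np

def greedy_parents(arc_scores):
    """arc_scores[i, j] = score of utterance j attaching to earlier utterance i."""
    n = arc_scores.shape[0]
    parents = [-1]                                          # first utterance is the root
    for j in range(1, n):
        parents.append(int(np.argmax(arc_scores[:j, j])))   # only earlier utterances allowed
    return parents

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 5))    # 5 utterances in a toy dialogue
print(greedy_parents(scores))       # one parent index per utterance
```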
ISBN (Print): 9789819794331; 9789819794348
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics, since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct PersonalityEdit, a new benchmark dataset to address this task. Drawing on theory from Social Psychology [10], we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that align with a specified topic and embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can stimulate further research in model editing and personality-related studies.
ISBN (Print): 9789819794393; 9789819794409
Humor, a fundamental aspect of human communication, poses a formidable challenge for computational systems. Drawing inspiration from the theory of incongruity, we introduce a novel Dual Graph Attention Network-based Feature Extraction Model (DGFEM) tailored specifically for humor recognition. This model constructs a comprehensive relational graph among linguistic tokens, thereby capturing the essence of incongruity that often underlies humorous content. Furthermore, it integrates WordNet for leveraging synonym relationships, enhancing the model's capacity to identify ambiguity, a key characteristic of humor. Through rigorous experimentation conducted on two benchmark datasets, our DGFEM model has demonstrated remarkable effectiveness in humor recognition, underscoring its potential to advance the state-of-the-art in this domain.
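As a small illustration of the WordNet ingredient mentioned above, the snippet below links tokens that share at least one WordNet synset, the kind of synonym edge a relational graph over tokens could include. DGFEM's actual graph construction and dual graph attention layers are not reproduced, and the token list is illustrative.

```python
# Linking tokens through shared WordNet synsets, one possible kind of synonym
# edge in a token graph. Requires NLTK with the WordNet data downloaded
# (nltk.download("wordnet")).
from itertools import combinations
from nltk.corpus import wordnet as wn

def synonym_edges(tokens):
    """Add an edge between two tokens if they share at least one WordNet synset."""
    synsets = {t: set(wn.synsets(t)) for t in tokens}
    edges = []
    for a, b in combinations(tokens, 2):
        if synsets[a] & synsets[b]:
            edges.append((a, b))
    return edges

tokens = ["car", "automobile", "buy", "purchase", "river"]
print(synonym_edges(tokens))   # e.g. [('car', 'automobile'), ('buy', 'purchase')]
```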
ISBN (Print): 9789819794300; 9789819794317
Although multi-span question-answering tasks align more closely with the complex demands of the real world, existing models often struggle to effectively model the dependencies and overall semantic structure between multiple answer spans. We therefore propose a concise and effective method for modeling span interactions, which primarily includes: 1) a Span Representation Module that utilizes SpanBERT to enhance span information within tokens; and 2) a Span Interaction Module that leverages two contrastive learning tasks to reinforce the interaction of answer spans within the token representation. On one hand, we use the [CLS] token as an intermediary variable that carries span-interaction information; on the other hand, we employ prompt-based tasks to further strengthen the encoder's multi-span question-answering reasoning capabilities and the [CLS] token's span aggregation ability. Experiments on MultiSpanQA demonstrate that baselines incorporating our strategy achieve improvements in EM F1 ranging from 2.88 to 11.27, reaching state-of-the-art (SOTA) results at equivalent model scales.
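The sketch below gives an InfoNCE-style illustration of the span-interaction idea: the [CLS] vector is pulled toward a pooled representation of its own answer spans and pushed away from spans of other examples in the batch. The loss form, temperature, and pooling are assumptions; the paper's exact contrastive tasks are not reproduced.

```python
# InfoNCE-style contrastive loss between [CLS] vectors and pooled answer-span
# vectors: matching pairs are positives, other examples in the batch are negatives.
import torch
import torch.nn.functional as F

def span_cls_contrastive_loss(cls_vec, span_vecs, temperature=0.07):
    """cls_vec: (B, H); span_vecs: (B, H), one pooled answer-span vector per example."""
    cls_vec = F.normalize(cls_vec, dim=-1)
    span_vecs = F.normalize(span_vecs, dim=-1)
    logits = cls_vec @ span_vecs.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(cls_vec.size(0))          # the matching span is the positive
    return F.cross_entropy(logits, targets)

cls_vec = torch.randn(8, 768)
span_vecs = torch.randn(8, 768)
print(span_cls_contrastive_loss(cls_vec, span_vecs).item())
```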