ISBN (print): 9789819794300; 9789819794317
Multi-party Dialogue Reading Comprehension (MDRC) is a reading comprehension task that involves comprehending dialogue with multiple interlocutors and answering questions. Research on MDRC faces enormous challenges because of the multiple parties involved and the frequent changes in the chat topic. Previous work has explored mining and modeling dialogue context features on the basis of pre-trained models and graph-based models. However, two issues exist: insufficient connection between the behavioral events of dialogue participants and excessive question-irrelevant information during reasoning. In this paper, we propose a dual-graph reasoning approach with integrated key-clue parsing. We utilize the dual-graph reasoning strategy to capture the global structure and internal dynamics of the dialogue. Moreover, we design the key-clue parsing module to prioritize essential dialogue content, which significantly reduces the burden on the model and enhances its accuracy. Experiments on the benchmark dataset show that our approach yields stable and substantial improvements and outperforms state-of-the-art methods.
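The abstract names the dual-graph strategy without detailing its construction; a minimal sketch, assuming one graph links utterances by the same speaker (global structure) and a second links consecutive utterances (internal dynamics), with simple mean-aggregation message passing, might look as follows.

# Illustrative sketch only: the exact graph construction is not given in the abstract,
# so speaker-link and adjacency-link graphs are assumed here for concreteness.
import torch

def build_dual_graphs(speakers):
    """Build two adjacency matrices over utterances: one linking utterances by the
    same speaker (global view), one linking consecutive utterances (local view)."""
    n = len(speakers)
    speaker_adj = torch.eye(n)
    seq_adj = torch.eye(n)
    for i in range(n):
        for j in range(n):
            if speakers[i] == speakers[j]:
                speaker_adj[i, j] = 1.0
        if i + 1 < n:
            seq_adj[i, i + 1] = seq_adj[i + 1, i] = 1.0
    return speaker_adj, seq_adj

def dual_graph_reason(utt_states, speaker_adj, seq_adj):
    """One round of mean-aggregation message passing on each graph, then fusion."""
    norm = lambda a: a / a.sum(dim=-1, keepdim=True)
    h_speaker = norm(speaker_adj) @ utt_states       # global, speaker-centric view
    h_seq = norm(seq_adj) @ utt_states                # local, turn-by-turn view
    return torch.cat([h_speaker, h_seq], dim=-1)      # downstream layers consume both views

# Example: 4 utterances with hidden size 8, speakers A-B-A-C
states = torch.randn(4, 8)
sp_adj, sq_adj = build_dual_graphs(["A", "B", "A", "C"])
fused = dual_graph_reason(states, sp_adj, sq_adj)     # shape (4, 16)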
ISBN (print): 9789819794393; 9789819794409
Target-oriented multimodal sentiment classification (TMSC) aims to determine the sentiment polarity associated with each target within a sentence-image pair. Previous research has either not distinguished between the status of the textual and visual modalities or has subjectively given primary status to the textual modality. However, in diverse contexts, the impact of each modality on predicting the sentiment polarity of the target word varies. Given the pivotal role of the target word in TMSC, we introduce a framework with adaptive modality weighting to detect target-related information. Specifically, this framework adaptively determines the importance of each modality based on the generated contribution weights for sentiment prediction towards the target word. The modality with relatively larger weights is considered the primary modality and is leveraged to enhance the multimodal representation during the fusion stage. To further acquire information related to the target word, a large vision-language model is used to generate external target-specific knowledge descriptions as a supplementary textual modality, helping to identify the sentiment of each target term accurately. Experimental results on the multimodal Twitter-2015 and Twitter-2017 datasets show that our proposed method outperforms other competitive baselines.
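A minimal sketch of adaptive modality weighting, assuming a simple learned gate over the target, text, and image representations (dimensions and gating form are illustrative, not the paper's exact design):

import torch
import torch.nn as nn

class AdaptiveModalityWeighting(nn.Module):
    """Illustrative sketch: contribution weights for the text and image representations
    are predicted from the target-word representation, and the higher-weighted modality
    dominates the fused feature."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 2)   # scores for (text, image)

    def forward(self, target_repr, text_repr, image_repr):
        weights = torch.softmax(
            self.gate(torch.cat([target_repr, text_repr, image_repr], dim=-1)), dim=-1)
        fused = weights[..., 0:1] * text_repr + weights[..., 1:2] * image_repr
        return fused, weights   # weights expose which modality was treated as primary

# Example usage (dimensions are placeholders)
m = AdaptiveModalityWeighting(dim=128)
t, x, v = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
fused, w = m(t, x, v)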
ISBN (print): 9783031780134; 9783031780141
Engagement recognition enables more natural and responsive human-computer interaction by allowing systems to monitor and adapt to a person's engagement levels. However, developing efficient real-time engagement recognition systems remains challenging. This research proposes a multi-modal engagement recognition approach enhanced with affective embeddings to address current limitations. Several computationally efficient deep learning models are developed to process facial, body, and emotional cues from video. Additionally, a novel cross-multimodal fusion approach is applied to combine the various modalities using a cross-attention mechanism. We conducted extensive experiments on two datasets to analyze the impact of temporal context, showing that longer sequences significantly improve recognition performance. Furthermore, the results demonstrate that the proposed multi-modal approach achieves notably higher efficiency than individual modalities and outperforms modern engagement recognition frameworks, with recognition performance comparable to the winner of the Multimediate'23 challenge. Thus, by appropriately modeling visual engagement dynamics, the introduced multi-modal framework enhances real-time engagement recognition to advance human-computer interaction.
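As an illustration of the cross-attention fusion mentioned above, a minimal sketch (layer sizes, sequence lengths, and the choice of querying modality are assumptions, not the authors' configuration) could be:

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch of cross-attention fusion: one modality (e.g. facial features) queries
    another (e.g. body or affective embeddings)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq, context_seq):
        attended, _ = self.attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + attended)   # residual keeps the querying modality intact

# Example: fuse a 16-frame facial sequence with a 16-frame affective-embedding sequence
fusion = CrossModalFusion()
face = torch.randn(2, 16, 256)
affect = torch.randn(2, 16, 256)
fused = fusion(face, affect)   # shape (2, 16, 256)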
Authors: Zhang, Hu; Wu, Zengtai (Shanxi Univ, Sch Comp & Informat Technol, Taiyuan, Peoples R China; Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Taiyuan, Peoples R China)
ISBN (print): 9789819794393; 9789819794409
Interactive Argument Pair Identification is an emerging research task in argument mining, with the goal of identifying whether two arguments are interactively related. However, existing methods focus solely on the interaction representation among arguments or between arguments and context, neglecting the interaction between these two representations, and they do not specifically investigate the distinctiveness in representations between positive and negative samples of argument pairs. In this paper, we propose a Contrastive-Enhanced and Multi-Scale Semantic-Aware Framework to solve this problem. We employ a Multi-Scale Semantic-Aware module to facilitate semantic interactions among the context of the arguments and the debating parties, which aims to comprehensively understand the complete argumentation process. Additionally, we utilize a Contrastive-Enhanced module to minimize the distance for positive samples of arguments and maximize the distance for negative samples, which assists the model in better distinguishing the relationships between arguments. The experimental results show that our method achieves state-of-the-art performance on the benchmark dataset. Further analysis demonstrates the effectiveness of our proposed modules.
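The abstract does not state the exact contrastive objective; a standard margin-based formulation matching the stated goal (minimize the distance for positive argument pairs, maximize it for negative ones) is sketched below.

import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(arg_a, arg_b, labels, margin=1.0):
    """Margin-based contrastive objective: pull representations of interactive
    (positive) argument pairs together, push non-interactive (negative) pairs
    at least `margin` apart. `labels` is 1 for positive pairs, 0 for negative."""
    dist = F.pairwise_distance(arg_a, arg_b)
    pos_term = labels * dist.pow(2)
    neg_term = (1 - labels) * F.relu(margin - dist).pow(2)
    return (pos_term + neg_term).mean()

# Example: a batch of 8 argument-pair embeddings
a, b = torch.randn(8, 256), torch.randn(8, 256)
y = torch.randint(0, 2, (8,)).float()
loss = pairwise_contrastive_loss(a, b, y)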
ISBN (print): 9789819794423; 9789819794430
Currently, the application of Large Language Models (LLMs) faces significant security threats. Harmful questions and adversarial attack prompts can induce the LLMs to generate toxic responses. Therefore, detoxifying LLMs is a critical research topic to ensure their safe and widespread application. In this paper, we propose an alignment-based detoxification method for LLMs. We utilize Kahneman-Tversky Optimization (KTO) to align LLMs. During the construction of the training dataset, we take into account both the detoxification performance and the potential side effect on the LLMs. For detoxification, we make the LLM preferentially generate safe responses rather than toxic contents when asked with harmful questions and attack prompts. To mitigate the potential side effect on the conversational capabilities of LLMs, we incorporate normal questions into the training data, and ensure that the LLM generate normal answers, rather than safety refusals or unsafe responses. Experimental results show that our method showcase the best detoxification performance among all baseline methods while exerting little negative impact on the LLMs. Moreover, our method even enhance the LLMs' general abilities such as question answering and language understanding. Our proposed method achieve the first place in the NLPCC 2024 Share Task 10 Track 2 with an average score of 52.31.
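As a rough illustration of the training-data construction described above (field names follow the common prompt/completion/label convention used by KTO implementations and are assumptions, not the authors' exact schema):

# Illustrative sketch of assembling KTO-style examples that cover both detoxification
# and preservation of normal conversational ability.
def build_kto_examples(harmful_prompts, safe_replies, toxic_replies,
                       normal_questions, normal_answers):
    examples = []
    for prompt, safe, toxic in zip(harmful_prompts, safe_replies, toxic_replies):
        examples.append({"prompt": prompt, "completion": safe, "label": True})    # desirable
        examples.append({"prompt": prompt, "completion": toxic, "label": False})  # undesirable
    # Normal questions keep conversational ability: ordinary answers are desirable,
    # so the aligned model does not over-refuse benign requests.
    for question, answer in zip(normal_questions, normal_answers):
        examples.append({"prompt": question, "completion": answer, "label": True})
    return examples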
ISBN (print): 9789819794423; 9789819794430
Causal Emotion Entailment (CEE) is a sub-task in the field of sentiment analysis that aims to discover the cause utterances that trigger speakers' emotions in a conversation. Current cause utterance recognition remains unsatisfactory, particularly when cause utterances lie far from the emotion utterances. In this paper, we propose an emotion-cause relation enhanced model (EmoCRT) that better addresses this long-distance issue by utilizing four emotion-cause relation types. Experimental results on the RECCON dataset show that the proposed model outperforms the benchmark model by 1.41% in terms of Macro-F1. In addition, we reveal the shortcomings of large language models (LLMs) on this task.
ISBN (print): 9789819794423; 9789819794430
With the relentless growth in the volume of academic publications and the accelerating speed of scholarly communication, the time researchers dedicate to literature surveys has become increasingly substantial. Automatic literature survey generation offers a valuable solution, liberating researchers from the time-intensive task of manually surveying the literature. We organized NLPCC 2024 Shared Task 6 on scientific literature survey generation. This paper summarizes the task information, the dataset, the methods used by participants, and the final results. Furthermore, we discuss key findings and challenges for literature survey generation in the scientific domain.
ISBN (print): 9789819794393; 9789819794409
Word-level completion automatically completes words as the translator types character sequences, which can accelerate the editing process of human translation while ensuring translation quality. Although significant progress has been made in this field, a model may produce multiple candidate words when predicting a completion, and these words form a candidate word list. We improve the existing model by determining the most credible word in the candidate word list. We propose a multi-model fusion method to increase the accuracy of word-level completion. The improved model uses multiple evaluation criteria (the Lesk method, the WordNet knowledge base, and a pre-trained model) to calculate word scores by classification and weighting, and the word with the highest score is selected as the most credible word. The experimental results show that our proposed method is effective: on De-En, it improves the accuracy by 2.83%, and on Zh-En, by 2.77%.
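A minimal sketch of such weighted multi-criteria scoring is given below; the specific scoring functions, weights, and the lm_score callable are illustrative assumptions rather than the paper's exact formulation.

# Requires the NLTK WordNet data: nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

def score_candidates(candidates, context_tokens, lm_score, weights=(0.2, 0.3, 0.5)):
    """Return the most credible candidate word under a weighted combination of criteria."""
    best_word, best_score = None, float("-inf")
    for word in candidates:
        wordnet_score = 1.0 if wn.synsets(word) else 0.0      # word is known to WordNet
        sense = lesk(context_tokens, word)                     # Lesk sense disambiguation
        lesk_score = 1.0 if sense is not None else 0.0
        model_score = lm_score(word, context_tokens)           # e.g. pre-trained LM probability
        total = (weights[0] * wordnet_score
                 + weights[1] * lesk_score
                 + weights[2] * model_score)
        if total > best_score:
            best_word, best_score = word, total
    return best_word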
ISBN (print): 9789819794393; 9789819794409
The Multimodal Abstractive Summarization (MAS) task aims to generate a concise summary from given multimodal data (textual and visual). Existing research still relies on simple splicing and blending of information from multiple modalities, without considering the interaction between an image and its corresponding text or the contextual structural relationship between image and text. We believe that these existing models cannot fully integrate multimodal information or leverage the Transformer's ability to process sequential data. To this end, for the MAS task, we use image captions that are highly correlated with the images for image fusion; we design image-text alignment tasks to improve the effectiveness of the visual modality in the text summarization task; and we propose a sequential structured image-text fusion method to enhance the model's semantic understanding of sequences. Through these methods, we fully exploit the contribution of visual modality information to the summarization task, thereby generating more accurate summaries. We conducted experiments on a related dataset and found that ROUGE-1, ROUGE-2, and ROUGE-L improved by 1.34, 1.64, and 1.32 compared to the baseline model. Additionally, we contribute a large-scale sequential structured multimodal abstractive summarization dataset.
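One plausible reading of the sequential structured image-text fusion, sketched here purely as an assumption, is to insert each generated image caption at the position its image occupies in the document, so that the Transformer consumes text and visual content in document order. The marker tokens below are illustrative only.

def build_sequential_input(segments):
    """segments: list of ("text", str) or ("image_caption", str) tuples in document order."""
    pieces = []
    for kind, content in segments:
        if kind == "image_caption":
            pieces.append(f"<img> {content} </img>")   # caption stands in for the image
        else:
            pieces.append(content)
    return " ".join(pieces)

doc = build_sequential_input([
    ("text", "The team unveiled its new rover design."),
    ("image_caption", "a six-wheeled rover on rocky terrain"),
    ("text", "Field trials are scheduled for next spring."),
])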
ISBN (print): 9783031779602; 9783031779619
The OpenASR21 evaluation consisted of speech recognition for low-resource languages in three evaluation conditions: constrained, constrained plus, and unconstrained. In this paper we investigate the constrained plus condition, in which any self-supervised learning (SSL) model can be used to reduce the word error rate (WER). The goal was to obtain good speech recognition accuracy with only 10 hours of acoustic training data for the 15 low-resource languages in OpenASR21. In this paper, we show that we reduce WER for all 15 languages when we increase the temporal resolution of the feature parameters computed from the speech SSL models from 20 ms to 10 ms; the native temporal resolution of SSL models is generally 20 ms. This increase in temporal resolution is achieved without retraining the SSL models. The resulting feature parameters with increased temporal resolution lead to a 3.9% average absolute reduction in WER (from 1.2% for Javanese to 7.8% for Amharic) on the development set of the 15 languages in the OpenASR21 evaluation. We also compare WER for 5 different pre-trained SSL models in the low-resource OpenASR21 language scenario.
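The abstract does not explain how the 10 ms resolution is obtained without retraining; one simple way to realize it, shown here only as an illustrative assumption, is to encode a 10 ms-shifted copy of the waveform with the same frozen model and interleave the two 20 ms frame streams.

import torch

def features_at_10ms(encode, waveform, sample_rate=16000):
    """Double the temporal resolution of a frozen SSL encoder whose native frame
    rate is 20 ms: encode the original waveform and a 10 ms-shifted copy, then
    interleave the two frame sequences. `encode` maps (samples,) -> (frames, dim)."""
    shift = sample_rate // 100                       # 10 ms expressed in samples
    frames_a = encode(waveform)                      # frames at 0, 20, 40, ... ms
    frames_b = encode(waveform[shift:])              # frames at 10, 30, 50, ... ms
    n = min(frames_a.size(0), frames_b.size(0))
    interleaved = torch.stack([frames_a[:n], frames_b[:n]], dim=1)  # (n, 2, dim)
    return interleaved.reshape(2 * n, -1)            # (2n, dim): effective 10 ms hop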