Accurate career path prediction can support many stakeholders, such as job seekers, recruiters, HR professionals, and project managers. However, publicly available data and tools for career path prediction are scarce. In this work, we...
Sentiment analysis is an important research area in Natural Language Processing (NLP). With the explosion of multimodal data, Multimodal Sentiment Analysis (MSA) has attracted increasing attention in recent years. Effectively harnessing the interplay between diverse modalities is paramount to achieving comprehensive fusion in MSA. However, current research predominantly emphasizes modality interaction while overlooking unimodal information, thus neglecting the inherent disparities between modalities. To address these issues, we propose a novel model for multimodal sentiment analysis based on gated fusion and multi-task learning. The model adopts multi-task learning to concurrently address both multimodal and unimodal sentiment analysis tasks. Specifically, for the multimodal task, we leverage cross-modal Transformers with gating mechanisms to facilitate modality fusion. Subsequently, the fused representations are harnessed to generate sentiment labels for the unimodal tasks. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our model outperforms existing methods and achieves state-of-the-art performance.
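As a rough illustration of the gated cross-modal fusion with multi-task heads that this abstract describes, the sketch below pairs a sigmoid-gated cross-modal attention block with one multimodal head and three unimodal heads. This is a minimal PyTorch sketch, not the authors' implementation; the module names, dimensions, and gated-residual design are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GatedCrossModalBlock(nn.Module):
    """One cross-modal Transformer layer with a gating mechanism
    (a hypothetical reading of the paper's gated fusion)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod, other_mod):
        # The query modality attends to the other modality's sequence.
        attended, _ = self.attn(query_mod, other_mod, other_mod)
        # A sigmoid gate decides, per feature, how much cross-modal
        # information is injected into the residual stream.
        g = torch.sigmoid(self.gate(torch.cat([query_mod, attended], dim=-1)))
        return self.norm(query_mod + g * attended)

class MultiTaskMSA(nn.Module):
    """Joint multimodal + unimodal sentiment heads (multi-task learning)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.text_audio = GatedCrossModalBlock(dim)
        self.text_video = GatedCrossModalBlock(dim)
        self.multimodal_head = nn.Linear(dim, 1)
        self.unimodal_heads = nn.ModuleDict(
            {m: nn.Linear(dim, 1) for m in ("text", "audio", "video")})

    def forward(self, text, audio, video):
        # Fuse by letting text attend to audio and video, then mean-pool.
        fused = self.text_audio(text, audio) + self.text_video(text, video)
        preds = {"multimodal": self.multimodal_head(fused.mean(dim=1))}
        for name, seq in (("text", text), ("audio", audio), ("video", video)):
            preds[name] = self.unimodal_heads[name](seq.mean(dim=1))
        return preds

# Toy usage: batch of 8; per-modality sequence lengths 20/50/30; dim 128.
model = MultiTaskMSA(dim=128)
out = model(torch.randn(8, 20, 128), torch.randn(8, 50, 128), torch.randn(8, 30, 128))
```

In this reading, the unimodal heads share the same encoders as the multimodal task, so the auxiliary losses push each modality to retain its own sentiment signal rather than being absorbed by the fusion.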
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 millio...
The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under far-field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the exclusively learned representations of different modalities, instead of exploring the modal relationships during each respective modeling. In this paper, we propose a novel visual modality enhanced end-to-end KWS framework (VE-KWS), which fuses audio and visual modalities from two aspects. The first is to utilize the speaker location information obtained from the lip region in videos to assist the training of a multi-channel audio beamformer. By involving the beamformer as an audio enhancement module, the acoustic distortions caused by far-field or noisy environments can be significantly suppressed. The second is to conduct cross-attention between different modalities to capture the inter-modal relationships and help the representation learning of each modality. Experiments on the MISP challenge corpus show that our proposed model achieves a 2.79% false rejection rate and a 2.95% false alarm rate on the Eval set, establishing new SOTA performance compared with the top-ranking systems in the ICASSP 2022 MISP challenge.
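The cross-attention half of the proposed fusion can be sketched briefly: each modality queries the other, so inter-modal relationships shape both representations during modeling rather than only at a late combination stage. This is a hedged PyTorch sketch under assumed dimensions; it is not the VE-KWS code, and it omits the beamformer-based audio enhancement entirely.

```python
import torch
import torch.nn as nn

class AVCrossAttention(nn.Module):
    """Bidirectional cross-attention: each modality queries the other, so
    inter-modal cues influence both streams during representation learning."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio, video):
        # Audio frames query the lip-region video features, and vice versa.
        a_ctx, _ = self.audio_from_video(audio, video, video)
        v_ctx, _ = self.video_from_audio(video, audio, audio)
        return self.norm_a(audio + a_ctx), self.norm_v(video + v_ctx)

# Toy usage: 100 audio frames vs. 25 video frames per clip; the enriched
# streams would then feed a shared keyword classifier.
audio = torch.randn(2, 100, 256)
video = torch.randn(2, 25, 256)
a_out, v_out = AVCrossAttention()(audio, video)
```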
Language understanding is a multi-faceted cognitive capability that the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence ...
Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recent advances in automatic speech recognition (ASR) technology have boosted the viability of fully automated Alzheimer’s disease (AD) detection via ASR transcripts. However, little is understood about how ASR errors affect the performance of AD detection. This paper addresses that gap. First, we fine-tune 18 ASR models on three datasets from DementiaBank, generating 36 ASR transcripts on the ADReSS dataset (18 from the original and 18 from the fine-tuned ASR models). We then employ two AD detection methods using either ASR or manual transcripts: fine-tuning four large language models (LLMs) and fusing LLMs with pre-trained language models (PLMs). The results show that certain ASR transcripts outperform manual transcripts, suggesting that ASR errors provide valuable clues for AD detection. Finally, we conduct an interpretability study, including linguistic and SHapley Additive exPlanations (SHAP) analyses. This study reveals that greater word distribution differences between AD and healthy control (HC) groups in ASR transcripts may be linked to these valuable clues. This paper highlights the potential of ASR as a powerful tool for developing fully automated AD detection systems.
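The abstract does not specify how the LLM and PLM representations are fused, so the concatenation-based late-fusion head below, along with its dimensions and names, is purely an assumed illustration of one plausible design, not the paper's method.

```python
import torch
import torch.nn as nn

class LateFusionADClassifier(nn.Module):
    """Concatenate pooled transcript features from an LLM and a PLM,
    then classify AD vs. healthy control (HC). Dimensions are assumed."""
    def __init__(self, llm_dim: int = 4096, plm_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(llm_dim + plm_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 2),  # logits for AD vs. HC
        )

    def forward(self, llm_feat, plm_feat):
        return self.head(torch.cat([llm_feat, plm_feat], dim=-1))

# Toy usage with pooled per-transcript embeddings from the two encoders.
logits = LateFusionADClassifier()(torch.randn(4, 4096), torch.randn(4, 768))
```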
This paper describes the participation of the DUTH-ATHENA team of Democritus University of Thrace and the Athena Research Center in the eRisk 2021 task, which focuses on measuring the level of depression based on Reddit u...
Extracting supportive premises for a mathematical problem can substantially improve automatic reasoning systems. One bottleneck in automated theorem proving is the lack of a proper semantic in...
Contextual-LAS (CLAS) has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words. It relies on phrase-level contextual modeling and attention-based relevance scoring without explicit contex...