The multimodal emotion-cause pair extraction (MECPE) task aims to detect emotions, causes, and emotion-cause pairs from multimodal conversations. Existing methods for this task typically concatenate representations of each utterance from distinct modalities and then predict emotion-cause pairs directly. This approach struggles to effectively integrate multimodal features and to capture the subtleties of emotion transitions, which are crucial for accurately identifying causes, thereby limiting overall performance. To address these challenges, we propose a novel model that captures holistic interaction and label constraint (HiLo) features for the MECPE task. HiLo facilitates cross-modality and cross-utterance feature interaction with various attention mechanisms, establishing a robust foundation for precise cause extraction. Notably, our model leverages emotion transition features as pivotal cues to enhance causal inference within conversations. The experimental results demonstrate the superior performance of HiLo, evidenced by an increase of more than 2% in F1 score over existing benchmark methods. Further analysis reveals that our approach adeptly utilizes multimodal and dialogue features, making a significant contribution to the field of emotion-cause analysis. Our code is publicly available at https://***/MVdYmx.
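To make the interaction structure above concrete, here is a minimal PyTorch sketch (not the authors' released code) of how cross-modality attention, cross-utterance attention, and emotion-transition cues could feed an emotion-cause pair scorer; the dimensions, module layout, and pairing head are illustrative assumptions.

```python
# Illustrative sketch only: text features attend to audio/video (cross-modal),
# a second attention layer mixes utterances (cross-utterance), and the change
# between neighbouring utterance states serves as an emotion-transition cue
# for scoring every (emotion utterance, candidate cause utterance) pair.
import torch
import torch.nn as nn

class HiLoSketch(nn.Module):
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.cross_modal = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_utt = nn.MultiheadAttention(d, heads, batch_first=True)
        self.pair_scorer = nn.Linear(3 * d, 1)  # [emotion utt; cause utt; transition]

    def forward(self, text, audio, video):
        # text/audio/video: (batch, num_utterances, d)
        modal = torch.stack([audio, video], dim=2)            # (B, U, 2, d)
        B, U, M, d = modal.shape
        q = text.reshape(B * U, 1, d)
        kv = modal.reshape(B * U, M, d)
        fused, _ = self.cross_modal(q, kv, kv)                # text attends to audio/video
        fused = fused.reshape(B, U, d)
        ctx, _ = self.cross_utt(fused, fused, fused)          # cross-utterance interaction
        # emotion-transition cue: change between consecutive utterance states
        # (wraps at the first utterance; padding would be used in practice)
        trans = ctx - torch.roll(ctx, shifts=1, dims=1)
        # score every (emotion utterance i, candidate cause utterance j) pair
        e = ctx.unsqueeze(2).expand(B, U, U, d)
        c = ctx.unsqueeze(1).expand(B, U, U, d)
        t = trans.unsqueeze(2).expand(B, U, U, d)
        return self.pair_scorer(torch.cat([e, c, t], dim=-1)).squeeze(-1)  # (B, U, U)

scores = HiLoSketch()(torch.randn(2, 6, 256), torch.randn(2, 6, 256), torch.randn(2, 6, 256))
```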
Session-based recommender systems (SRSs) are currently a research hot spot in the recommendation domain. Existing methods make recommendations based on the user's current intention (also called short-term preference) during a session, often overlooking the specific preferences associated with these intentions. In reality, users usually exhibit diverse preferences for different intentions, and even for the same intention, individual preferences can vary significantly between users. As users interact with items throughout a session, their intentions can shift accordingly. To enhance recommendation quality, it is crucial not only to consider the user's intentions but also to dynamically learn their varying preferences as these intentions change. In this paper, we propose a novel Intention-sensitive Preference Learning Network (IPLN) comprising three main modules: an intention recognizer, a preference detector, and a prediction layer. Specifically, the intention recognizer infers the user's underlying intention within the current session by analyzing complex relationships among items. Based on the acquired intention, the preference detector learns the intention-specific preference by selectively integrating latent features from items in the user's historical sessions. In addition, the user's general preference is used to refine the obtained preference, reducing potential noise carried over from historical records. Ultimately, the refined preference and the intention jointly guide next-item recommendation in the prediction layer. To validate the effectiveness of the proposed IPLN, we perform extensive experiments on two real-world datasets. The experimental results demonstrate the superiority of IPLN over other state-of-the-art models.
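A minimal sketch of the three-module layout described above, assuming simple attention forms for the intention recognizer and preference detector and a gating step for the refinement with the general preference; these specific choices are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IPLNSketch(nn.Module):
    """Illustrative layout: intention recognizer -> preference detector -> prediction layer."""
    def __init__(self, num_items, d=64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d)
        self.intent_att = nn.Linear(d, 1)   # intention recognizer (attention scores)
        self.pref_query = nn.Linear(d, d)   # preference detector (intention as query)
        self.gate = nn.Linear(2 * d, d)     # refine with the general preference

    def forward(self, session, history, general_pref):
        # session: (B, Ls) item ids of the current session
        # history: (B, Lh) item ids from historical sessions
        # general_pref: (B, d) user's general preference vector
        s = self.item_emb(session)                                  # (B, Ls, d)
        h = self.item_emb(history)                                  # (B, Lh, d)
        # intention recognizer: weighted pooling over current-session items
        a = F.softmax(self.intent_att(s), dim=1)                    # (B, Ls, 1)
        intention = (a * s).sum(dim=1)                              # (B, d)
        # preference detector: historical items attended by the intention
        w = F.softmax(h @ self.pref_query(intention).unsqueeze(-1), dim=1)  # (B, Lh, 1)
        pref = (w * h).sum(dim=1)                                   # (B, d)
        # refine the intention-specific preference with the general preference
        pref = torch.sigmoid(self.gate(torch.cat([pref, general_pref], dim=-1))) * pref
        # prediction layer: score all candidate items
        user = intention + pref
        return user @ self.item_emb.weight.t()                      # (B, num_items)

model = IPLNSketch(num_items=1000)
scores = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1000, (2, 20)), torch.randn(2, 64))
```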
Anchor-based Multi-view Subspace Clustering (AMSC) has become a popular tool for large-scale multi-view clustering. However, current AMSC approaches still have several limitations. First, they typically recover the anchor graph structure in the original linear space, restricting their applicability to nonlinear scenarios. Second, they usually overlook the potential benefits of jointly capturing inter-view and intra-view information for enhancing anchor representation learning. Third, these approaches mostly perform anchor-based subspace learning with a specific matrix norm, neglecting the latent high-order correlation across different views. To overcome these limitations, this paper presents an efficient and effective approach termed Large-scale Tensorized Multi-view Kernel Subspace Clustering (LTKMSC). Unlike existing AMSC approaches, LTKMSC exploits both inter-view and intra-view awareness for anchor-based representation building. Concretely, low-rank tensor learning is leveraged to capture the high-order correlation (i.e., the inter-view complementary information) among distinct views, upon which the \(l_{1,2}\) norm is imposed to explore the intra-view anchor graph structure in each view. Moreover, kernel learning is leveraged to explore the nonlinear anchor-sample relationships embedded in multiple views. With the unified objective function formulated, an efficient optimization algorithm with low computational complexity is further designed. Extensive experiments on a variety of multi-view datasets confirm the efficiency and effectiveness of our approach compared with other competitive approaches.
Authors:
Xin Zhang, Hongzhi Feng, M. Shamim Hossain, Yinzhuo Chen, Hongbo Wang, Yuyu Yin
Affiliations: Hangzhou Dianzi University, China; Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China; Zhoushan Tongbo Marine Electronic Information Research Institute, Hangzhou Dianzi University, China; Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, China; Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Saudi Arabia
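For readers unfamiliar with the anchor-graph idea underlying LTKMSC, the following NumPy sketch illustrates the general pipeline only: an RBF kernel builds a nonlinear sample-anchor graph per view, the per-view graphs are fused, and a spectral embedding of the fused graph is clustered with k-means. The tensor low-rank learning and the \(l_{1,2}\)-regularized solver of the actual method are not reproduced here, and the fusion by simple averaging is a stand-in assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def rbf_anchor_graph(X, anchors, gamma=1.0):
    """Nonlinear (kernel) affinities between n samples and m anchors, row-normalized."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    K = np.exp(-gamma * d2)
    return K / K.sum(axis=1, keepdims=True)

def anchor_kernel_clustering(views, n_clusters, n_anchors=50, seed=0):
    """Illustrative pipeline: per-view kernel anchor graphs, averaged fusion,
    spectral embedding of the fused graph, then k-means."""
    rng = np.random.default_rng(seed)
    graphs = []
    for X in views:                       # each X: (n_samples, n_features_v)
        idx = rng.choice(len(X), size=n_anchors, replace=False)
        graphs.append(rbf_anchor_graph(X, X[idx]))
    Z = np.mean(graphs, axis=0)           # fuse views (stand-in for tensor fusion)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    emb = U[:, :n_clusters]               # spectral embedding from the anchor graph
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(emb)

views = [np.random.rand(200, 30), np.random.rand(200, 50)]
labels = anchor_kernel_clustering(views, n_clusters=4)
```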
Action Quality Assessment (AQA) has become crucial in video analysis, finding wide applications in various domains such as healthcare and sports. A significant challenge faced by AQA is the background bias caused by the dominance of the background in videos. In particular, the background bias tends to overshadow subtle foreground differences, which are crucial for precise action evaluation. To address the background bias issue, we propose a novel data augmentation method named Scaled Background Swap. First, the background regions of different video samples are swapped to guide models to focus on the dynamic foreground regions and to mitigate their sensitivity to the background during training. Second, the video's foreground region is up-scaled to further strengthen the models' attention to the critical foreground action information for AQA tasks. By prioritizing foreground motion and swapping backgrounds, the proposed Scaled Background Swap method effectively improves models' accuracy and generalization, and it can be flexibly applied to various video analysis models. Extensive experiments on AQA benchmarks demonstrate that Scaled Background Swap achieves better performance than the baselines. Specifically, the Spearman's rank correlation on the AQA-7 and MTL-AQA datasets reaches 0.8870 and 0.9526, respectively. The code is available at: https://***/Emy-cv/Scaled-Background Swap.
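The augmentation itself can be sketched compactly. The snippet below assumes per-frame foreground masks are available (e.g., from a detector or segmenter) and uses a simple center crop after up-scaling; both choices are illustrative assumptions rather than the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def scaled_background_swap(video_a, video_b, mask_a, scale=1.2):
    """Sketch of the augmentation: paste the up-scaled foreground of video_a
    onto the background of video_b, given a foreground mask for video_a."""
    # video_*: (T, C, H, W) in [0, 1]; mask_a: (T, 1, H, W) with 1 = foreground
    T, C, H, W = video_a.shape
    # up-scale the masked foreground frames and their masks
    fg = F.interpolate(video_a * mask_a, scale_factor=scale, mode='bilinear',
                       align_corners=False)
    m = F.interpolate(mask_a, scale_factor=scale, mode='nearest')
    # center-crop back to the original spatial size
    top = (fg.shape[-2] - H) // 2
    left = (fg.shape[-1] - W) // 2
    fg = fg[..., top:top + H, left:left + W]
    m = m[..., top:top + H, left:left + W]
    # composite: scaled foreground of A over the background of B
    return m * fg + (1 - m) * video_b

# random mask used here only to demonstrate shapes
aug = scaled_background_swap(torch.rand(8, 3, 112, 112), torch.rand(8, 3, 112, 112),
                             (torch.rand(8, 1, 112, 112) > 0.5).float())
```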