Multimodal sentiment analysis leverages information from multiple sensors to achieve a comprehensive interpretation of emotions. However, different modalities do not always boost each other as expected; they compete with one another, leaving some modalities under-optimized during training. To address this issue, we propose Adaptive Gradient Scaling with Sparse Mixture-of-Experts (AGS-SMoE). We first analyze modal preemption in unified multimodal learning from the perspective of causal preemption. Guided by the notion of actual cause, we take the gradient norms of the different encoders at two fusion stages as evidence and estimate the current modal preemption state with a parameter-free method. Based on the resulting dynamic preemption factor, we design a gradient scaling method that balances the optimization of the different encoders. Furthermore, we use a Mixture-of-Experts to sparsely process multimodal tokens according to their preemption states. Our experiments on four multimodal sentiment analysis datasets achieve state-of-the-art results, and our method improves modal representation learning at different stages. Extensive experiments confirm that our method alleviates the modal preemption problem in a plug-and-play manner. Our code is available at https://***/TheShy-Dream/AGS-SMoE.
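To make the gradient-scaling idea concrete, below is a minimal PyTorch sketch of balancing modality encoders by their gradient norms. The abstract does not specify the exact form of the preemption factor or the two-stage fusion, so the ratio of the mean gradient norm to each encoder's own norm is used here purely as an illustrative placeholder; the function names (`grad_norm`, `rescale_encoder_grads`) are hypothetical and not from the paper.

```python
import torch
import torch.nn as nn


def grad_norm(module: nn.Module) -> float:
    """L2 norm over all parameter gradients of a module (0.0 if no grads)."""
    sq = 0.0
    for p in module.parameters():
        if p.grad is not None:
            sq += p.grad.detach().pow(2).sum().item()
    return sq ** 0.5


def rescale_encoder_grads(encoders: dict, eps: float = 1e-8) -> dict:
    """Scale each encoder's gradients by (mean norm / own norm).

    A modality whose encoder currently dominates (large gradient norm,
    i.e. "preempting") is damped, while weaker modalities are boosted.
    This is a simplified stand-in for the paper's dynamic preemption factor.
    """
    norms = {name: grad_norm(enc) for name, enc in encoders.items()}
    mean_norm = sum(norms.values()) / max(len(norms), 1)
    factors = {}
    for name, enc in encoders.items():
        factor = mean_norm / (norms[name] + eps)
        factors[name] = factor
        for p in enc.parameters():
            if p.grad is not None:
                p.grad.mul_(factor)
    return factors


# Usage inside a training step (illustrative):
#   loss.backward()
#   rescale_encoder_grads({"text": text_enc, "audio": audio_enc, "video": video_enc})
#   optimizer.step()
```

In this sketch the scaling is applied after `backward()` and before `optimizer.step()`, which is what allows it to act in a plug-and-play manner on top of an existing training loop.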