Existing approaches for all-in-one weather-degraded image restoration suffer from inefficiencies in leveraging degradation-aware priors, resulting in sub-optimal performance in adapting to different weather conditions...
Caching popular files at the small base stations has proved to be an effective strategy for reducing the content delivery delay in cellular networks and alleviating backhaul congestion. The challenging characteristics...
Understanding why the severity of infectious diseases such as COVID-19 varies across patients is an important area of focus for the scientific community. The article presents an innovative voting ensemble GenoCare Prognostica...
Track irregularities can significantly reduce the comfort and safety of train operation. If the development trend of track irregularities can be predicted, the railway management department can issue early warnings to...
Traditional autonomous navigation methods for mobile robots mainly rely on geometric feature-based LiDAR scan-matching algorithms, but in complex environments, this method is often affected due to the presence of movi...
This letter considers a hybrid reconfigurable intelligent surface (RIS) assisted integrated sensing and communication (ISAC) system, where each RIS element can flexibly switch between the active and passive modes. Sub...
Human-centric Emotional Video Captioning (H-EVC) aims to generate fine-grained, emotion-related sentences for human-based videos, enhancing the understanding of human emotions and facilitating human-computer emotional interaction. However, existing video captioning methods primarily focus on overall event content and often overlook subtle emotional clues and interactions in videos. As a result, the generated captions frequently lack emotional information. To address this, we propose a novel Emotion-oriented Cross-modal Prompting and Alignment (ECPA) approach for large foundation models that enhances H-EVC accuracy by effectively modeling fine-grained visual-textual emotion clues and interactions. Built on large foundation models, ECPA introduces two learnable prompting strategies, visual emotion prompting (VEP) and textual emotion prompting (TEP), as well as an emotion-oriented cross-modal alignment (ECA) module. In VEP, we develop two-level learnable visual prompts, i.e., emotion recognition (ER)-level and action unit (AU)-level prompting, to help pre-trained vision-language foundation models attend to both coarse and fine emotion-related visual information in videos. In TEP, we correspondingly devise two-level learnable textual prompts, i.e., sentence-level emotional tokens and word-level masked tokens, to obtain both whole and local textual prompt representations related to emotions. To further facilitate the interaction and alignment of visual-textual emotion prompt representations, ECA introduces two further levels of emotion-oriented prompt alignment learning: the ER-sentence level and the AU-word level alignment losses. Together, these losses enhance the model's ability to capture and integrate global and local cross-modal emotion semantics, thereby enabling the generation of fine-grained emotional linguistic descriptions in video captioning. Extensive experiments not only demonstrate that our ECPA outperforms existing state-of-the-art ap
The increasing instances of animals encroaching on human settlements, as well as the illicit trafficking of wildlife, have prompted immediate actions to protect the natural heritage. In addition to this, the difficult...
Music source separation aims to disentangle individual sources from the mixture of musical signals. Existing generative adversarial network (GAN) based methods generally work on the spectrogram domain only. However, t...
Recently, Li et al. proposed an identity-based linearly homomorphic network coding signature (IB-HNCS) scheme for secure data delivery in Internet of Things (IoT) networks, and they claimed that the IB-HNCS scheme can...