The growing adoption of social virtual reality (VR) platforms underscores the importance of safeguarding personal VR space to maintain user privacy and security. Teleportation, a prevalent instantaneous locomotion met...
In real-world physiological and psychological scenarios, audio and visual signals are often strongly correlated and complementary. Audio-Visual Event Localization (AVEL) aims to identify segments with Audio-Visual Events (AVEs) that contain both audio and visual tracks in unconstrained videos. Prior studies have predominantly focused on audio-visual cross-modal fusion methods while overlooking fine-grained exploration of the cross-modal information fusion mechanism. Moreover, because multi-modal data are inherently heterogeneous, the audio-visual fusion process inevitably introduces new noise. To address these challenges, we propose a novel Cross-modal Contrastive Learning Network (CCLN) for AVEL, comprising a backbone network and a branch network. In the backbone network, drawing inspiration from physiological theories of sensory integration, we elucidate the process of audio-visual information fusion, interaction, and integration from an information-flow perspective. Notably, the Self-constrained Bi-modal Interaction (SBI) module is a bi-modal attention structure integrated with audio-visual fusion information; through gated processing of the audio-visual correlation matrix, it effectively captures inter-modal correlation. The Foreground Event Enhancement (FEE) module emphasizes the significance of event-level boundaries by using adaptive weights to enlarge the distance between scene events during training. Furthermore, we introduce weak video-level labels to constrain the cross-modal semantic alignment of audio-visual events and design a weakly supervised cross-modal contrastive learning loss (WCCL Loss) function, which enhances the quality of the fusion representation in the dual-branch contrastive learning framework. Extensive experiments conducted on the AVE dataset for both fully supervised and weakly supervised event localization, as well as Cross-Modal Localization (CML) tasks, demonstrate the superior performance of our model compa
In this paper, we present a class of codes, referred to as random staircase generator matrix codes (SGMCs), which have staircase-like generator matrices. In the infinite-length region, we prove that the random SGMC is...
Flexible capacitive pressure sensors have garnered considerable interest across diverse applications, including medical monitoring, electronic skin, and robotic tactile systems, owing to their straightforward fabricat...
Security of system behavior is a kind of information flow security, which is achieved by confusing the intruders via the indistinguishability of system behaviors. Noninterference is a typical notion to describe inform...
Bayesian optimization (BO) is more efficient than conventional methods at automatically synthesizing operational amplifier (opamp) topologies. However, the design space for behavior-level opamp topologies invol...
This article introduces a novel mechatronic system for coupling the stems of seedlings and plants to wooden stakes or ropes, a crucial process for supporting them during growth, transportation, and fruiting in plant p...
As one of the essential issues when designing optimal control policies for automated manufacturing systems (AMSs) with unreliable resources, maximally permissive behavior should be ensured for the controlled system. T...
In recent years, vision-language tracking has drawn growing attention in the tracking field. The critical challenge for the task is to fuse semantic representations of language information and visual representations ...
Real-time advisory systems (RTASs) are a crucial milestone on the road toward fully autonomous driving. Current proof-of-concept RTASs, which observe drivers and their environment to provide advice, are effective in a...