Weakly supervised object localization (WSOL) strives to localize objects with only image-level supervision. WSOL often faces challenges such as incomplete localization due to classifier bias and over-localization in real scenes where objects and backgrounds are strongly associated or structurally similar. While the latest Transformer-based methods effectively enhance localization performance by leveraging long-range feature dependencies, they may inadvertently amplify divergent background activation and remain susceptible to classification bias. To this end, we propose a novel Semantic-Constraint Matching (SeCM) plug-in module tailored for Transformer-based approaches. In detail, a local patch shuffle strategy is first introduced to disentangle partial contextual linkages, thereby creating image pairs. Then, a semantic matching module extracts co-object knowledge from the primal-shuffled image pairs and drives the network to associate the foreground with the semantic label, suppressing divergent activation. Moreover, to alleviate incomplete localization and prevent excessive suppression of activation, we propose leveraging multi-modal class-specific textual representations to guide object localization by supplying diverse intra-class prior knowledge. Extensive experimental results on the CUB-200-2011 and ILSVRC datasets show that our method achieves new state-of-the-art performance.
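The local patch shuffle step can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function name `local_patch_shuffle`, the `patch` and `window` parameters, and the choice to permute patches only inside fixed, non-overlapping windows. The image is a plain 2D list to keep the example dependency-free.

```python
import random


def local_patch_shuffle(image, patch=2, window=2, seed=0):
    """Permute patches inside each window of `window x window` patches.

    Hypothetical helper, not the authors' implementation.
    image: 2D list (H x W); H and W must be divisible by patch * window.
    Returns a new 2D list; the input is left untouched.
    """
    h, w = len(image), len(image[0])
    block = patch * window  # side length of one window in pixels
    if h % block or w % block:
        raise ValueError("image size must be divisible by patch * window")
    rng = random.Random(seed)
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Top-left corners of every patch in this window.
            slots = [(by + i * patch, bx + j * patch)
                     for i in range(window) for j in range(window)]
            sources = slots[:]
            rng.shuffle(sources)  # random permutation of patch positions
            for (dy, dx), (sy, sx) in zip(slots, sources):
                for r in range(patch):
                    out[dy + r][dx:dx + patch] = image[sy + r][sx:sx + patch]
    return out


# Build a toy 8x8 "image" and its shuffled counterpart as a pair.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
shuffled = local_patch_shuffle(image, patch=2, window=2, seed=0)
pair = (image, shuffled)
```

Because patches move only within their own window, coarse object layout is preserved while fine-grained context between neighbouring patches is broken, which is one plausible reading of "disentangling partial contextual linkages".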
The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled ones to resource-limited edge devices. Usually, the models shall remain fixed once...
A novel frequency and polarization reconfigurable water patch antenna is proposed for radio communication in the UHF band. Based on theoretical analysis and simulation results, water is an ideal material for designing...
An entailment tree is a structured reasoning path that clearly demonstrates the process of deriving hypotheses through multiple steps of inference from known premises. It enhances the interpretability of QA systems. E...
Generating coherent and credible explanations remains a significant challenge in the field of AI. In recent years, researchers have delved into the utilization of entailment trees to depict explanations, which exhibit...
Zero-shot remote sensing image scene classification (ZS-RSISC) aims to identify remote sensing (RS) image scenes of unseen classes whose samples are unavailable in the training stage. To transfer knowledge from seen RS classes to unseen RS classes, existing methods either rely on laborious manual labeling to learn semantic features or directly use word embeddings learned from a general corpus independently of the zero-shot model. They ignore the complex interclass correlation information, which plays a vital role in linking seen and unseen classes. Besides, current studies in ZS-RSISC impose the same penalty on every class when enforcing interclass separation and intraclass compactness, which results in unclear classification boundaries. In this article, we tackle ZS-RSISC via graph-based semantic embedding refinement (GSER) in an end-to-end manner. We propose semantic graph convolutional networks (S-GCNs) to explore the correlation structure among classes in a unified framework. The semantic graph embeddings are further refined by learning semantic-guided class patterns and component patterns. Specifically, we propose an adaptive additive separation (AAS) loss that adaptively adjusts the penalty for each class and explicitly promotes intraclass compactness and interclass separation. Further, instance-level alignment and class-level alignment are proposed to enhance the discriminative ability of the semantic-guided class patterns. To alleviate model bias toward seen classes, semantic-guided component patterns shared by seen and unseen classes are exploited via feature reconstruction. Extensive experiments in both the zero-shot and generalized zero-shot settings demonstrate the effectiveness of the proposed GSER.
Much commonsense knowledge in the real world takes the form of procedures or sequences of steps to achieve particular goals. In recent years, knowledge extraction on procedural documents has attracted considerable atte...
Sketch education is an essential component of arts education. In recent years, with the development of society, the demand for sketch courses has been steadily increasing. However, the existing teaching resources are ...
Modern advanced large language model (LLM) applications often prepend long contexts before user queries to improve model output quality. These contexts frequently repeat, either partially or fully, across multiple que...
The lack of facial features caused by wearing masks degrades the performance of facial recognition systems. Traditional occluded face recognition methods cannot integrate the computational resources of the edge layer and the device layer. Besides, previous research fails to consider the facial characteristics of both the occluded and unoccluded parts. To solve these problems, we put forward a device-edge collaborative occluded face recognition method based on cross-domain feature fusion. Specifically, the device-edge collaborative face recognition architecture makes full use of device and edge resources for real-time occluded face recognition. Then, a cross-domain facial feature fusion method is presented that combines facial features from both the explicit domain and the implicit domain. Furthermore, a delay-optimized edge recognition task scheduling method is developed that comprehensively considers the task load, computational power, bandwidth, and delay tolerance constraints of the edge. This method can dynamically schedule face recognition tasks and minimize recognition delay while ensuring recognition accuracy. The experimental results show that the proposed method achieves an average improvement of about 21% in recognition latency, while the accuracy of the face recognition task remains essentially the same as that of the baseline method.
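To make the scheduling idea concrete, here is a minimal sketch of a delay-aware greedy scheduler. It is a generic heuristic, not the method from the paper, whose algorithm the abstract does not describe. The classes `EdgeNode` and `Task`, their fields and units, and the delay model (upload time, then queueing, then computation) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    compute: float            # work units per second (assumed unit)
    bandwidth: float          # megabits per second (assumed unit)
    busy_until: float = 0.0   # time at which queued work finishes


@dataclass
class Task:
    name: str
    size_mb: float    # megabits to upload to the edge node
    work: float       # work units needed for recognition
    tolerance: float  # maximum acceptable delay in seconds


def estimate_delay(task, node):
    """Upload, wait until the node is free, then compute."""
    transfer = task.size_mb / node.bandwidth
    start = max(transfer, node.busy_until)
    return start + task.work / node.compute


def schedule(tasks, nodes):
    """Greedy plan: tightest tolerance first, lowest estimated delay wins.

    Note: updates `busy_until` on the nodes as tasks are assigned.
    """
    plan = {}
    for task in sorted(tasks, key=lambda t: t.tolerance):
        best = min(nodes, key=lambda n: estimate_delay(task, n))
        delay = estimate_delay(task, best)
        if delay > task.tolerance:
            plan[task.name] = None  # no node meets the delay tolerance
            continue
        best.busy_until = delay     # node is occupied until this task ends
        plan[task.name] = (best.name, delay)
    return plan


nodes = [EdgeNode("edge-a", compute=10, bandwidth=100),
         EdgeNode("edge-b", compute=5, bandwidth=50)]
tasks = [Task("t1", 10, 10, 2.5), Task("t2", 10, 10, 3.0),
         Task("t3", 10, 10, 1.5), Task("t4", 10, 10, 0.5)]
plan = schedule(tasks, nodes)
```

In this toy run the faster node takes the two tightest feasible tasks, the third spills over to the slower node once the first is backed up, and the task whose tolerance no node can meet is left unassigned. A real scheduler would also need to handle the accuracy constraint mentioned above, which this sketch ignores.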