Search results - Inner Mongolia University Library

Semantic-Constraint Matching for transformer-based weakly supervised object localization

PATTERN RECOGNITION, 2025, vol. 158

Authors: Cao, Yiwen; Su, Yukun; Wang, Wenjun; Liu, Yanxia; Wu, Qingyao. South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China; Peng Cheng Lab, Shenzhen, Peoples R China; Minist Educ Key Lab Big Data & Intelligent Robot, Guangzhou, Peoples R China

Weakly supervised object localization (WSOL) strives to localize objects with only image-level supervision. WSOL often faces challenges such as incomplete localization due to classifier bias and over-localization in real scenes where objects and backgrounds are strongly associated or structurally similar. While the latest Transformer-based methods effectively enhance localization performance by leveraging long-range feature dependencies, they may inadvertently amplify divergent background activation and remain susceptible to classification bias. To this end, we propose a novel Semantic-Constraint Matching (SeCM) plug-in module tailored for transformer-based approaches. In detail, a local patch shuffle strategy is first introduced to disentangle partial contextual linkages, thereby creating image pairs. Then a semantic matching module extracts co-object knowledge from the primal-shuffled image pairs and drives the network to associate the foreground with its semantic label, suppressing divergent activation. Moreover, to alleviate incomplete localization and prevent excessive suppression of activation, we propose leveraging multi-modal class-specific textual representations to guide object localization by complementing diverse intra-class prior knowledge. Extensive experimental results on the CUB-200-2011 and ILSVRC datasets show that our method achieves new state-of-the-art performance.

Keywords: Weakly-supervised object localization; Vision transformer; Image matching; Vision language model
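The abstract above mentions a patch shuffle step that yields primal-shuffled image pairs but gives no details. The following is a minimal sketch of one way such a pair could be built, not the authors' implementation: the function names, the uniform non-overlapping grid, and the choice to permute all tiles of the grid are assumptions made here for illustration.

```python
import random


def patch_shuffle(image, patch, seed=None):
    """Return a copy of `image` (a list of pixel rows) with its tiles permuted.

    The image is cut into non-overlapping patch x patch tiles and the tiles
    are shuffled; pixels inside each tile keep their arrangement.
    Hypothetical helper, not taken from the paper.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch

    # Collect the tiles in row-major order.
    tiles = [
        [row[c * patch:(c + 1) * patch] for row in image[r * patch:(r + 1) * patch]]
        for r in range(rows)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(tiles)

    # Stitch the permuted tiles back into a full image.
    shuffled = []
    for r in range(rows):
        for y in range(patch):
            line = []
            for c in range(cols):
                line.extend(tiles[r * cols + c][y])
            shuffled.append(line)
    return shuffled


def make_pair(image, patch, seed=None):
    """Pair the original (primal) image with its shuffled counterpart."""
    return image, patch_shuffle(image, patch, seed)
```

Both images of a pair contain exactly the same tiles, so object parts survive while the large-scale context around them is broken up.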

TOWARDS ROBUST AND EFFICIENT CLOUD-EDGE ELASTIC MODEL ADAPTATION VIA SELECTIVE ENTROPY DISTILLATION

12th International Conference on Learning Representations, ICLR 2024

Authors: Chen, Yaofo; Niu, Shuaicheng; Wang, Yaowei; Xu, Shoukai; Song, Hengjie; Tan, Mingkui. South China University of Technology, China; Pengcheng Laboratory, China; Nanyang Technological University, Singapore; Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, China; Pazhou Laboratory, China

The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled versions to resource-limited edge devices. Usually, the models remain fixed once deployed (at least for some period) due to the potentially high cost of model adaptation for both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to attain promising performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. Addressing these issues raises two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and can be adapted online. In CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Keywords: Budget control
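The abstract above names two exclusion criteria (unreliable and low-informative samples) without defining them. Given the title's reference to entropy, one plausible reading is a filter on prediction entropy, sketched below. This is an illustration only: the function names and both thresholds are hypothetical, and the thresholds are fixed here whereas the abstract describes the unreliable-sample criterion as dynamic.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_upload(batch, low, high):
    """Return indices of samples whose prediction entropy lies in [low, high].

    Assumption: entropy above `high` marks a prediction as unreliable and
    entropy below `low` marks it as low-informative. Both thresholds are
    hypothetical constants chosen by the caller.
    """
    return [i for i, probs in enumerate(batch) if low <= entropy(probs) <= high]


# Three softmax outputs from a hypothetical 3-class edge model.
batch = [
    [0.98, 0.01, 0.01],      # very confident: little to learn from
    [0.60, 0.30, 0.10],      # moderately uncertain: kept
    [1 / 3, 1 / 3, 1 / 3],   # maximally uncertain: treated as unreliable
]
print(select_for_upload(batch, low=0.2, high=1.0))  # prints [1]
```

Only the forward pass of the edge model is needed to compute these entropies, which fits the constraint that edge devices may not support backpropagation.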

A Frequency and Polarization Reconfigurable Transparent Water Antenna

Progress in Electromagnetics Research C, 2025, vol. 152, pp. 177-186

Authors: Li, Lei; Gao, Jing; Nan, Jingchang. School of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China; Liaoning Key Laboratory of Radio Frequency and Big Data for Intelligent Applications, Huludao 125105, China

A novel frequency and polarization reconfigurable water patch antenna is proposed for radio communication in the UHF band. Based on theoretical analysis and simulation results, water is an ideal material for designing a transparent liquid ground. Water provides outstanding transparency, excellent aesthetics, and high optical stealth performance for a wider range of application scenarios. The entire structure is made of polyvinyl chloride material and distilled water, except for the feed structure. By filling different cavities with liquid water, five different operating states are obtained in 1.924–2.5 GHz (26.1%), 1.67–2.33 GHz (33%), 0.644–2.288 GHz (112.1%), 1.975–2.54 GHz (25%), and 1.748–2.108 GHz (18.7%), achieving frequency reconfigurability. The antenna can be flexibly switched between linear polarization (LP) and two right-handed circular polarization (RHCP) states. The results show that the 3 dB axial ratio (AR) bandwidth covers 1.93–2.08 GHz (7.5%) and 2.06–2.132 GHz (3.5%). The antenna achieves high optical transparency of 100% and a peak gain of 7.97 dBi. © 2025, Electromagnetics Academy. All rights reserved.

Keywords: Circular polarization
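The percentages quoted next to each band in the abstract above look like fractional bandwidths. The abstract does not state the definition used, so the sketch below assumes the common one, bandwidth divided by the arithmetic centre frequency; the function name is hypothetical.

```python
def fractional_bandwidth(f_low, f_high):
    """Bandwidth relative to the band's centre frequency, as a percentage.

    Assumes the centre is the arithmetic mean of the band edges.
    """
    centre = (f_low + f_high) / 2.0
    return 100.0 * (f_high - f_low) / centre


# Band edges in GHz, copied from the abstract.
bands_ghz = [(1.924, 2.5), (1.67, 2.33), (0.644, 2.288), (1.975, 2.54), (1.748, 2.108)]
for f_low, f_high in bands_ghz:
    print(f"{f_low}-{f_high} GHz: {fractional_bandwidth(f_low, f_high):.1f}%")
```

Under this assumption the computed values agree with the quoted percentages to within about 0.1 percentage point; any remaining difference may come from rounding of the band edges.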

Step Feasibility-Aware and Error-Correctable Entailment Tree Generation

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Song, Junyue; Wu, Xin; Cai, Yi. School of Software Engineering, South China University of Technology, China; Key Laboratory of Big Data and Intelligent Robot, South China University of Technology, Ministry of Education, China

ISBN: (print) 9782493814104

An entailment tree is a structured reasoning path that clearly demonstrates the process of deriving hypotheses through multiple steps of inference from known premises. It enhances the interpretability of QA systems. Existing methods for generating entailment trees typically employ iterative frameworks to ensure reasoning faithfulness. However, they often suffer from the issue of false feasible steps, where selected steps appear feasible but actually lead to incorrect intermediate conclusions. Moreover, the existing iterative frameworks do not consider error-prone search branches, resulting in error propagation. In this work, we propose SPEH: an iterative entailment tree generation framework with Step feasibility Perception and state Error Handling mechanisms. Step Feasibility Perception enables the model to learn how to choose steps that are not false feasible. State Error Handling includes error detection and backtracking, allowing the model to correct errors when entering incorrect search branches. Experimental results demonstrate the effectiveness of our approach in improving the generation of entailment trees. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Iterative methods

A Logical Pattern Memory Pre-trained Model for Entailment Tree Generation

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Yuan, Li; Cai, Yi; Ren, Haopeng; Wang, Jiexin. School of Software Engineering, South China University of Technology, China; Key Laboratory of Big Data and Intelligent Robot, South China University of Technology, Ministry of Education, China

ISBN: (print) 9782493814104

Generating coherent and credible explanations remains a significant challenge in the field of AI. In recent years, researchers have explored the use of entailment trees to depict explanations, which exhibit the reasoning process by which a hypothesis is deduced from the supporting facts. However, existing models often overlook the importance of generating intermediate conclusions with logical consistency from the given facts, leading to inaccurate conclusions and undermining the overall credibility of entailment trees. To address this limitation, we propose the logical pattern memory pre-trained model (LMPM). LMPM incorporates an external memory structure to learn and store the latent representations of logical patterns, which aids in generating logically consistent conclusions. Furthermore, to mitigate the influence of logically irrelevant domain knowledge in the Wikipedia-based data, we introduce an entity abstraction approach to construct the dataset for pre-training LMPM. The experimental results highlight the effectiveness of our approach in improving the quality of entailment tree generation. By leveraging logical entailment patterns, our model produces more coherent and reasonable conclusions that closely align with the underlying premises. Code and data are released at https://***/YuanLi95/T5-LMPM. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Domain Knowledge

Graph-Based Semantic Embedding Refinement for Zero-Shot Remote Sensing Image Scene Classification

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024, vol. 60, no. 1, pp. 644-657

Authors: Shang, Junyuan; Niu, Chang; Zhou, Wenlve; Zhou, Zhiheng; Yang, Junmei. South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China; South China Univ Technol, Key Laboratory of Big Data & Intelligent Robot, Minist Educ, Guangzhou 510641, Peoples R China

Zero-shot remote sensing image scene classification (ZS-RSISC) aims to identify remote sensing (RS) image scenes of unseen classes whose samples are unavailable in the training stage. To transfer knowledge from seen RS classes to unseen RS classes, existing methods either rely on laborious manual labeling to learn semantic features or directly use word embeddings learned from a general corpus, independently of the zero-shot model. They ignore the complex interclass correlation information, which plays a vital role in linking seen and unseen classes. Besides, current studies in ZS-RSISC impose the same penalty on every class when enforcing interclass separation and intraclass compactness, which results in unclear classification boundaries. In this article, we tackle ZS-RSISC via graph-based semantic embedding refinement (GSER) in an end-to-end manner. We propose semantic graph convolutional networks (S-GCNs) to explore the correlation structure among classes in a unified framework. The semantic graph embeddings are further refined by learning semantic-guided class patterns and component patterns. Specifically, we propose an adaptive additive separation (AAS) loss to adaptively adjust the penalty for each class and explicitly promote intraclass compactness and interclass separation. Further, instance-level alignment and class-level alignment are proposed to enhance the discriminative ability of the semantic-guided class patterns. To alleviate model bias toward seen classes, semantic-guided component patterns shared by seen and unseen classes are exploited via feature reconstruction. Extensive experiments in both the zero-shot and generalized zero-shot settings demonstrate the effectiveness of our proposed GSER.

Keywords: Semantics; Visualization; Feature extraction; Convolutional neural networks; Training; Task analysis; Remote sensing

Grounded Multimodal Procedural Entity Recognition for Procedural Documents: A New dataset and Baseline

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Ren, Haopeng; Zeng, Yushi; Cai, Yi; Ye, Zhenqi; Yuan, Li; Zhu, Pinli. School of Software Engineering, South China University of Technology, China; Key Laboratory of Big Data and Intelligent Robot, South China University of Technology, Ministry of Education, China

ISBN: (print) 9782493814104

Much commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing studies often focus on procedural text but ignore a common multimodal scenario in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity of the text-only modality. Motivated by these observations, in this paper we explore the problem of grounded multimodal procedural entity recognition (GMPER), aiming to detect procedural entities and their corresponding bounding box groundings in images (i.e., visual entities). A new dataset (Wiki-GMPER) is built and extensive experiments are conducted to evaluate the effectiveness of our proposed model. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Semantics

Sketch Teaching System Based on Human-Computer Hybrid Enhanced Intelligence

18th International Conference on Computer Science and Education, ICCSE 2023

Authors: Fu, Shiyu; Dai, Dawei; Yang, Le; Liao, Zhenchun; Wang, Guoyin. Key Laboratory of Big Data Intelligent Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

ISBN: (print) 9789819707362

Sketch education is an essential component of arts education. In recent years, with the development of society, the demand for sketch courses has been steadily increasing. However, existing teaching resources are severely lacking and unable to meet the requirements of high-quality teaching. This paper designs and implements a smart sketch teaching system. The system uses artificial intelligence technology to assist teaching, providing modules for image cross-modal transformation and style transfer to broaden users’ creative thinking and fulfill personalized learning needs. Furthermore, the system supports a step-by-step image generation process, helping users learn drawing techniques effectively. Additionally, the system can collect users’ drawing processes and analyze the gathered data to correct users’ drawing habits. These data also serve as valuable resources for the development of artificial intelligence. This platform has propelled the transformation of classroom teaching from traditional methods to interactive teaching models inside and outside the classroom, achieving mutual empowerment between artificial intelligence and smart education. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Artificial intelligence

IMPRESS: An Importance-Informed Multi-Tier Prefix KV Storage System for Large Language Model Inference

23rd USENIX Conference on File and Storage Technologies, FAST 2025

Authors: Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Ruidong; Yang, Siling; Chen, Ping; Zheng, Yi; Huai, Baoxing; Chen, Gang. The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China; Institute of Blockchain and Data Security, China; Zhejiang Key Laboratory of Big Data Intelligent Computing, China; Huawei Cloud, China

ISBN: (print) 9781939133458

Modern advanced large language model (LLM) applications often prepend long contexts before user queries to improve model output quality. These contexts frequently repeat, either partially or fully, across multiple queries. Existing systems typically store and reuse the keys and values of these contexts (referred to as prefix KVs) to reduce redundant computation and time to first token (TTFT). When prefix KVs need to be stored on disks due to insufficient CPU memory, reusing them does not always reduce TTFT, as disk I/O latency is high. In this paper, we propose IMPRESS, an importance-informed multi-tier prefix KV storage system to reduce I/O delay for LLM inference by only loading important prefix KVs. IMPRESS first leverages the insight that there is significant similarity in important token index sets across attention heads and introduces an I/O-efficient important KV identification algorithm. It then optimizes prefix KV storage and caching through importance-informed KV management, reducing TTFT during model inference. Our experimental results show that IMPRESS can reduce TTFT by up to 2.8× compared to state-of-the-art systems, while maintaining comparable inference accuracy. © 2025 FAST. All Rights Reserved.
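The abstract above rests on the observation that the sets of important token indices are similar across attention heads. The sketch below shows one way such similarity could be measured. It is not the IMPRESS identification algorithm: treating per-token scores as importance, picking the top k tokens per head, and comparing heads with Jaccard similarity are all assumptions made here, and the function names are hypothetical.

```python
def top_k_indices(scores, k):
    """Indices of the k largest scores (ties broken by the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(head_scores, k):
    """Average Jaccard similarity of top-k token sets over all head pairs."""
    sets = [top_k_indices(scores, k) for scores in head_scores]
    pairs = [(i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))]
    if not pairs:
        return 1.0  # a single head trivially agrees with itself
    return sum(jaccard(sets[i], sets[j]) for i, j in pairs) / len(pairs)


# Hypothetical importance scores for 4 prefix tokens under 3 attention heads.
head_scores = [
    [0.9, 0.1, 0.8, 0.0],
    [0.7, 0.2, 0.6, 0.1],
    [0.1, 0.9, 0.8, 0.0],
]
print(mean_pairwise_overlap(head_scores, k=2))
```

If the overlap is high, the important set found for a few heads could stand in for the others, so fewer keys would need to be read from disk to decide which prefix KVs to load.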

Device-edge collaborative occluded face recognition method based on cross-domain feature fusion

Digital Communications and Networks, 2025, vol. 11, no. 2, pp. 482-492

Authors: Puning Zhang; Lei Tan; Zhigang Yang; Fengyi Huang; Lijun Sun; Haiying Peng. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications; Advanced Network and Intelligent Connection Technology Key Laboratory of Chongqing Education Commission of China; Chongqing Key Laboratory of Ubiquitous Sensing and Networking; Chongqing Innovation Center of Industrial Big-Data Co.Lt

The lack of facial features caused by wearing masks degrades the performance of facial recognition systems. Traditional occluded face recognition methods cannot integrate the computational resources of the edge layer and the device layer. Besides, previous research fails to consider facial characteristics that include both occluded and unoccluded parts. To solve these problems, we put forward a device-edge collaborative occluded face recognition method based on cross-domain feature fusion. Specifically, the device-edge collaborative face recognition architecture makes full use of device and edge resources for real-time occluded face recognition. Then, a cross-domain facial feature fusion method is presented which combines facial features from both the explicit domain and the implicit domain. Furthermore, a delay-optimized edge recognition task scheduling method is developed that comprehensively considers the task load, computational power, bandwidth, and delay tolerance constraints of the edge. This method can dynamically schedule face recognition tasks and minimize recognition delay while ensuring recognition accuracy. The experimental results show that the proposed method achieves an average gain of about 21% in recognition latency, while the accuracy of the face recognition task is basically the same as that of the baseline method.

Keywords: Occluded face recognition; Cross-domain feature fusion; Device-edge collaboration
