Author Affiliations: Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China; Univ Sci & Technol China, Natl Engn Lab Brain Inspired Intelligence Technol, Hefei 230000, Anhui, Peoples R China
Publication: KNOWLEDGE-BASED SYSTEMS (Knowl Based Syst)
Year/Volume: 2025, Vol. 310
Subject Classification: 08 [Engineering], 0812 [Engineering - Computer Science and Technology (eligible for Engineering or Science degrees)]
Funding: NSFC
Keywords: Prompt learning; Vision-language models; Domain generalization; Domain adaptation
Abstract: Pre-trained vision-language models, such as CLIP, have shown remarkable capabilities across various downstream tasks by learning prompts that consist of context concatenated with a class name; for example, "a photo of a [dog]" with [dog] as a class prior. Advanced prompt-learning methods typically initialize and optimize the context, for example, "a photo of a," for downstream task adaptation. However, context optimization typically leads to poor generalization performance over novel classes or datasets sampled from different distributions. This may be attributed to prompt inconsistency; namely, prompts optimized using one image distribution may differ from those optimized using a different image distribution. To improve the generalization performance of optimized prompts, we propose the novel consistent prompt learning (CPL) approach, which identifies and addresses the image distributions that cause prompt inconsistency by performing distributional exploration. CPL identifies and mitigates prompt inconsistency in an adversarial training scheme, in which prompt inconsistency is measured as the similarity discrepancy between images and two different prompts. Specifically, CPL calculates two similarities between a query image and two prompts, and determines the prompt inconsistency through the discrepancy between these two similarities. Subsequently, CPL performs distributional exploration to enlarge the discrepancy and uses an adversarial-training approach to mitigate it. Consequently, the model predictions are insensitive to prompt changes, and the optimized prompt performs well under various image distributions. Comprehensive experiments show that the proposed CPL method performs favorably on four types of representative tasks across 11 datasets, improving on existing prompt-learning methods and achieving state-of-the-art performance.
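The abstract's discrepancy measure and adversarial exploration step can be made concrete. Below is a minimal sketch assuming CLIP-style L2-normalized embeddings; the function names, the L1 gap between prediction distributions, and the sign-gradient perturbation are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the prompt-inconsistency objective described in the abstract.
# Assumptions (not from the paper's code): L1 gap as the discrepancy,
# PGD-style feature perturbation for "distributional exploration".
import torch
import torch.nn.functional as F

def prompt_discrepancy(img, prompts_a, prompts_b, tau=0.01):
    """L1 gap between the class distributions induced by two prompt sets.

    img:       (B, D) L2-normalized image features
    prompts_*: (C, D) L2-normalized text features from two learnable contexts
    """
    # Two similarities between each query image and the two prompts.
    p_a = F.softmax(img @ prompts_a.t() / tau, dim=-1)
    p_b = F.softmax(img @ prompts_b.t() / tau, dim=-1)
    # Prompt inconsistency: discrepancy between the two prediction sets.
    return (p_a - p_b).abs().sum(dim=-1).mean()

def explore_distribution(img, prompts_a, prompts_b, steps=3, eps=0.03):
    """Perturb image features to ENLARGE the discrepancy, exposing the
    image distributions on which the two prompts disagree most."""
    delta = torch.zeros_like(img, requires_grad=True)
    for _ in range(steps):
        loss = prompt_discrepancy(F.normalize(img + delta, dim=-1),
                                  prompts_a, prompts_b)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + eps * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return F.normalize(img + delta, dim=-1).detach()
```

In the adversarial scheme the abstract outlines, an outer update would then minimize the same discrepancy (together with the usual classification loss) with respect to the learnable prompt contexts, so that predictions on the explored distributions become insensitive to prompt changes.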