
Refine Search Results

Document Type

  • 85 journal articles
  • 77 conference papers

Collection Scope

  • 162 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 155 Engineering
    • 127 Computer Science and Technology...
    • 56 Electrical Engineering
    • 22 Software Engineering
    • 19 Information and Communication Engineering
    • 15 Control Science and Engineering
    • 10 Electronic Science and Technology...
    • 9 Biomedical Engineering...
    • 4 Instrument Science and Technology
    • 3 Transportation Engineering
    • 2 Mechanical Engineering
    • 2 Civil Engineering
    • 2 Surveying and Mapping Science and Technology
    • 1 Materials Science and Engineering...
    • 1 Hydraulic Engineering
    • 1 Agricultural Engineering
    • 1 Environmental Science and Engineering...
  • 25 Medicine
    • 12 Clinical Medicine
    • 9 Special Medicine
    • 7 Basic Medicine...
  • 17 Science
    • 8 Physics
    • 4 Chemistry
    • 3 Mathematics
    • 3 Biology
    • 2 Geography
    • 1 Astronomy
    • 1 Atmospheric Science
    • 1 Geophysics
    • 1 Geology
  • 10 Management
    • 8 Management Science and Engineering...
    • 2 Public Administration
  • 1 Agronomy

Topics

  • 162 vision-language ...
  • 13 visualization
  • 12 prompt learning
  • 11 large language m...
  • 10 clip
  • 10 training
  • 10 prompt tuning
  • 9 object detection
  • 8 adaptation model...
  • 7 deep learning
  • 7 semantics
  • 7 few-shot learnin...
  • 6 knowledge distil...
  • 5 multimodal learn...
  • 5 task analysis
  • 5 continual learni...
  • 5 zero-shot learni...
  • 5 feature extracti...
  • 5 foundation model
  • 4 tuning

Institutions

  • 4 shanghai ai lab ...
  • 4 peng cheng lab p...
  • 3 emory univ dept ...
  • 3 univ sci & techn...
  • 3 chinese univ hon...
  • 3 emory univ winsh...
  • 3 sensetime res pe...
  • 2 univ michigan an...
  • 2 hong kong polyte...
  • 2 sun yat sen univ...
  • 2 shanghai univ pe...
  • 2 emory univ dept ...
  • 2 beijing univ tec...
  • 2 univ chinese aca...
  • 2 wuhan univ sch c...
  • 2 harbin inst tech...
  • 2 tongji univ coll...
  • 2 northeastern uni...
  • 2 chinese acad sci...
  • 2 emory univ dept ...

Authors

  • 5 qiao yu
  • 3 yang xiaofeng
  • 3 gao peng
  • 3 obinata yoshiki
  • 3 inaba masayuki
  • 3 kawaharazuka ken...
  • 3 wang ruixuan
  • 3 dai jifeng
  • 3 okada kei
  • 3 kanazawa naoaki
  • 2 trivedi hari
  • 2 zhou jie
  • 2 wang lei
  • 2 li xin
  • 2 chen zhe
  • 2 guo tao
  • 2 luo ping
  • 2 zhang tong
  • 2 yang xi
  • 2 liu liangchen

Language

  • 161 English
  • 1 German
  • 1 French
  • 1 Other
Search criteria: Subject term = "Vision-language Model"
162 records; showing results 61-70
Robotic environmental state recognition with pre-trained vision-language models and black-box optimization
ADVANCED ROBOTICS, 2024, Vol. 38, Issue 18, pp. 1255-1264
Authors: Kawaharazuka, Kento; Obinata, Yoshiki; Kanazawa, Naoaki; Okada, Kei; Inaba, Masayuki (Univ Tokyo Grad Sch Informat Sci & Technol Dept Mechanoinformat Bunkyo Ku Tokyo Japan)
In order for robots to autonomously navigate and operate in diverse environments, it is essential for them to recognize the state of their environment. However, environmental state recognition has tradit...
Multi-task prompt tuning with soft context sharing for vision-language models
NEUROCOMPUTING, 2024, Vol. 603
Authors: Ding, Kun; Wang, Ying; Liu, Pengzhang; Yu, Qiang; Zhang, Haojian; Xiang, Shiming; Pan, Chunhong (Chinese Acad Sci Inst Automat State Key Lab Multimodal Artificial Intelligence S Beijing Peoples R China; Chinese Acad Sci Inst Automat Engn Lab Intelligent Ind Vis Beijing Peoples R China; Chinese Acad Sci Inst Automat Res Ctr Aerosp Informat Beijing Peoples R China; JD Com Beijing Peoples R China)
Vision-language models have recently shown great potential on many tasks in computer vision. Meanwhile, prior work demonstrates that prompt tuning designed for vision-language models can achieve superior performance on f...
Rectify representation bias in vision-language models for long-tailed recognition
NEURAL NETWORKS, 2024, Vol. 172, Article 106134
Authors: Li, Bo; Yao, Yongqiang; Tan, Jingru; Gong, Ruihao; Lu, Jianwei; Luo, Ye (Tongji Univ 4800 Caoan Rd Shanghai 201804 Peoples R China; Sensetime Res 1900 Hongmei Rd Shanghai 201103 Peoples R China; Cent South Univ 932 South Lushan Rd Changsha 410083 Hunan Peoples R China; Shanghai Univ Tradit Chinese Med 530 Lingling Rd Shanghai 201203 Peoples R China)
Natural data typically exhibits a long-tailed distribution, presenting great challenges for recognition tasks. Due to the extreme scarcity of training instances, tail classes often show inferior performance. In this p...
Integrating Vision-Language Models for Accelerated High-Throughput Nutrition Screening
ADVANCED SCIENCE, 2024, Vol. 11, Issue 34, Article e2403578
Authors: Ma, Peihua; Wu, Yixin; Yu, Ning; Jia, Xiaoxue; He, Yiyang; Zhang, Yang; Backes, Michael; Wang, Qin; Wei, Cheng-I (Univ Maryland Coll Agr & Nat Resources Dept Nutr & Food Sci College Pk MD 20742 USA; CISPA Helmholtz Ctr Informat Secur D-66123 Saarbrucken Germany; Netflix Eyeline Studios Los Angeles CA 90028 USA)
Addressing the critical need for swift and precise nutritional profiling in healthcare and in the food industry, this study pioneers the integration of vision-language models (VLMs) with chemical analysis techniques. A cu...
Vision-language pre-training for graph-based handwritten mathematical expression recognition
PATTERN RECOGNITION, 2025, Vol. 162
Authors: Guo, Hong-Yu; Wang, Chuang; Yin, Fei; Li, Xiao-Hui; Liu, Cheng-Li (Univ Chinese Acad Sci Sch Artificial Intelligence Beijing 100049 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Multimodal Artificial Intelligence S Beijing 100190 Peoples R China)
Vision-language pre-training models have shown promise in improving various downstream tasks. However, handwritten mathematical expression recognition (HMER), as a typical structured learning problem, can hardly benefi...
Attention head purification: A new perspective to harness CLIP for domain generalization
IMAGE AND VISION COMPUTING, 2025, Vol. 157
Authors: Wang, Yingfan; Kang, Guoliang (Beihang Univ 37 Xueyuan Rd Beijing 100191 Peoples R China)
Domain Generalization (DG) aims to learn a model from multiple source domains to achieve satisfactory performance on unseen target domains. Recent works introduce CLIP to DG tasks due to its superior image-text alignm...
Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, Vol. 27, pp. 1100-1113
Authors: Liu, Liangchen; Wang, Nannan; Zhou, Dawei; Liu, Decheng; Yang, Xi; Gao, Xinbo; Liu, Tongliang (Xidian Univ Sch Telecommun Engn State Key Lab Integrated Serv Networks Xian 710071 Peoples R China; Xidian Univ Sch Cyber Engn State Key Lab Integrated Serv Networks Xian 710071 Peoples R China; Chongqing Univ Posts & Telecommun Chongqing Key Lab Image Cognit Chongqing 400065 Peoples R China; Univ Sydney Fac Engn Sydney AI Ctr Sch Comp Sci Darlington NSW 2008 Australia)
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs), i.e., improving the performance on unseen classes while maintaining the performance on seen classes. Comp...
IndVisSGG: VLM-based scene graph generation for industrial spatial intelligence
ADVANCED ENGINEERING INFORMATICS, 2025, Vol. 65
Authors: Wang, Zuoxu; Yan, Zhijie; Li, Shufei; Liu, Jihong (Beihang Univ Sch Mech Engn & Automat Beijing Peoples R China; Nanyang Technol Univ Ctr Adv Robot Technol Innovat Singapore Singapore; Huazhong Univ Sci & Technol State Key Lab Digital Mfg Equipment & Technol Wuhan Peoples R China)
Industrial spatial intelligence enables robots and machine tools to understand environmental settings and their relationships, allowing them to manipulate target components. A crucial aspect of this process is scene g...
Multi-MELO: Unified multimodal model editing with dynamic LoRA
EXPERT SYSTEMS WITH APPLICATIONS, 2025, Vol. 273
Authors: Chen, Qin; Yin, Jianghao; Yu, Lang; Zhou, Jie; He, Liang (East China Normal Univ Sch Comp Sci & Technol Shanghai 200062 Peoples R China; East China Normal Univ Shanghai Inst AI Educ Shanghai 200062 Peoples R China)
Model editing aims to correct hallucinations or incorporate new knowledge into pre-trained neural networks. Most previous research focuses on model editing with only the textual modality, while editing for multi...
Inference Calibration of Vision-Language Foundation Models for Zero-Shot and Few-Shot Learning
PATTERN RECOGNITION LETTERS, 2025, Vol. 192, pp. 15-21
Authors: Hu, Minyang; Chang, Hong; Shan, Shiguang; Chen, Xilin (Chinese Acad Sci CAS Inst Comp Technol Key Lab Intelligent Informat Proc Beijing 100190 Peoples R China; Univ Chinese Acad Sci Beijing 100049 Peoples R China)
Contrastive Language-Image Pre-training (CLIP) models exhibit impressive zero-shot performance across various downstream cross-modal tasks by simply computing the dot product between image and text features. CLIP is p...
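
Note: the last abstract above describes CLIP's zero-shot scoring as a dot product between image and text features. The following minimal sketch (not taken from any of the listed papers; the checkpoint name, label prompts, and image path are illustrative placeholders) shows roughly how that scoring can be reproduced with the Hugging Face transformers CLIP API.

# Minimal sketch of CLIP zero-shot scoring: dot product between normalized
# image and text embeddings. Checkpoint, labels, and image path are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("example.jpg")  # hypothetical input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# L2-normalize so the dot product equals cosine similarity.
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
scores = img_feat @ txt_feat.T  # shape (1, num_labels)
# CLIP additionally scales these scores by a learned temperature before softmax;
# that step is omitted here to keep the focus on the dot-product scoring itself.
print(dict(zip(labels, scores.softmax(dim=-1)[0].tolist())))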