
Refine Results

Document Type

  • 90 conference papers
  • 58 journal articles
  • 1 dissertation

Collection Scope

  • 149 electronic resources
  • 0 print holdings

Date Distribution

Subject Classification

  • 137 Engineering
    • 114 Computer Science and Technology...
    • 25 Electrical Engineering
    • 10 Software Engineering
    • 7 Biomedical Engineering...
    • 6 Information and Communication Engineering
    • 6 Control Science and Engineering
    • 6 Surveying and Mapping Science and Technology
    • 4 Instrument Science and Technology
    • 4 Electronic Science and Technology...
    • 3 Mechanical Engineering
    • 2 Materials Science and Engineering...
    • 2 Transportation Engineering
    • 1 Aeronautical and Astronautical Science and Tech...
    • 1 Environmental Science and Engineering...
    • 1 Bioengineering
  • 28 Medicine
    • 19 Clinical Medicine
    • 8 Special Medicine
    • 4 Basic Medicine...
    • 1 Integrated Traditional Chinese and Western Medicine
    • 1 Medical Technology...
  • 18 Science
    • 7 Geophysics
    • 6 Physics
    • 5 Biology
    • 4 Chemistry
    • 2 Geography
    • 1 Astronomy
    • 1 Geology
  • 4 Management
    • 3 Management Science and Engineering...
    • 1 Library, Information and Archives Management...
  • 1 Philosophy
    • 1 Philosophy
  • 1 Agriculture

Topics

  • 149 vision-language ...
  • 15 large language m...
  • 12 prompt learning
  • 10 clip
  • 9 few-shot learnin...
  • 6 contrastive lear...
  • 6 foundation model...
  • 6 visualization
  • 5 deep learning
  • 4 object detection
  • 4 long-tailed reco...
  • 4 remote sensing
  • 4 image classifica...
  • 4 artificial intel...
  • 4 computer vision
  • 4 domain generaliz...
  • 4 prompt tuning
  • 3 multimodal learn...
  • 3 representation l...
  • 3 image captioning

Institutions

  • 4 carnegie mellon ...
  • 4 univ chinese aca...
  • 3 inesc tec porto
  • 3 sichuan univ col...
  • 3 univ chinese aca...
  • 3 chinese univ hon...
  • 3 chinese acad sci...
  • 2 shanghai ai lab ...
  • 2 ecole polytech f...
  • 2 tsinghua univ de...
  • 2 harbin inst tech...
  • 2 zhejiang univ pe...
  • 2 univ porto fac e...
  • 2 beijing univ pos...
  • 2 city univ hong k...
  • 2 sichuan univ col...
  • 2 tech univ munich...
  • 2 westlake univ sc...
  • 2 univ elect sci &...
  • 2 johns hopkins un...

Authors

  • 4 banerjee biplab
  • 4 zhang yi
  • 4 jha ankit
  • 3 wang donglin
  • 3 singha mainak
  • 3 zhang ce
  • 3 tuia devis
  • 2 men aidong
  • 2 zhang min
  • 2 liu xuyang
  • 2 chen honggang
  • 2 guo miaotian
  • 2 yang yang
  • 2 ricci elisa
  • 2 ye mao
  • 2 tian liang
  • 2 patricio cristia...
  • 2 wang haiying
  • 2 teixeira luis f.
  • 2 mukhopadhyay sou...

Language

  • 148 English
  • 1 Other
Search criteria: Subject = "Vision-Language Models"
149 records; showing 31-40
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
2024 International Automated Vehicle Validation Conference
Authors: Lohner, Aaron; Compagno, Francesco; Francis, Jonathan; Oltramari, Alessandro
Affiliations: Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213, USA; Univ Trento, Dept Ind Innovat, Trento, Italy; Bosch Ctr Artificial Intelligence, Pittsburgh, PA, USA
Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may b...
CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection
4th Annual International Conference on Multimedia Retrieval (ICMR)
Authors: Khan, Sohail Ahmed; Duc-Tien Dang-Nguyen
Affiliations: Univ Bergen, Bergen, Norway
The recent advancements in Generative Adversarial Networks (GANs) and the emergence of Diffusion models have significantly streamlined the production of highly realistic and widely accessible synthetic content. As a r...
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Lin, Haokun; Bai, Haoli; Liu, Zhili; Hou, Lu; Sung, Muyi; Song, Linqi; Wei, Ying; Sun, Zhenan
Affiliations: Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; Chinese Acad Sci, CRIPAC, Beijing, Peoples R China; Chinese Acad Sci, MAIS, Inst Automat, Beijing, Peoples R China; Huawei Noahs Ark Lab, Hong Kong, Peoples R China; City Univ Hong Kong, Hong Kong, Peoples R China; City Univ Hong Kong, Shenzhen Res Inst, Hong Kong, Peoples R China; Nanyang Technol Univ, Singapore, Singapore
Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find...
MMA: Multi-Modal Adapter for Vision-Language Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Yang, Lingxiao; Zhang, Ru-Yuan; Wang, Yanchen; Xie, Xiaohua
Affiliations: Sun Yat Sen Univ, Guangzhou, Peoples R China; Shanghai Jiao Tong Univ, Shanghai, Peoples R China; Stanford Univ, Stanford, CA, USA
Pre-trained vision-language models (VLMs) have served as excellent foundation models for transfer learning in diverse downstream tasks. However, tuning VLMs for few-shot generalization tasks faces a discrimination-g...
The Neglected Tails in Vision-Language Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Parashar, Shubham; Lin, Zhiqiu; Liu, Tian; Dong, Xiangjue; Li, Yanan; Ramanan, Deva; Caverlee, James; Kong, Shu
Affiliations: Texas A&M Univ, College Stn, TX 77840, USA; Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Zhejiang Lab, Hangzhou, Peoples R China; Univ Macau, Taipa, Macao, Peoples R China
Vision-language models (VLMs) excel in zero-shot recognition but their performance varies greatly across different visual concepts. For example, although CLIP achieves impressive accuracy on ImageNet (60-80%), its per...
On the use of Vision-Language models for Visual Sentiment Analysis: a study on CLIP
11th International Conference on Affective Computing and Intelligent Interaction (ACIIW)
Authors: Bustos, Cristina; Civit, Carles; Du, Brian; Sole-Ribalta, Albert; Lapedriza, Agata
Affiliations: Univ Oberta Catalunya, Barcelona, Spain; Northeastern Univ, Boston, MA 02115, USA
This work presents a study on how to exploit the CLIP embedding space to perform Visual Sentiment Analysis. We experiment with two architectures built on top of the CLIP embedding space, which we denote by CLIP-E. We ...
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Li, Zheng; Li, Xiang; Fu, Xinyi; Zhang, Xin; Wang, Weiqiang; Chen, Shuo; Yang, Jian
Affiliations: Nankai Univ, Coll Comp Sci, PCA Lab, VCIP, Tianjin, Peoples R China; NKIARI, Shenzhen, Futian, Peoples R China; Ant Grp, Tiansuan Lab, Hangzhou, Peoples R China; RIKEN, Wako, Saitama, Japan
Prompt learning has emerged as a valuable technique in enhancing vision-language models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly focuses on designing various learning forms of...
ENVISIONING MEDCLIP: A DEEP DIVE INTO EXPLAINABILITY FOR MEDICAL VISION-LANGUAGE MODELS
21st IEEE International Symposium on Biomedical Imaging (ISBI)
Authors: Hashmi, Anees Ur Rehman; Mahapatra, Dwarikanath; Yaqub, Mohammad
Affiliations: Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates; Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Explaining Deep Learning models is becoming increasingly important in the face of daily emerging multimodal models, particularly in safety-critical domains like medical imaging. However, the lack of detailed investiga...
Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition
31st ACM International Conference on Multimedia (MM)
Authors: Sun, Hongbo; He, Xiangteng; Zhou, Jiahuan; Peng, Yuxin
Affiliations: Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China; Peking Univ, Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
Large-scale pre-trained vision-language (VL) models have shown powerful generic representation capabilities for adapting to downstream tasks with limited training data, which are data-efficient solutions to various a...
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
31st ACM International Conference on Multimedia (MM)
Authors: Ma, Zheng; Pan, Mianzhi; Wu, Wenhan; Cheng, Kanzhi; Zhang, Jianbing; Huang, Shujian; Chen, Jiajun
Affiliations: Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Vision-language models (VLMs) have shown impressive performance in substantial downstream multi-modal tasks. However, only comparing the fine-tuned performance on downstream tasks leads to the poor interpretability of...