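Refine search results

Document type

  • 23 papers: conference
  • 17 papers: journal articles

Collection scope

  • 40 papers: electronic documents
  • 0 print holdings

Date distribution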

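Subject classification

  • 38 papers: Engineering
    • 33 papers: Computer Science and Technology...
    • 8 papers: Electrical Engineering
    • 6 papers: Software Engineering
    • 3 papers: Information and Communication Engineering
    • 3 papers: Biomedical Engineering (may confer...
    • 2 papers: Control Science and Engineering
    • 1 paper: Optical Engineering
    • 1 paper: Instrument Science and Technology
    • 1 paper: Materials Science and Engineering (may...
    • 1 paper: Surveying and Mapping Science and Technology
    • 1 paper: Cyberspace Security
  • 9 papers: Medicine
    • 7 papers: Clinical Medicine
    • 2 papers: Basic Medicine (may confer medical...
    • 1 paper: Special Medicine
  • 7 papers: Science
    • 2 papers: Mathematics
    • 2 papers: Physics
    • 2 papers: Chemistry
    • 2 papers: Biology
    • 1 paper: Geophysics
  • 1 paper: Law
    • 1 paper: Sociology
  • 1 paper: Education
    • 1 paper: Education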

Topics

  • 40 papers: vision language ...
  • 6 papers: large language m...
  • 3 papers: deep learning
  • 3 papers: pedestrian attri...
  • 3 papers: computer vision
  • 2 papers: clip
  • 2 papers: semantic segment...
  • 2 papers: visual question ...
  • 2 papers: mixed / augmente...
  • 2 papers: visual prompt
  • 2 papers: prompt learning
  • 2 papers: generative ai
  • 2 papers: augmented realit...
  • 2 papers: robotics
  • 1 paper: surveys
  • 1 paper: prompting roboti...
  • 1 paper: prompt engineeri...
  • 1 paper: multimodal learn...
  • 1 paper: graphs
  • 1 paper: segmentation

Institutions

  • 3 papers: univ amsterdam
  • 1 paper: aws ai labs seat...
  • 1 paper: natl chengchi un...
  • 1 paper: tud dresden univ...
  • 1 paper: duke univ durham...
  • 1 paper: acad sinica inst...
  • 1 paper: universidad de l...
  • 1 paper: univ hosp heidel...
  • 1 paper: sun yat sen univ...
  • 1 paper: chongqing univ c...
  • 1 paper: univ salerno i-8...
  • 1 paper: prince sultan un...
  • 1 paper: icar cent inst a...
  • 1 paper: technion haifa
  • 1 paper: univ sydney sch ...
  • 1 paper: mansoura univ fa...
  • 1 paper: purdue univ weld...
  • 1 paper: duke univ dept e...
  • 1 paper: infyz solut hyde...
  • 1 paper: indian council a...

Authors

  • 3 papers: rudinac stevan
  • 2 papers: santana oliverio...
  • 2 papers: scargill tim
  • 2 papers: zhu hongyi
  • 2 papers: shi kunyu
  • 2 papers: tu zhuowen
  • 2 papers: gorlatova maria
  • 2 papers: xiu yanming
  • 2 papers: soatto stefano
  • 2 papers: lorenzo-navarro ...
  • 1 paper: belwafi kais
  • 1 paper: saggese alessia
  • 1 paper: van minh-hao
  • 1 paper: banerjee biplab
  • 1 paper: elbeltagi ahmed
  • 1 paper: shen wei
  • 1 paper: freire-obregón d...
  • 1 paper: salem ali
  • 1 paper: ghazal mohammed
  • 1 paper: mirmehdi majid

Language

  • 40 papers: English
  • 1 paper: Chinese

Search criteria: "Subject term = Vision Language Models"
40 records; showing 1-10

An Experimental Evaluation of Smart Sensors for Pedestrian Attribute Recognition Using Multi-Task Learning and Vision Language Models
SENSORS, 2025, vol. 25, no. 6
Authors: Greco, Antonio; Saggese, Alessia; Sansone, Carlo; Vento, Bruno
Affiliations: Univ Salerno, I-84084 Fisciano, Italy; Univ Naples Federico II, I-80125 Naples, Italy
This paper presents the experimental evaluation and analyzes the results of the first edition of the pedestrian attribute recognition (PAR) contest, the international competition which focused on smart visual sensors ...

Mutual Prompt Leaning for Vision Language Models
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, vol. 133, no. 3, pp. 1258-1276
Authors: Long, Sifan; Zhao, Zhen; Yuan, Junkun; Tan, Zichang; Liu, Jiangjiang; Feng, Jingyuan; Wang, Shengsheng; Wang, Jingdong
Affiliations: Jilin Univ, Coll Comp Sci & Technol, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China; Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China; Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China; Univ Sydney, Sch Elect & Informat Engn, Sydney, Australia; Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
Large pre-trained vision language models (VLMs) have demonstrated impressive representation learning capabilities, but their transferability across various downstream tasks heavily relies on prompt learning. Since VLM...

Assessing the spatial accuracy of geocoding flood-related imagery using vision language models
SPATIAL INFORMATION RESEARCH, 2025, vol. 33, no. 2
Authors: Schmidt, Sebastian; Fragachan, Eleonor Diaz; Arifi, Dorian; Hanny, David; Resch, Bernd
Affiliations: Univ Salzburg, Dept Geoinformat Z GIS, Schillerstr 30, A-5020 Salzburg, Austria; Res & Innovat Eviden, C Albarracin25, Madrid 28037, Spain; IT U Interdisciplinary Transformat Univ Austria, Geosocial Artificial Intelligence, Altenberger Str 66c, A-4040 Linz, Austria; Harvard Univ, Ctr Geog Anal, 1737 Cambridge St, Cambridge, MA 02138, USA
While the capabilities of large language models and visual language models for various classification tasks have advanced significantly, their potential for location inference remains largely underexplored. Therefore,...

Perceptual visual security index: Analyzing image content leakage for vision language models
JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2025, vol. 89
Authors: Hu, Lishuang; Xiang, Tao; Guo, Shangwei; Li, Xiaoguo; Yang, Ying
Affiliations: Chongqing Univ, Coll Comp Sci, Chongqing 401331, Peoples R China; Agcy Sci Technol & Res, Singapore 138632, Singapore
During the training phase of vision language models (VLMs), the privacy storage and sharing of images are of paramount importance. While the Visual Security Index (VSI) is commonly used for content leakage analysis, i...

Learning with Enriched Inductive Biases for Vision-Language Models
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, vol. 133, no. 6, pp. 3746-3761
Authors: Yang, Lingxiao; Zhang, Ru-Yuan; Chen, Qi; Xie, Xiaohua
Affiliations: Sun Yat sen Univ, Sch Syst Sci & Engn, Guangzhou, Peoples R China; Shanghai Jiao Tong Univ, Brain Hlth Inst, Natl Ctr Mental Disorders, Shanghai Mental Hlth Ctr, Sch Med, Shanghai, Peoples R China; Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China; Guangdong Prov Key Lab Informat Secur Technol, Guangzhou, Peoples R China; Pazhou Lab Huangpu, Guangzhou, Peoples R China
Vision-language models, pre-trained on large-scale image-text pairs, serve as strong foundation models for transfer learning across a variety of downstream tasks. For few-shot generalization tasks, i.e., when the mode...

Probing Fundamental Visual Comprehend Capabilities on Vision Language Models via Visual Phrases from Structural Data
COGNITIVE COMPUTATION, 2024, vol. 16, no. 6, pp. 3484-3504
Authors: Xie, Peijin; Liu, Bingquan
Affiliations: Harbin Inst Technol, Fac Comp, Harbin, Peoples R China
Does the model demonstrate exceptional proficiency in "item counting," "color recognition," or other Fundamental Visual Comprehension Capability (FVCC)? There have been remarkable advancements in th...

Integrating Text-to-Image and Vision Language Models for Synergistic Dataset Generation: The Creation of Synergy-General-Multimodal Pairs
2nd International Workshop on Generalizing from Limited Resources in the Open World (GLOW)
Authors: Huang, Mao Xun; Huang, Hen-Hsen
Affiliations: Natl Chengchi Univ, Dept Management Informat Syst, Taipei, Taiwan; Acad Sinica, Inst Informat Sci, Taipei, Taiwan
This study presents the creation of the Synergy-General-Multimodal Pairs dataset through an innovative integration of vision language models (VLMs) and text-to-image (T2I) technologies. The code and dataset used in th...

Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models
4th Annual International Conference on Multimedia Retrieval (ICMR)
Authors: Zhu, Hongyi; Huang, Jia-Hong; Rudinac, Stevan; Kanoulas, Evangelos
Affiliations: Univ Amsterdam, Amsterdam, Netherlands
Image search stands as a pivotal task in multimedia and computer vision, finding applications across diverse domains, ranging from internet search to medical diagnostics. Conventional image search systems operate by a...

Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis
2024 IEEE International Conference on Big Data, BigData 2024
Authors: Verma, Prateek; Van, Minh-Hao; Wu, Xintao
Affiliations: University of Arkansas, Department of Electrical Engineering and Computer Science, Fayetteville, United States
Vision language models (VLMs) such as LLaVA, ChatGPT-4, and Gemini have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data showing impressive performa...

Non-autoregressive Sequence-to-Sequence Vision-Language Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Shi, Kunyu; Dong, Qi; Goncalves, Luis; Tu, Zhuowen; Soatto, Stefano
Affiliations: AWS AI Labs, Seattle, WA 98101, USA
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions. We propose a parallel decoding ...