
Refine Results

Document Type

  • 95 conference papers
  • 74 journal articles
  • 1 dissertation

Collection Scope

  • 170 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 162 Engineering
    • 130 Computer Science and Technology...
    • 37 Electrical Engineering
    • 17 Software Engineering
    • 13 Information and Communication Engineering
    • 13 Control Science and Engineering
    • 11 Biomedical Engineering (...
    • 9 Electronic Science and Technology (...
    • 9 Surveying and Mapping Science and Technology
    • 4 Mechanical Engineering
    • 4 Instrument Science and Technology
    • 4 Materials Science and Engineering (...
    • 4 Bioengineering
    • 2 Transportation Engineering
    • 1 Aeronautical and Astronautical Science and Tech...
    • 1 Environmental Science and Engineering (...
  • 35 Medicine
    • 25 Clinical Medicine
    • 12 Special Medicine
    • 4 Basic Medicine (...
    • 4 Medical Technology (...
    • 1 Integrated Traditional Chinese and Western Medicine
  • 26 Science
    • 11 Physics
    • 10 Geophysics
    • 9 Chemistry
    • 5 Biology
    • 3 Geography
    • 1 Astronomy
    • 1 Geology
  • 7 Management
    • 6 Management Science and Engineering (...
    • 1 Library, Information and Archives Man...
  • 1 Philosophy
    • 1 Philosophy
  • 1 Agriculture

Topics

  • 170 vision-language ...
  • 17 large language m...
  • 15 prompt learning
  • 12 few-shot learnin...
  • 11 clip
  • 10 visualization
  • 7 contrastive lear...
  • 6 foundation model...
  • 6 remote sensing
  • 6 training
  • 6 adaptation model...
  • 5 object detection
  • 5 deep learning
  • 5 feature extracti...
  • 5 image classifica...
  • 4 long-tailed reco...
  • 4 computational mo...
  • 4 artificial intel...
  • 4 computer vision
  • 4 domain generaliz...

Institutions

  • 4 chinese acad sci...
  • 4 carnegie mellon ...
  • 4 univ chinese aca...
  • 3 inesc tec porto
  • 3 sichuan univ col...
  • 3 univ chinese aca...
  • 3 zhejiang univ pe...
  • 3 chinese univ hon...
  • 2 shanghai ai lab ...
  • 2 ecole polytech f...
  • 2 tsinghua univ de...
  • 2 harbin inst tech...
  • 2 univ porto fac e...
  • 2 cent south univ ...
  • 2 beijing univ pos...
  • 2 city univ hong k...
  • 2 china univ geosc...
  • 2 sichuan univ col...
  • 2 tech univ munich...
  • 2 westlake univ sc...

Authors

  • 4 banerjee biplab
  • 4 zhang yi
  • 4 jha ankit
  • 3 wang donglin
  • 3 singha mainak
  • 3 ding kun
  • 3 zhang ce
  • 3 tuia devis
  • 2 men aidong
  • 2 li haifeng
  • 2 mahapatra dwarik...
  • 2 zhang min
  • 2 liu xuyang
  • 2 chen honggang
  • 2 ma chao
  • 2 guo miaotian
  • 2 yang yang
  • 2 ricci elisa
  • 2 ye mao
  • 2 tian liang

Language

  • 164 English
  • 5 Other
Search query: Subject = "Vision-language Models"
170 records; showing 111-120
VTR: Bidirectional Video-Textual Transmission Rail for CLIP-based Video Recognition
IEEE International Conference on Multimedia and Expo (ICME)
Authors: Yu, Shaoqi; Chen, Lili; Zhang, Xiaolin; Li, Jiamao (Chinese Acad Sci Shanghai Inst Microsyst & Informat Technol Shanghai Peoples R China; Univ Chinese Acad Sci Beijing Peoples R China; ShanghaiTech Univ Shanghai Peoples R China)
There are two key issues when transferring vision-language models like CLIP to video recognition: bidirectional video-textual transmission and temporal modeling. To address these issues, we propose a novel framework name...
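The record above concerns transferring CLIP to video recognition. For orientation only, the sketch below shows the common zero-shot baseline such work builds on, not the VTR framework from the paper: sampled frames are encoded with the CLIP image encoder, mean-pooled over time as a naive form of temporal modeling, and matched against text embeddings of class prompts. The checkpoint name, prompt template, and class list are illustrative assumptions.

```python
# Minimal sketch, assuming a Hugging Face CLIP checkpoint; not the paper's VTR method.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def classify_video(frames, class_names):
    """frames: list of PIL images sampled from the video; class_names: list of strings."""
    prompts = [f"a video of {c}" for c in class_names]   # assumed prompt template
    inputs = processor(text=prompts, images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        frame_feats = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_feats = model.get_text_features(input_ids=inputs["input_ids"],
                                             attention_mask=inputs["attention_mask"])
    # Normalize per-frame features, then mean-pool over time into one video embedding.
    frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
    video_feat = frame_feats.mean(dim=0, keepdim=True)
    video_feat = video_feat / video_feat.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    scores = video_feat @ text_feats.T                    # cosine similarity per class
    return class_names[scores.argmax().item()]
```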
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Cai, Mu; Liu, Haotian; Mustikovela, Siva Karthik; Meyer, Gregory P.; Chai, Yuning; Park, Dennis; Lee, Yong Jae (Univ Wisconsin Madison WI 53706 USA; Cruise LLC San Francisco CA USA)
While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatia...
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Dorkenwald, Michael; Barazani, Nimrod; Snoek, Cees G. M.; Asano, Yuki M. (Univ Amsterdam Amsterdam Netherlands)
Vision-language models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer...
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Sun, Shuyang; Li, Runjia; Torr, Philip; Gu, Xiuye; Li, Siyang (Univ Oxford Oxford England; Google Res Mountain View CA 94043 USA)
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets...
TRAINING VISUAL LANGUAGE MODELS WITH OBJECT DETECTION: GROUNDED CHANGE DESCRIPTIONS IN SATELLITE IMAGES
IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
Authors: Prado, Joao Luis; Montariol, Syrielle; Castillo-Navarro, Javiera; Tuia, Devis; Bosselut, Antoine (Ecole Polytech Fed Lausanne EPFL Lausanne Switzerland)
Recently, generalist vision-language models (VLMs) have shown exceptional progress in tasks previously dominated by specialized computer vision models. This becomes more prevalent when visual grounding capabilities, s...
FASN: Feature Aggregate Side-Network for Open-Vocabulary Semantic Segmentation
International Joint Conference on Neural Networks (IJCNN)
Authors: Jia, Daixi; Chen, Lipeng; Su, Xingzhe; Wu, Fengge; Zhao, Junsuo (Univ Chinese Acad Sci Chinese Acad Sci Inst Software Beijing 100190 Peoples R China)
In this paper, we introduce a Feature Aggregate Side Network (FASN), a simple, efficient, and easy-to-train method for open-vocabulary semantic segmentation. Building upon existing models based on the CLIP-Side Netwo...
DARA: DOMAIN- AND RELATION-AWARE ADAPTERS MAKE PARAMETER-EFFICIENT TUNING FOR VISUAL GROUNDING
IEEE International Conference on Multimedia and Expo (ICME)
Authors: Liu, Ting; Liu, Xuyang; Huang, Siteng; Chen, Honggang; Yin, Quanjun; Qin, Long; Wang, Donglin; Hu, Yue (Natl Univ Def Technol Coll Syst Engn Changsha Peoples R China; Sichuan Univ Coll Elect & Informat Engn Chengdu Peoples R China; Westlake Univ Sch Engn Hangzhou Peoples R China)
Visual grounding (VG) is the challenging task of localizing an object in an image based on a textual description. The recent surge in the scale of VG models has substantially improved performance, but has also introduced a signif...
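The DARA record above concerns adapter-based parameter-efficient tuning. The following is a minimal, generic sketch of the bottleneck-adapter idea such methods build on, not the paper's domain- and relation-aware adapters: a small down-project/up-project block with a residual connection is attached to a frozen backbone module, and only the adapter weights are trained. Module names and dimensions are illustrative assumptions.

```python
# Minimal sketch of a generic bottleneck adapter on a frozen block; not the DARA design.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)              # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps the frozen path intact

class AdaptedBlock(nn.Module):
    """Wrap a frozen backbone block so that only the adapter is fine-tuned."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False                 # backbone stays frozen
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

# Example: adapt a toy frozen layer; only the small adapter weights are trainable.
layer = AdaptedBlock(nn.Linear(512, 512), dim=512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```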
Spuriousness-Aware Meta-Learning for Learning Robust Classifiers  24
30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Authors: Zheng, Guangtao; Ye, Wenqian; Zhang, Aidong (Univ Virginia Charlottesville VA 22903 USA)
Spurious correlations are brittle associations between certain attributes of inputs and target variables, such as the correlation between an image background and an object class. Deep image classifiers often leverage...
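The record above defines spurious correlations via the background/object example. As background only, and not the paper's spuriousness-aware meta-learning, the sketch below shows how such reliance is commonly exposed: group test examples by (class label, spurious attribute) and report per-group and worst-group accuracy, which collapses when a classifier keys on the attribute rather than the label. All data in the example are synthetic placeholders.

```python
# Minimal sketch of per-group / worst-group accuracy on synthetic data.
import torch

def group_accuracies(preds, labels, spurious_attr):
    """Return accuracy per (label, attribute) group and the worst-group accuracy."""
    accs = {}
    for y in labels.unique():
        for a in spurious_attr.unique():
            mask = (labels == y) & (spurious_attr == a)
            if mask.any():
                accs[(int(y), int(a))] = (preds[mask] == labels[mask]).float().mean().item()
    return accs, min(accs.values())

# Toy data: predictions track the background attribute, not the class label, so the
# average accuracy looks acceptable while the rare groups fail completely.
preds   = torch.tensor([1, 1, 1, 0, 0, 1, 0, 0])
labels  = torch.tensor([1, 1, 0, 0, 1, 1, 0, 0])
bg_attr = torch.tensor([1, 1, 1, 0, 0, 1, 0, 0])
per_group, worst = group_accuracies(preds, labels, bg_attr)
print(per_group, "worst-group accuracy:", worst)
```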
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Cui, Jiequan; Zhu, Beier; Wen, Xin; Qi, Xiaojuan; Yu, Bei; Zhang, Hanwang (Nanyang Technol Univ Singapore Singapore; Univ Hong Kong Hong Kong Peoples R China; Chinese Univ Hong Kong Hong Kong Peoples R China)
In this paper, we present an empirical study on image recognition unfairness, i.e., extreme class accuracy disparity on balanced data like ImageNet. We demonstrate that classes are not equal and that unfairness is prevalent for image classifi...
VGDIFFZERO: TEXT-TO-IMAGE DIFFUSION MODELS CAN BE ZERO-SHOT VISUAL GROUNDERS  49
49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Authors: Liu, Xuyang; Huang, Siteng; Kang, Yachen; Chen, Honggang; Wang, Donglin (Sichuan Univ Coll Elect & Informat Engn Chengdu Peoples R China; Westlake Univ Sch Engn Hangzhou Peoples R China)
Large-scale text-to-image diffusion models have shown impressive capabilities for generative tasks by leveraging strong vision-language alignment from pre-training. However, most vision-language discriminative tasks r...