Search criteria: "Subject term = Visual Language Models"
15 records; showing 1-10

MDAPT: Multi-Modal Depth Adversarial Prompt Tuning to Enhance the Adversarial Robustness of Visual Language Models
SENSORS, 2025, vol. 25, no. 1, p. 258
Authors: Li, Chao; Liao, Yonghao; Ding, Caichang; Ye, Zhiwei
Affiliations: Hubei Univ Technol Sch Comp Sci Wuhan 430068 Peoples R China; Hubei Engn Univ Sch Comp & Informat Sci Xiaogan 432000 Peoples R China
Large visual language models like Contrastive Language-Image Pre-training (CLIP), despite their excellent performance, are highly vulnerable to the influence of adversarial examples. This work investigates the accurac...

The future of action recognition: are multi-modal visual language models the key?
SIGNAL IMAGE AND VIDEO PROCESSING, 2025, vol. 19, no. 4, pp. 1-12
Authors: Gumuskaynak, Enes; Eken, Suleyman
Affiliations: Hisar Hlth Res Ctr Med Biochem Istanbul Turkiye; Kocaeli Univ Informat Syst Engn TR-41001 Izmit Kocaeli Turkiye
This study investigates the potential of visual language models for action recognition, a critical task in video analysis. Traditional action recognition methods predominantly rely on visual features, often struggling...

ReplanVLM: Replanning Robotic Tasks With Visual Language Models
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, vol. 9, no. 11, pp. 10201-10208
Authors: Mei, Aoran; Zhu, Guo-Niu; Zhang, Huaxiang; Gan, Zhongxue
Affiliation: Fudan Univ Acad Engn & Technol Shanghai 200433 Peoples R China
Large language models (LLMs) have gained increasing popularity in robotic task planning due to their exceptional abilities in text analytics and generation, as well as their broad knowledge of the world. However, they...

GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games
21st IEEE International Conference on Mechatronics and Automation (IEEE ICMA)
Authors: Mei, Aoran; Wang, Jianhua; Zhu, Guo-Niu; Gan, Zhongxue
Affiliation: Fudan Univ Acad Engn & Technol Shanghai 200433 Peoples R China
With their prominent scene understanding and reasoning capabilities, pre-trained visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task planning. Compared with traditional tas...

A Design of Interface for Visual-Impaired People to Access Visual Information from Images Featuring Large Language Models and Visual Language Models
CHI Conference on Human Factors in Computing Systems (CHI)
Author: Zhang, Zhe-Xin
Affiliation: Univ Tsukuba Digital Nat Grp Tsukuba Ibaraki Japan
We propose a design of interface for visual-impaired people to access visual information from images utilizing Large Language Models (LLMs), Visual Language Models (VLMs), and Segment-Anything. We use Semantic-Segment-...

Unbiased Scene Graph Generation via Visual Language Models
24th International Conference on Control, Automation and Systems
Authors: Kim, Eunseo; Park, Han-Mu
Affiliation: Korea Elect Technol Inst Artificial Intelligence Res Ctr Seoul 13488 South Korea
Scene graph generation becomes significantly important as it bridges the gap between linguistic and visual information of scenes, facilitating a high-dimensional understanding of scenes. In this paper, we analyze the ...

Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models via Diffusion Models
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, vol. 20, pp. 1333-1348
Authors: Guo, Qi; Pang, Shanmin; Jia, Xiaojun; Liu, Yang; Guo, Qing
Affiliations: Xi An Jiao Tong Univ Sch Software Engn Xian 710049 Peoples R China; Agcy Sci Technol & Res Ctr Frontier AI Res Singapore 138632 Singapore; Nanyang Technol Univ Coll Comp & Data Sci Singapore 639798 Singapore; Agcy Sci Technol & Res Inst High Performance Comp Singapore 138632 Singapore
Adversarial attacks, particularly targeted transfer-based attacks, can be used to assess the adversarial robustness of large visual-language models (VLMs), allowing for a more thorough examination of potential securit...

ReTrackVLM: Transformer-Enhanced Multi-Object Tracking with Cross-Modal Embeddings and Zero-Shot Re-Identification Integration
APPLIED SCIENCES-BASEL, 2025, vol. 15, no. 4, p. 1907
Author: Bayraktar, Ertugrul
Affiliation: Yildiz Tech Univ Dept Mechatron Engn TR-34349 Istanbul Turkiye
Multi-object tracking (MOT) is an important task in computer vision, particularly in complex, dynamic environments with crowded scenes and frequent occlusions. Traditional tracking methods often suffer from identity s...

Visualising the language practices of lower secondary students: outlines for practice-based models of multilingualism
APPLIED LINGUISTICS REVIEW, 2024, vol. 15, no. 5, pp. 2035-2059
Authors: Storto, Andre; Haukas, Asta; Tiurikova, Irina
Affiliations: Univ Bergen Bergen Norway; Univ Bergen Dept Foreign Languages Bergen Norway
The multilingual turn in applied linguistics has produced a number of models that approach multilingualism from a variety of disciplinary and theoretical perspectives. However, fully developed models of multilingualis...

Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles
DRONES, 2023, vol. 7, no. 2, p. 114
Authors: de Curto, J.; de Zarza, I.; Calafate, Carlos T.
Affiliations: Ctr Intelligent Multidimens Data Anal HK Sci Pk Shatin Hong Kong Peoples R China; Univ Politecn Valencia Dept Informat Sistemas & Comp Valencia 46022 Spain; GOETHE Univ Frankfurt Main Informat & Math D-60323 Frankfurt Germany; Univ Oberta Catalunya Estudis Informat Multimedia & Telecomun Barcelona 08018 Spain
Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. I...