Search query: "Subject term = Large Vision-language Model"
13 records; showing 1-10
B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, vol. 20, pp. 1434-1446
Authors: Zhang, Hao; Shao, Wenqi; Liu, Hong; Ma, Yongqiang; Luo, Ping; Qiao, Yu; Zheng, Nanning; Zhang, Kaipeng
Affiliations: Xi An Jiao Tong Univ Natl Engn Res Ctr Visual Informat & Applicat Natl Key Lab Human Machine Hybrid Augmented Intell Xian 710049 Shaanxi Peoples R China; Xi An Jiao Tong Univ Inst Artificial Intelligence & Robot Xian 710049 Shaanxi Peoples R China; Shanghai Artificial Intelligence Lab Shanghai 200000 Peoples R China; Osaka Univ Inst Databil Sci Suita Osaka 5650871 Japan
Large vision-language models (LVLMs) have shown significant progress in responding well to visual-instructions from users. However, these instructions, encompassing images and text, are susceptible to both intentional...

MiniMedGPT: Efficient Large Vision-Language Model for Medical Visual Question Answering
PATTERN RECOGNITION LETTERS, 2025, vol. 189, pp. 8-16
Authors: Alsabbagh, Abdel Rahman; Mansour, Tariq; Al-Kharabsheh, Mohammad; Ebdah, Abdel Salam; Al-Emaryeen, Roa'a; Al-Nahhas, Sara; Mahafza, Waleed; Al-Kadi, Omar
Affiliations: Univ Jordan King Abdullah Sch Informat Technol 2 Amman 11942 Jordan; Jordan Univ Hosp Diagnost Radiol Dept Amman 11942 Jordan
While large vision-language models (LVLMs) like GPT-4 and Gemini demonstrate significant potential, utilization in the medical domain remains largely unexplored. This is due to challenges attributed to prolonged train...

Cross-scene Visual Context Parsing with Large Vision-Language Model
PATTERN RECOGNITION, 2025, vol. 166
Authors: Zhang, Guoqing; Kan, Shichao; Shi, Lu; Xu, Wanru; An, Gaoyun; Cen, Yigang
Affiliations: Beijing Jiaotong Univ State Key Lab Adv Rail Autonomous Operat Beijing 100044 Peoples R China; Beijing Jiaotong Univ Sch Comp Sci & Technol Beijing 100044 Peoples R China; Beijing Jiaotong Univ Visual Intelligence X Int Cooperat Joint Lab MOE Beijing 100044 Peoples R China; Cent South Univ Sch Comp Sci & Engn Changsha 410083 Hunan Peoples R China
Relation analysis is crucial for image-based applications such as visual reasoning and visual question answering. Current relation analysis such as scene graph generation (SGG) only focuses on building relationships a...

Enhancing Multi-Label Deep Hashing for Image and Audio With Joint Internal Global Loss Constraints and Large Vision-Language Model
IEEE SIGNAL PROCESSING LETTERS, 2024, vol. 31, pp. 2550-2554
Authors: Liu, Ye; Pan, Yan; Yin, Jian
Affiliations: Sun Yat Sen Univ Sch Comp Sci & Engn Guangzhou 510006 Peoples R China; Lizhi Inc Artificial Intelligence & Big Data Dept Guangzhou 510630 Peoples R China; Guangdong Key Lab Big Data Anal & Proc Guangzhou 510006 Peoples R China; Sun Yat Sen Univ Sch Artificial Intelligence Zhuhai 519000 Peoples R China
Deep hashing algorithms can transform high-dimensional features into low-dimensional hash codes, which can reduce storage space and improve computational efficiency in traditional information retrieval (IR) and large ...

FashionGPT: A Large Vision-Language Model for Enhancing Fashion Understanding
33rd International Conference on Artificial Neural Networks and Machine Learning (ICANN)
Authors: Song, Duanxiao; Gao, Dehong; Liu, Gongshen; Li, Xiaoyong
Affiliations: Shanghai Jiao Tong Univ Shanghai Peoples R China; Northwestern Polytech Univ Xian Peoples R China
Fashion understanding is a challenging multi-modal task of interpreting multi aspects of fashion images. While traditional computer vision or multi-modal algorithms fall short in providing a comprehensive understandin...

Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Authors: Lee, Younghwan; Luu, Tung M.; Lee, Donghoon; Yoo, Chang D.
Affiliations: Electrical Engineering KAIST Daejeon Korea Republic of; Robotics Program KAIST Daejeon Korea Republic of
In offline reinforcement learning (RL), learning from fixed datasets presents a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward si...

DiViCo: Disentangled Visual Token Compression for Efficient Large Vision-Language Model
IEEE Transactions on Circuits and Systems for Video Technology, 2025
Authors: Wang, Xin; Pan, Zirui; Chen, Hong; Zhu, Wenwu
Affiliations: Tsinghua University Beijing National Research Center for Information Science and Technology Department of Computer Science and Technology Beijing China
Large vision-language models have drawn much attention and become increasingly applicable in complicated multimodal tasks such as visual question answering, video grounding, etc. However, it still suffers from ineffic...

Applications of Large Vision-Language Models in Visual Inspection
Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2025, vol. 91, no. 3, pp. 333-336
Authors: Kato, Kunihito; Ueno, Shiryu; Yoshida, Haruto

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Kaul, Prannay; Li, Zhizhong; Yang, Hao; Dukler, Yonatan; Swaminathan, Ashwin; Taylor, C. J.; Soatto, Stefano
Affiliations: Univ Oxford VGG Oxford England; AWS AI Labs Oxford England
Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations"...

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Hu, Qidong; Dong, Xiaoyi; Zhang, Pan; Wang, Bin; He, Conghui; Wang, Jiaqi; Lin, Dahua; Zhang, Weiming; Yu, Nenghai
Affiliations: Univ Sci & Technol China Anhui Prov Key Lab Digital Secur Hefei Peoples R China; Shanghai AI Lab Shanghai Peoples R China; Chinese Univ Hong Kong Hong Kong Peoples R China
Hallucination, posed as a pervasive challenge of multi-modal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue with eit...
