Search criteria: "Subject term = Multimodal Large Language Models"
52 records; showing records 31-40
Leveraging Generative Vision Models for Extracting Process Models from Documents
22nd International Conference on Business Process Management (BPM)
Authors: Voelter, Marvin; Hadian, Raheleh; Kampik, Timotheus; Breitmayer, Marius; Reichert, Manfred. Affiliations: SAP, Berlin, Germany; Ulm Univ, Ulm, Germany
This paper investigates the vision capabilities of multimodal Generative Pre-trained Transformers (GPTs) to auto-generate structured process models from diagram- and text-based documents. We introduce a dataset of 123...
PhysID: Physics-based Interactive Dynamics from a Single-view Image
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Authors: Gothe, Sourabh Vasant; Chattopadhyay, Ayon; Kiran, Gunturi Venkata Sai Phani; Pratik; Agarwal, Vibhav; Vachhani, Jayesh Rajkumar; Ghosh, Sourav; Parameswaranath, V.M.; Barath Raj, K.R. Affiliation: Samsung R&D Institute India - Bangalore, India
Transforming static images into interactive experiences remains a challenging task in computer vision. Tackling this challenge holds the potential to elevate mobile user experiences, notably through interactive and AR...
User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Authors: Verghese, Mrinal; Chen, Brian; Eghbalzadeh, Hamid; Nagarajan, Tushar; Desai, Ruta. Affiliations: Carnegie Mellon University, United States; Samsung Research America, United States; Meta Reality Labs Research, United States; Meta Fundamental AI Research, United States
Our research investigates the capability of modern multimodal reasoning models, powered by large language models (LLMs), to facilitate vision-powered assistants for multi-step daily activities. Such assistants must be...
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Authors: Amoroso, Roberto; Zhang, Gengyuan; Koner, Rajat; Baraldi, Lorenzo; Cucchiara, Rita; Tresp, Volker. Affiliations: LMU Munich, Germany; MCML, Germany; University of Modena and Reggio Emilia, Italy; IIT-CNR, Italy
Video Question Answering (Video QA) is a challenging video understanding task that requires models to comprehend entire videos, identify the most relevant information based on contextual cues from a given question, a...
Simignore: Exploring and enhancing multimodal large model complex reasoning via similarity computation
NEURAL NETWORKS, 2025, vol. 184, p. 107059
Authors: Zhang, Xiaofeng; Zeng, Fanshuo; Gu, Chaochen. Affiliations: Shanghai Jiao Tong Univ, 800 Dongchuan Rd, Shanghai 200240, Peoples R China; Cent South Univ, 932 South Lushan Rd, Changsha 410083, Hunan, Peoples R China
Recently, the field of multimodal large language models (MLLMs) has grown rapidly, with many large vision-language models (LVLMs) relying on sequential visual representations. In these models, images are broken down ...
MS-RRBR: A Multi-Model Synergetic Framework for Restricted and Repetitive Behavior Recognition in Children with Autism
APPLIED SCIENCES-BASEL, 2025, vol. 15, no. 3, p. 1577
Authors: Wang, Yonggu; Shao, Yifan; Yu, Zengyi; Wang, Zihan. Affiliation: Zhejiang Univ Technol, Coll Educ, Hangzhou 310023, Peoples R China
Restricted and Repetitive Behaviors (RRBs) are hallmark features of children with autism spectrum disorder (ASD) and are also one of the diagnostic criteria for the condition. Traditional methods of RRBs assessment th...
From Vision to Perception: Transforming Art Experience for the Blind with C-ArtQA
JOURNAL OF IMAGING SCIENCE AND TECHNOLOGY, 2025, vol. 69, no. 1, pp. 1-11
Authors: Guo, Jia; Hsieh, Yung-Cheng. Affiliation: Zhejiang Univ, Hangzhou 310058, Peoples R China
Blind and low vision (BLV) individuals face unique challenges due to a lack of objective explanations and shared artistic vocabulary. This study introduces Cultural ArtQA (C-ArtQA), a benchmark designed to assess whet...
Probing Fundamental Visual Comprehend Capabilities on Vision Language Models via Visual Phrases from Structural Data
COGNITIVE COMPUTATION, 2024, vol. 16, no. 6, pp. 3484-3504
Authors: Xie, Peijin; Liu, Bingquan. Affiliation: Harbin Inst Technol, Fac Comp, Harbin, Peoples R China
Does the model demonstrate exceptional proficiency in "item counting," "color recognition," or other Fundamental Visual Comprehension Capability (FVCC)? There have been remarkable advancements in th...
Human 0, MLLM 1: Unlocking New Layers of Automation in Language-Conditioned Robotics with Multimodal LLMs
21st International Conference on Mechatronics-Mechatronika
Authors: ElMallah, Raley; Zamani, Nima; Lee, Chi-Guhn. Affiliations: Univ Toronto, Mech & Ind Engn, Toronto, ON, Canada; Cobionix Corp, Kitchener, ON, Canada
Language-conditioned robotics has seen tremendous growth in frameworks that aim to improve the success rates of robots acting upon the environment according to free-form language instructions. However, most existing f...
Feeling Textiles through AI: An Exploration into Multimodal Language Models and Human Perception Alignment
Companion International Conference on Multimodal Interaction
Authors: Zhong, Shu; Gatti, Elia; Cho, Youngjun; Obrist, Marianna. Affiliation: UCL, Dept Comp Sci, London, England
Human-artificial intelligence (AI) alignment ensures that AI systems align with human goals and behaviors. This paper introduces perceptual alignment as a critical aspect of this alignment, focusing on the concurrence...