Refine search results

Document type

  • 7 journal articles
  • 6 conference papers

Collection scope

  • 13 electronic documents
  • 0 print holdings

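Subject classification

  • 11 records: Engineering
    • 8 records: Computer Science and Technology...
    • 5 records: Electrical Engineering
    • 1 record: Information and Communication Engineering
    • 1 record: Control Science and Engineering
    • 1 record: Civil Engineering
    • 1 record: Transportation Engineering
    • 1 record: Software Engineering
  • 2 records: Medicine
    • 1 record: Basic Medicine (degree in Medicine...
    • 1 record: Special Medicine
  • 1 record: Literature
    • 1 record: Foreign Languages and Literatures
  • 1 record: Science
    • 1 record: Physics
  • 1 record: Management
    • 1 record: Management Science and Engineering (may...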

Topics

  • 13 records: multi-modal larg...
  • 2 records: generative artif...
  • 1 record: behavior control...
  • 1 record: driver distracti...
  • 1 record: pino
  • 1 record: driver monitorin...
  • 1 record: parameter optimi...
  • 1 record: transformer
  • 1 record: openpino
  • 1 record: pseudo 3d percep...
  • 1 record: cognitive develo...
  • 1 record: deep learning
  • 1 record: digital-twin
  • 1 record: weakly supervise...
  • 1 record: driver state rec...
  • 1 record: series-elastic a...
  • 1 record: text-to-image re...
  • 1 record: large language m...
  • 1 record: stolen check
  • 1 record: force control

Institutions

  • 1 record: sony comp sci la...
  • 1 record: ixs co ltd 7-7 s...
  • 1 record: hong kong univ s...
  • 1 record: prince sattam bi...
  • 1 record: nyu ny usa
  • 1 record: 01.ai
  • 1 record: chinese acad sci...
  • 1 record: nanjing ctr appl...
  • 1 record: yunnan united vi...
  • 1 record: new york univ la...
  • 1 record: southeast univ s...
  • 1 record: new york univ sh...
  • 1 record: sony grp corp 1-...
  • 1 record: tongji univ coll...
  • 1 record: beijing jiaotong...
  • 1 record: cuhk mmlab peopl...
  • 1 record: sony grp corp te...
  • 1 record: guizhou univ col...
  • 1 record: flower robot inc...
  • 1 record: xian univ techno...

Authors

  • 1 record: sawai kunihito
  • 1 record: liu jiaming
  • 1 record: geng haoran
  • 1 record: ma jiajian
  • 1 record: zhu jian
  • 1 record: endo ken
  • 1 record: zuo zhiyuan
  • 1 record: ding changxing
  • 1 record: warner gary
  • 1 record: miyazawa kiyokaz...
  • 1 record: ma fuqi
  • 1 record: tian lu
  • 1 record: wang fengjuan
  • 1 record: deng jingyang
  • 1 record: li guozhang
  • 1 record: zhang chengcui
  • 1 record: zhan yibing
  • 1 record: zhao fei
  • 1 record: liu tingwen
  • 1 record: matsui tatsuya

Language

  • 13 records: English

Search criteria: subject term = "Multi-modal Large Language Model"
13 records; showing 1-10

Human-Centric Context and Self-Uncertainty-Driven Multi-Modal Large Language Model for Training-Free Vision-Based Driver State Recognition
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025
Authors: Hu, Chuanfei; Li, Xinde
Affiliations: Southeast Univ Sch Automat Key Lab Measurement & Control CSE Nanjing 210096 Peoples R China; Nanjing Ctr Appl Math Nanjing 211135 Peoples R China; Southeast Univ Shenzhen Res Inst Shenzhen 518063 Peoples R China
A vision-driven driver monitoring system plays a vital role to guarantee the driving safety. Recent advances focus on modeling a learning-based method to realize the driver monitoring system, benefiting from the power...

Multi-Modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, vol. 34, no. 11, pp. 11682-11694
Authors: Zhu, Jian; Wang, Hanli; Shi, Miaojing
Affiliations: Tongji Univ Dept Comp Sci & Technol Shanghai 200092 Peoples R China; Tongji Univ Key Lab Embedded Syst & Serv Comp Minist Educ Shanghai 200092 Peoples R China; Tongji Univ Coll Elect & Informat Engn Shanghai 201804 Peoples R China
The visual commonsense reasoning (VCR) task is to choose an answer and provide a justifying rationale based on the given image and textual question. Representative works first recognize objects in images and then ass...

Multi-modal large language models in radiology: principles, applications, and potential
ABDOMINAL RADIOLOGY, 2024, vol. 50, no. 6, pp. 2745-2757
Authors: Shen, Yiqiu; Xu, Yanqi; Ma, Jiajian; Rui, Wushuang; Zhao, Chen; Heacock, Laura; Huang, Chenchan
Affiliations: New York Univ Langone Med Ctr New York NY 10016 USA; NYU New York NY USA; New York Univ Shanghai Shanghai Peoples R China
Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting-edge in artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impac...

ETC: Temporal Boundary Expand Then Clarify for Weakly Supervised Video Grounding With Multimodal Large Language Model
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, vol. 27, pp. 1772-1782
Authors: Li, Guozhang; Ding, Xinpeng; Cheng, De; Li, Jie; Wang, Nannan; Gao, Xinbo
Affiliations: Xidian Univ Sch Elect Engn State Key Lab Integrated Serv Networks Xian 710071 Peoples R China; Hong Kong Univ Sci & Technol Sch Engn Hong Kong Peoples R China; Xidian Univ Sch Telecommun Engn State Key Lab Integrated Serv Networks Xian 710071 Peoples R China; Chongqing Univ Posts & Telecommun Chongqing Key Lab Image Cognit Chongqing 400065 Peoples R China; Xidian Univ Sch Elect Engn Xian 710071 Peoples R China
Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level an...

Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis
19th National Conference on Man-Machine Speech Communication
Authors: Wang, Yufei; Wu, Mengyue
Affiliations: Shanghai Jiao Tong Univ Shanghai 200000 Peoples R China
Emotion semantic inconsistency is a ubiquitous challenge in multi-modal sentiment analysis (MSA). MSA involves analyzing sentiment expressed across various modalities like text, audio, and videos. Each modality may co...

Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Authors: Zhang, Zeren; Cheng, Jo-Ku; Deng, Jingyang; Tian, Lu; Ma, Jinwen; Qin, Ziran; Zhang, Xiaokai; Zhu, Na; Leng, Tuo
Affiliations: School of Mathematical Sciences Peking University Beijing 100871 China; 01.AI Beijing China; School of Electronic Information and Electrical Engineering Shanghai Jiao Tong University Shanghai 200240 China; School of Computer Engineering and Science Shanghai University Shanghai 200444 China
Mathematical reasoning remains an ongoing challenge for AI models, especially for geometry problems, which require both linguistic and visual signals. As the vision encoders of most MLLMs are trained on natural scenes...

Improving Accuracy and Generalizability via Multi-Modal Large Language Models Collaboration
International Joint Conference on Neural Networks (IJCNN)
Authors: Zhang, Shuili; Mu, Hongzhang; Liu, Tingwen
Affiliations: Chinese Acad Sci Inst Informat Engn Beijing Peoples R China; Univ Chinese Acad Sci Sch Cyber Secur Beijing Peoples R China
With the growing interest in large language models (LLMs), integrating visual tasks has led to the development of multi-Layer language models (MLLMs). Despite their advancements, MLLMs face challenges in accuracy and ...

VCF: An effective Vision-Centric Framework for Visual Question Answering
NEUROCOMPUTING, 2025, vol. 625
Authors: Wang, Fengjuan; Peng, Longkun; Cao, Shan; Yang, Zhaoqilin; Zhang, Ruonan; An, Gaoyun
Affiliations: Beijing Jiaotong Univ Sch Comp Sci & Technol Beijing 100044 Peoples R China; Capital Normal Univ Coll Informat Engn Beijing 100048 Peoples R China; Guizhou Univ Coll Comp Sci & Technol Guiyang 550025 Peoples R China
Recently, the wide application of large language models in the field of Visual Question Answering (VQA) has significantly boosted the progress in this field. Despite achieved advancements, LLMs cannot fully perceive an...

SafetyGPT: An autonomous agent of electrical safety risks for monitoring workers' unsafe behaviors
INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2025, vol. 168
Authors: Li, Wei; Ma, Fuqi; Zuo, Zhiyuan; Jia, Rong; Wang, Bo; Alharbi, Abdullah M.
Affiliations: Xian Univ Technol Sch Elect Engn Xian 710048 Shaanxi Peoples R China; Univ Manchester Sch Elect & Elect Engn Manchester M13 9PL England; Wuhan Univ Sch Elect Engn & Automat Wuhan 430072 Hubei Peoples R China; Prince Sattam Bin Abdulaziz Univ Al Kharj 16278 Saudi Arabia
Workers' unsafe behavior is one of the major causes of accidents in electric power production. Intelligent monitoring of workers' unsafe behaviors can effectively prevent the expansion of safety risks, thereby...

CheckGuard: Advancing Stolen Check Detection with a Cross-Modal Image-Text Benchmark Dataset
33rd ACM International Conference on Information and Knowledge Management (CIKM)
Authors: Zhao, Fei; Chen, Jiawen; Huang, Bin; Zhang, Chengcui; Warner, Gary
Affiliations: Univ Alabama Birmingham Birmingham AL 35294 USA; Beijing Univ Technol Beijing Dublin Int Coll Beijing Peoples R China; Ji Zhi Xing Huo Technol Beijing Beijing Peoples R China
The prevalence of check fraud, particularly with stolen checks sold on platforms such as Telegram, creates significant challenges for both individuals and financial institutions. This underscores the urgent need for i...