Author affiliations: Guangdong, China; College of Computer Science & VCIP, Nankai University, Tianjin, China; School of Computing, Australian National University, Canberra, Australia; Graduate School of Science and Technology, Keio University, Yokohama, Japan; Department of Electronic Engineering, Tsinghua University, Beijing, China; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Publication: arXiv
Year/Volume/Issue: 2024
Core indexing:
Abstract: Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer. This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications. With this goal, we begin by assessing the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception, including classification, detection, segmentation, and vision-language understanding. Our assessment reveals domain-specific challenges and underscores the need for further multimodal research in colonoscopy. To address these gaps, we establish three foundational initiatives: a large-scale multimodal instruction tuning dataset ColonINST, a colonoscopy-designed multimodal language model ColonGPT, and a multimodal benchmark. To facilitate continuous advancements in this rapidly evolving field, we provide a public website for the latest updates: https://***/ai4colonoscopy/IntelliScope. © 2024, CC BY.