Author Affiliations: Center for Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai 200438, China; Department of Ultrasound, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, China; Department of Biomedical Engineering and the Department of Computer and Data Science, Case Western Reserve University, Cleveland, OH 44106, United States; Department of Ultrasound, Sun Yat-Sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou 510060, China; Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China
Publication: arXiv
Year/Volume/Issue: 2024
Core Indexing:
Subject: Diagnosis
Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge, such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but also boost the clinical confidence and reliability of the network. However, it is an intractable challenge to automatically focus on these person- and disease-specific features in videos and to enable networks to encode bimodal information comprehensively and efficiently. This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) to tackle this challenge and automatically embed three types of diagnostic attention of sonographers into a mutual transformer framework for intelligent diagnosis of bimodal ultrasound videos. Firstly, a time-intensity-curve-based video selector is designed to mimic the temporal attention of sonographers, removing a large amount of redundant information while improving the computational efficiency of TASL-Net. Then, to introduce the spatial attention of sonographers for contrast-enhanced video analysis, we propose an earliest-enhanced position detector based on structural similarity variation, which guides TASL-Net to focus on the differences in perfusion variation inside and outside the lesion. Finally, by proposing a mutual encoding strategy that combines convolution and transformer, TASL-Net possesses bimodal attention to structural features in gray-scale videos and to perfusion variations in contrast-enhanced videos. These modules work collaboratively and contribute to superior performance. We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver, with a total of 791 cases. A comprehensive ablation experiment and comparison with five state-of-the-
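To make the two pre-processing ideas described in the abstract more concrete, the following is a minimal sketch of (1) time-intensity-curve (TIC) based frame selection and (2) locating the earliest-enhanced region via structural-similarity change. It assumes the contrast-enhanced video is a grayscale NumPy array of shape (T, H, W); the function names, the selection rule (keeping frames with the largest TIC slope), and the patch-wise SSIM threshold are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only; parameters and rules are assumptions, not TASL-Net's exact method.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def time_intensity_curve(frames, roi=None):
    """Mean intensity per frame; `frames` is (T, H, W), `roi` an optional boolean mask."""
    if roi is None:
        return frames.reshape(len(frames), -1).mean(axis=1)
    return np.array([f[roi].mean() for f in frames])

def select_frames_by_tic(frames, n_keep=16, roi=None):
    """Keep the frames where the TIC changes most (wash-in / wash-out dynamics),
    discarding temporally redundant frames."""
    tic = time_intensity_curve(frames, roi)
    slope = np.abs(np.gradient(tic))            # perfusion change per frame
    idx = np.sort(np.argsort(slope)[-n_keep:])  # most dynamic frames, kept in temporal order
    return frames[idx], idx

def earliest_enhanced_map(frames, patch=16, drop=0.15):
    """For each patch, record the first frame whose SSIM against frame 0 falls
    below (1 - drop); an earlier drop indicates earlier contrast arrival."""
    T, H, W = frames.shape
    arrival = np.full((H // patch, W // patch), T, dtype=int)
    for i in range(H // patch):
        for j in range(W // patch):
            ref = frames[0, i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            for t in range(1, T):
                cur = frames[t, i*patch:(i+1)*patch, j*patch:(j+1)*patch]
                rng = float(cur.max() - cur.min()) + 1e-6
                if ssim(ref, cur, data_range=rng) < 1.0 - drop:
                    arrival[i, j] = t
                    break
    return arrival  # patch-wise map of earliest-enhancement time
```

In this sketch, the patch with the smallest arrival time would correspond to the earliest-enhanced position, which the abstract describes as guiding the network's spatial attention toward perfusion differences inside and outside the lesion.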