Authors:
Lu, Lin; Zou, Qingzhi
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Jinan, China
Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Jinan, China
Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
ISBN (digital): 9781665410205
ISBN (print): 9781665410212
Due to the exceptional performance of Transformers in 2D medical image segmentation, recent work has also introduced them into 3D medical segmentation tasks. For instance, Swin UNETR and other hierarchical Transformers have reintroduced prior knowledge from convolutional networks, further enhancing the volume segmentation capabilities of models. The efficacy of these hybrid methodologies is primarily attributed to their substantial parameter counts and to the non-local self-attention mechanism with its large receptive field. We argue that the large-receptive-field behavior of these methods can be simulated with far fewer parameters by using depth-wise convolutions with large kernels. In this manuscript, we introduce a lightweight volume segmentation model called MedX-Net, which uses convolutional network modules to simulate hierarchical Transformers for robust volume segmentation. Firstly, inspired by the hierarchical Transformer module of Swin UNETR, we investigate large-kernel depth-wise convolutions of different sizes to reduce the model's parameter count while maintaining a large global receptive field. Secondly, we replace the multilayer perceptron (MLP) in the hierarchical Transformer module with an Inverted Bottleneck with Depthwise Convolution Enhancement (DWCE) to improve model performance with fewer activation and normalization layers, further reducing the parameter count. We validate the effectiveness and efficiency of our model for volume segmentation on three public datasets: Synapse, BTCV, and ACDC. On the Synapse dataset, our model improves the Dice score from 83.48% (Swin UNETR) to 87.21%. Compared to the 86.57% achieved by nnFormer, our model delivers superior performance while reducing the parameter count by 64%.
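The abstract's core efficiency claim — that large-kernel depth-wise convolutions deliver a large receptive field at a fraction of the parameter cost — can be illustrated with a simple parameter-count comparison. The sketch below is not from the paper; the channel width (96) and kernel size (7) are hypothetical values chosen only to make the arithmetic concrete.

```python
def conv3d_params(c_in: int, c_out: int, k: int, depthwise: bool = False) -> int:
    """Parameter count of a 3D convolution layer (bias omitted).

    A standard conv has one k*k*k filter per (input, output) channel pair;
    a depth-wise conv (groups == channels) has one k*k*k filter per channel.
    """
    if depthwise:
        assert c_in == c_out, "depth-wise conv keeps the channel count fixed"
        return c_in * k ** 3
    return c_in * c_out * k ** 3


# Hypothetical layer: 96 channels, 7x7x7 kernel (not values from the paper).
standard = conv3d_params(96, 96, 7)                 # 96 * 96 * 343 = 3,161,088
depthwise = conv3d_params(96, 96, 7, depthwise=True)  # 96 * 343 = 32,928

# A depth-wise layer needs c_out times fewer parameters than a standard one,
# while each output voxel still sees the same 7x7x7 neighborhood.
print(standard // depthwise)  # → 96
```

The ratio equals the channel count, which is why swapping dense large-kernel (or attention-like non-local) mixing for depth-wise spatial mixing plus cheap pointwise channel mixing is a common route to lightweight volumetric models.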