TY - JOUR
T1 - GLIMS: Attention-guided lightweight multi-scale hybrid network for volumetric semantic segmentation
T2 - Image and Vision Computing
AU - Yazıcı, Ziya Ata
AU - Öksüz, İlkay
AU - Ekenel, Hazım Kemal
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/6
Y1 - 2024/6
AB - Convolutional Neural Networks (CNNs) have become widely adopted for medical image segmentation tasks, demonstrating promising performance. However, the inherent inductive biases in convolutional architectures limit their ability to model long-range dependencies and spatial correlations. While recent transformer-based architectures address these limitations by leveraging self-attention mechanisms to encode long-range dependencies and learn expressive representations, they often struggle to extract low-level features and are highly dependent on data availability. This motivated us to develop GLIMS, a data-efficient, attention-guided hybrid volumetric segmentation network. GLIMS utilizes Dilated Feature Aggregator Convolutional Blocks (DACB) to capture local–global feature correlations efficiently. Furthermore, the incorporated Swin Transformer-based bottleneck bridges the local and global features to improve the robustness of the model. Additionally, GLIMS employs an attention-guided segmentation approach through Channel and Spatial-Wise Attention Blocks (CSAB) to localize expressive features for fine-grained border segmentation. Quantitative and qualitative results on glioblastoma and multi-organ CT segmentation tasks show GLIMS' effectiveness in terms of both complexity and accuracy. GLIMS achieved outstanding performance on the BraTS2021 and BTCV datasets, surpassing Swin UNETR with a significantly reduced number of trainable parameters. Specifically, GLIMS has 47.16M trainable parameters and 72.30G FLOPs, while Swin UNETR has 61.98M trainable parameters and 394.84G FLOPs. The code is publicly available at https://github.com/yaziciz/GLIMS.
KW - Attention-guidance
KW - Convolutional neural network
KW - Medical image segmentation
KW - Multi-scale features
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85192810099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192810099&partnerID=8YFLogxK
DO - 10.1016/j.imavis.2024.105055
M3 - Article
AN - SCOPUS:85192810099
SN - 0262-8856
VL - 146
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 105055
ER -