Research Article

Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach

Year 2026, Volume: 10 Issue: 2 , 378 - 395 , 01.05.2026
https://doi.org/10.31127/tuje.1822987
https://izlik.org/JA79BM87CL

Abstract

Medical image segmentation remains a critical challenge in computer-aided diagnosis systems, particularly for early disease detection, where precise boundary delineation can significantly impact patient outcomes. This study presents a comprehensive comparative analysis of Vision Transformer (ViT) based architectures and the conventional U-Net model for multi-organ segmentation tasks using chest CT scans and retinal fundus images. We evaluated both architectures on three distinct datasets comprising 15,420 annotated medical images, focusing on lung nodule detection, liver lesion segmentation, and retinal vessel segmentation for diabetic retinopathy screening. Our experimental results demonstrate that while U-Net achieves superior performance on smaller datasets (Dice coefficient: 0.89 ± 0.03), Vision Transformers excel with larger training sets (Dice coefficient: 0.93 ± 0.02), a 4.5% improvement in segmentation accuracy. The ViT-based approach demonstrated enhanced generalization across diverse imaging modalities, reducing false positive rates by 31% compared to U-Net in cross-dataset validation. Furthermore, computational efficiency analysis revealed that despite requiring 2.3× more training time, ViT models reduced inference time by 18% in clinical deployment scenarios. Evaluation across image quality levels showed that ViT degraded less as signal-to-noise ratio fell (Dice drop: 4.2% from high to low SNR) than U-Net (8.7% drop), demonstrating transformers' robustness to image degradation in clinical settings where scan quality varies. These findings suggest that the choice between architectures should be guided by dataset size, computational resources, and specific clinical requirements, with hybrid approaches showing promising potential for future development.
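The Dice coefficient reported throughout the abstract measures overlap between a predicted segmentation mask and the ground-truth annotation. As a minimal sketch (not code from the paper, which is not reproduced on this page), it can be computed for binary masks as follows; the `eps` smoothing term is a common assumption to avoid division by zero on empty masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient for binary segmentation masks.

    pred, target: arrays of 0/1 values with the same shape.
    Returns 2*|A intersect B| / (|A| + |B|), the overlap metric
    used to compare U-Net and ViT in the abstract.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return (2.0 * intersection + eps) / (denom + eps)

# Example: two 8-pixel masks, 4 foreground pixels each, 3 overlapping
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0]])
target = np.array([[0, 1, 0, 0],
                   [1, 1, 1, 0]])
print(round(float(dice_coefficient(pred, target)), 2))  # → 0.75
```

A perfect prediction yields a Dice score of 1.0, so the reported drop from 0.93 to 0.89 between the two architectures corresponds directly to a loss of mask overlap.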

References

  • Armato III, S. G., McLennan, G., Bidaut, L., et al. (2011). The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931.
  • Aydın, V. A. (2024). Comparison of CNN-based methods for yoga pose classification. Turkish Journal of Engineering, 8(1), 65-75. https://doi.org/10.31127/tuje.1348210
  • Azad, R., Aghdam, E. K., Rauland, A., et al. (2022). Medical image segmentation review: The success of U-Net. arXiv preprint arXiv:2211.14830.
  • Bai, W., Sinclair, M., Tarroni, G., et al. (2018). Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1), 65.
  • Bilic, P., Christ, P. F., Vorontsov, E., et al. (2019). The Liver Tumor Segmentation Benchmark (LiTS). arXiv preprint arXiv:1901.04056.
  • Cao, H., Wang, Y., Chen, J., et al. (2022). Swin-Unet: Unet-like pure transformer for medical image segmentation. European Conference on Computer Vision, 205-218.
  • Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., & Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306
  • Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention, 424-432.
  • Dirik, M. (2023). Machine learning-based lung cancer diagnosis. Turkish Journal of Engineering, 7(4), 322-330. https://doi.org/10.31127/tuje.1180931
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy
  • Esteva, A., Kuprel, B., Novoa, R. A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.
  • Ghesu, F. C., Georgescu, B., Mansoor, A., et al. (2019). Quantifying and leveraging classification uncertainty for chest radiograph assessment. Medical Image Computing and Computer-Assisted Intervention, 676-684.
  • Gülgün, O. D., & Erol, H. (2020). Classification performance comparisons of deep learning models in pneumonia diagnosis using chest X-ray images. Turkish Journal of Engineering, 4(3), 129-141. https://doi.org/10.31127/tuje.652358
  • Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22), 2402-2410.
  • Hatamizadeh, A., Tang, Y., Nath, V., et al. (2022). UNETR: Transformers for 3D medical image segmentation. IEEE Winter Conference on Applications of Computer Vision, 574-584.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
  • Huang, H., Lin, L., Tong, R., et al. (2020). UNet 3+: A full-scale connected UNet for medical image segmentation. IEEE International Conference on Acoustics, Speech and Signal Processing, 1055-1059.
  • Huang, X., Deng, Z., Li, D., & Yuan, X. (2021). MISSFormer: An effective medical image segmentation Transformer. arXiv preprint arXiv:2109.07162.
  • Hyder, U., & Talpur, M. R. H. (2024). Detection of cotton leaf disease with machine learning model. Turkish Journal of Engineering, 8(2), 380-393. https://doi.org/10.31127/tuje.1406755
  • Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.
  • Jaeger, P. F., Kohl, S. A., Bickelhaupt, S., et al. (2020). Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. Machine Learning for Health Workshop, 171-183.
  • Juraev, D. A., Elsayed, E. E., Bulnes, J. J. D., Agarwal, P., & Saeed, R. K. (2023). History of ill-posed problems and their application to solve various mathematical problems. Engineering Applications, 2(3), 279-290. https://publish.mersin.edu.tr/index.php/enap/article/view/1178
  • Karimi, D., Dou, H., Warfield, S. K., & Gholipour, A. (2020). Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 101759.
  • Kesikoğlu, H. M., Çiçekli, Y. S., & Kaynak, T. (2020). The identification of seasonal coastline changes from Landsat 8 satellite data using artificial neural networks and k-nearest neighbor. Turkish Journal of Engineering, 4(1), 47-56.
  • Litjens, G., Kooi, T., Bejnordi, B. E., et al. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60-88.
  • Liu, Z., Lin, Y., Cao, Y., et al. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. International Conference on Computer Vision, 10012-10022.
  • McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94.
  • Mema, B., & Basholli, F. (2023). Internet of Things in the development of future businesses in Albania. Advanced Engineering Science, 3, 196-205. https://publish.mersin.edu.tr/index.php/ades/article/view/1325
  • Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. International Conference on 3D Vision, 565-571.
  • Mogaraju, J. K. (2024). Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India. Turkish Journal of Engineering, 8(1), 31-45. https://doi.org/10.31127/tuje.1255863
  • Oktay, O., Schlemper, J., Folgoc, L. L., et al. (2018). Attention U-Net: Learning where to look for the pancreas. Medical Imaging with Deep Learning.
  • Othman, M. M. (2023). Modeling of daily groundwater level using deep learning neural networks. Turkish Journal of Engineering, 7(4), 331-337. https://doi.org/10.31127/tuje.1169908
  • Polater, S. N., & Sevli, O. (2024). Deep learning based classification for Alzheimer’s disease detection using MRI images. Turkish Journal of Engineering, 8(4), 729-740. https://doi.org/10.31127/tuje.1434866
  • Rajpurkar, P., Lungren, M. P., et al. (2020). AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining. Scientific Reports, 10(1), 3958.
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28
  • Shamshad, F., Khan, S., Zamir, S. W., et al. (2023). Transformers in medical imaging: A survey. Medical Image Analysis, 88, 102802.
  • Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., & Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4), 501-509.
  • Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105-6114.
  • Touvron, H., Cord, M., Douze, M., et al. (2021). Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning, 10347-10357.
  • Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., & Patel, V. M. (2021). Medical Transformer: Gated axial-attention for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 36-46.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
  • Wang, W., Chen, C., Ding, M., Yu, H., Zha, S., & Li, J. (2021). TransBTS: Multimodal brain tumor segmentation using Transformer. Medical Image Computing and Computer-Assisted Intervention, 109-119.
  • Xie, Y., Zhang, J., Shen, C., & Xia, Y. (2021). CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 171-180.
  • Zhang, Y., Liu, H., & Hu, Q. (2021). TransFuse: Fusing transformers and CNNs for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 14-24.
  • Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., & Liang, J. (2018). UNet++: A nested U-Net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3-11.
There are 43 citations in total.

Details

Primary Language English
Subjects Wireless Communication Systems and Technologies (Incl. Microwave and Millimetrewave)
Journal Section Research Article
Authors

Senthilkumar S.p. 0000-0003-4696-326X

Chandramouleeswaran Muthukumarasamy 0000-0001-9174-2111

Submission Date November 13, 2025
Acceptance Date January 4, 2026
Publication Date May 1, 2026
DOI https://doi.org/10.31127/tuje.1822987
IZ https://izlik.org/JA79BM87CL
Published in Issue Year 2026 Volume: 10 Issue: 2

Cite

APA S.p., S., & Muthukumarasamy, C. (2026). Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. Turkish Journal of Engineering, 10(2), 378-395. https://doi.org/10.31127/tuje.1822987
AMA 1. S.p. S, Muthukumarasamy C. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026;10(2):378-395. doi:10.31127/tuje.1822987
Chicago S.p., Senthilkumar, and Chandramouleeswaran Muthukumarasamy. 2026. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering 10 (2): 378-95. https://doi.org/10.31127/tuje.1822987.
EndNote S.p. S, Muthukumarasamy C (May 1, 2026) Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. Turkish Journal of Engineering 10 2 378–395.
IEEE [1] S. S.p. and C. Muthukumarasamy, “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”, TUJE, vol. 10, no. 2, pp. 378–395, May 2026, doi: 10.31127/tuje.1822987.
ISNAD S.p., Senthilkumar - Muthukumarasamy, Chandramouleeswaran. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering 10/2 (May 1, 2026): 378-395. https://doi.org/10.31127/tuje.1822987.
JAMA 1. S.p. S, Muthukumarasamy C. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026;10:378–395.
MLA S.p., Senthilkumar, and Chandramouleeswaran Muthukumarasamy. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering, vol. 10, no. 2, May 2026, pp. 378-95, doi:10.31127/tuje.1822987.
Vancouver 1. Senthilkumar S.p., Chandramouleeswaran Muthukumarasamy. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026 May 1;10(2):378-95. doi:10.31127/tuje.1822987