Research Article

Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach

Year 2026, Volume: 10 Issue: 2 , 378 - 395 , 01.05.2026
https://doi.org/10.31127/tuje.1822987
https://izlik.org/JA79BM87CL

Abstract

Medical image segmentation remains a critical challenge in computer-aided diagnosis systems, particularly for early disease detection, where precise boundary delineation can significantly impact patient outcomes. This study presents a comprehensive comparative analysis of Vision Transformer (ViT) based architectures and the conventional U-Net model for multi-organ segmentation tasks using chest CT scans and retinal fundus images. We evaluated both architectures on three distinct datasets comprising 15,420 annotated medical images, focusing on lung nodule detection, liver lesion segmentation, and retinal vessel segmentation for diabetic retinopathy screening. Our experimental results demonstrate that while U-Net achieves superior performance on smaller datasets (Dice coefficient: 0.89 ± 0.03), Vision Transformers excel with larger training sets (Dice coefficient: 0.93 ± 0.02), a 4.5% improvement in segmentation accuracy. The ViT-based approach demonstrated enhanced generalization across diverse imaging modalities, reducing false positive rates by 31% compared to U-Net in cross-dataset validation. Furthermore, computational efficiency analysis revealed that despite requiring 2.3× more training time, ViT models reduced inference time by 18% in clinical deployment scenarios. Evaluation across image quality levels showed that ViT degraded less as signal-to-noise ratio fell (Dice drop: 4.2% from high to low SNR) than U-Net (8.7% drop), demonstrating transformers' robustness to image degradation in clinical settings where scan quality varies. These findings suggest that the choice between architectures should be guided by dataset size, computational resources, and specific clinical requirements, with hybrid approaches showing promising potential for future development.
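The Dice coefficient reported throughout the abstract measures overlap between a predicted segmentation mask and the ground-truth annotation. As a minimal sketch (not code from the paper, which is not reproduced on this page), it can be computed for binary masks as follows; the `eps` smoothing term is a common assumption to avoid division by zero on empty masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient for binary segmentation masks.

    pred, target: arrays of 0/1 values with the same shape.
    Returns 2*|A intersect B| / (|A| + |B|), the overlap metric
    used to compare U-Net and ViT in the abstract.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return (2.0 * intersection + eps) / (denom + eps)

# Example: two 8-pixel masks, 4 foreground pixels each, 3 overlapping
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0]])
target = np.array([[0, 1, 0, 0],
                   [1, 1, 1, 0]])
print(round(float(dice_coefficient(pred, target)), 2))  # → 0.75
```

A perfect prediction yields a Dice score of 1.0, so the reported drop from 0.93 to 0.89 between the two architectures corresponds directly to a loss of mask overlap.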

References

  • Armato III, S. G., McLennan, G., Bidaut, L., et al. (2011). The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931.
  • Aydın, V. A. (2024). Comparison of CNN-based methods for yoga pose classification. Turkish Journal of Engineering, 8(1), 65-75. https://doi.org/10.31127/tuje.1348210
  • Azad, R., Aghdam, E. K., Rauland, A., et al. (2022). Medical image segmentation review: The success of U-Net. arXiv preprint arXiv:2211.14830.
  • Bai, W., Sinclair, M., Tarroni, G., et al. (2018). Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1), 65.
  • Bilic, P., Christ, P. F., Vorontsov, E., et al. (2019). The Liver Tumor Segmentation Benchmark (LiTS). arXiv preprint arXiv:1901.04056.
  • Cao, H., Wang, Y., Chen, J., et al. (2022). Swin-Unet: Unet-like pure transformer for medical image segmentation. European Conference on Computer Vision, 205-218.
  • Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., & Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306
  • Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention, 424-432.
  • Dirik, M. (2023). Machine learning-based lung cancer diagnosis. Turkish Journal of Engineering, 7(4), 322-330. https://doi.org/10.31127/tuje.1180931
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy
  • Esteva, A., Kuprel, B., Novoa, R. A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.
  • Ghesu, F. C., Georgescu, B., Mansoor, A., et al. (2019). Quantifying and leveraging classification uncertainty for chest radiograph assessment. Medical Image Computing and Computer-Assisted Intervention, 676-684.
  • Gülgün, O. D., & Erol, H. (2020). Classification performance comparisons of deep learning models in pneumonia diagnosis using chest X-ray images. Turkish Journal of Engineering, 4(3), 129-141. https://doi.org/10.31127/tuje.652358
  • Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22), 2402-2410.
  • Hatamizadeh, A., Tang, Y., Nath, V., et al. (2022). UNETR: Transformers for 3D medical image segmentation. IEEE Winter Conference on Applications of Computer Vision, 574-584.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
  • Huang, H., Lin, L., Tong, R., et al. (2020). UNet 3+: A full-scale connected UNet for medical image segmentation. IEEE International Conference on Acoustics, Speech and Signal Processing, 1055-1059.
  • Huang, X., Deng, Z., Li, D., & Yuan, X. (2021). MISSFormer: An effective medical image segmentation Transformer. arXiv preprint arXiv:2109.07162.
  • Hyder, U., & Talpur, M. R. H. (2024). Detection of cotton leaf disease with machine learning model. Turkish Journal of Engineering, 8(2), 380-393. https://doi.org/10.31127/tuje.1406755
  • Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.
  • Jaeger, P. F., Kohl, S. A., Bickelhaupt, S., et al. (2020). Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. Machine Learning for Health Workshop, 171-183.
  • Juraev, D. A., Elsayed, E. E., Bulnes, J. J. D., Agarwal, P., & Saeed, R. K. (2023). History of ill-posed problems and their application to solve various mathematical problems. Engineering Applications, 2(3), 279-290. https://publish.mersin.edu.tr/index.php/enap/article/view/1178
  • Karimi, D., Dou, H., Warfield, S. K., & Gholipour, A. (2020). Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 101759.
  • Kesikoğlu, H. M., Çiçekli, Y. S., & Kaynak, T. (2020). The identification of seasonal coastline changes from Landsat 8 satellite data using artificial neural networks and k-nearest neighbor. Turkish Journal of Engineering, 4(1), 47-56.
  • Litjens, G., Kooi, T., Bejnordi, B. E., et al. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60-88.
  • Liu, Z., Lin, Y., Cao, Y., et al. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. International Conference on Computer Vision, 10012-10022.
  • McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94.
  • Mema, B., & Basholli, F. (2023). Internet of Things in the development of future businesses in Albania. Advanced Engineering Science, 3, 196-205. https://publish.mersin.edu.tr/index.php/ades/article/view/1325
  • Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. International Conference on 3D Vision, 565-571.
  • Mogaraju, J. K. (2024). Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India. Turkish Journal of Engineering, 8(1), 31-45. https://doi.org/10.31127/tuje.1255863
  • Oktay, O., Schlemper, J., Folgoc, L. L., et al. (2018). Attention U-Net: Learning where to look for the pancreas. Medical Imaging with Deep Learning.
  • Othman, M. M. (2023). Modeling of daily groundwater level using deep learning neural networks. Turkish Journal of Engineering, 7(4), 331-337. https://doi.org/10.31127/tuje.1169908
  • Polater, S. N., & Sevli, O. (2024). Deep learning based classification for Alzheimer’s disease detection using MRI images. Turkish Journal of Engineering, 8(4), 729-740. https://doi.org/10.31127/tuje.1434866
  • Rajpurkar, P., Lungren, M. P., et al. (2020). AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining. Scientific Reports, 10(1), 3958.
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28
  • Shamshad, F., Khan, S., Zamir, S. W., et al. (2023). Transformers in medical imaging: A survey. Medical Image Analysis, 88, 102802.
  • Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., & Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4), 501-509.
  • Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105-6114.
  • Touvron, H., Cord, M., Douze, M., et al. (2021). Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning, 10347-10357.
  • Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., & Patel, V. M. (2021). Medical Transformer: Gated axial-attention for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 36-46.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
  • Wang, W., Chen, C., Ding, M., Yu, H., Zha, S., & Li, J. (2021). TransBTS: Multimodal brain tumor segmentation using Transformer. Medical Image Computing and Computer-Assisted Intervention, 109-119.
  • Xie, Y., Zhang, J., Shen, C., & Xia, Y. (2021). CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 171-180.
  • Zhang, Y., Liu, H., & Hu, Q. (2021). TransFuse: Fusing transformers and CNNs for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 14-24.
  • Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., & Liang, J. (2018). UNet++: A nested U-Net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3-11.
There are 43 citations in total.

Details

Primary Language English
Subjects Wireless Communication Systems and Technologies (Incl. Microwave and Millimetrewave)
Journal Section Research Article
Authors

Senthilkumar S.p. 0000-0003-4696-326X

Chandramouleeswaran Muthukumarasamy 0000-0001-9174-2111

Submission Date November 13, 2025
Acceptance Date January 4, 2026
Publication Date May 1, 2026
DOI https://doi.org/10.31127/tuje.1822987
IZ https://izlik.org/JA79BM87CL
Published in Issue Year 2026 Volume: 10 Issue: 2

Cite

APA S.p., S., & Muthukumarasamy, C. (2026). Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. Turkish Journal of Engineering, 10(2), 378-395. https://doi.org/10.31127/tuje.1822987
AMA 1. S.p. S, Muthukumarasamy C. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026;10(2):378-395. doi:10.31127/tuje.1822987
Chicago S.p., Senthilkumar, and Chandramouleeswaran Muthukumarasamy. 2026. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering 10 (2): 378-95. https://doi.org/10.31127/tuje.1822987.
EndNote S.p. S, Muthukumarasamy C (May 1, 2026) Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. Turkish Journal of Engineering 10 2 378–395.
IEEE [1] S. S.p. and C. Muthukumarasamy, “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”, TUJE, vol. 10, no. 2, pp. 378–395, May 2026, doi: 10.31127/tuje.1822987.
ISNAD S.p., Senthilkumar - Muthukumarasamy, Chandramouleeswaran. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering 10/2 (May 1, 2026): 378-395. https://doi.org/10.31127/tuje.1822987.
JAMA 1. S.p. S, Muthukumarasamy C. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026;10:378–395.
MLA S.p., Senthilkumar, and Chandramouleeswaran Muthukumarasamy. “Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach”. Turkish Journal of Engineering, vol. 10, no. 2, May 2026, pp. 378-95, doi:10.31127/tuje.1822987.
Vancouver 1. Senthilkumar S.p., Chandramouleeswaran Muthukumarasamy. Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach. TUJE. 2026 May 1;10(2):378-95. doi:10.31127/tuje.1822987