<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4">
    <front>
        <journal-meta>
            <journal-id>tuje</journal-id>
            <journal-title-group>
                <journal-title>Turkish Journal of Engineering</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2587-1366</issn>
            <publisher>
                <publisher-name>Murat YAKAR</publisher-name>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.31127/tuje.1822987</article-id>
            <article-categories>
                <subj-group xml:lang="en">
                    <subject>Wireless Communication Systems and Technologies (Incl. Microwave and Millimetrewave)</subject>
                </subj-group>
                <subj-group xml:lang="tr">
                    <subject>Kablosuz Haberleşme Sistemleri ve Teknolojileri (Mikro Dalga ve Milimetrik Dalga dahil)</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach</article-title>
            </title-group>
            
            <contrib-group content-type="authors">
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4696-326X</contrib-id>
                    <name>
                        <surname>S. P.</surname>
                        <given-names>Senthilkumar</given-names>
                    </name>
                    <aff>Annamalai University</aff>
                </contrib>
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-9174-2111</contrib-id>
                    <name>
                        <surname>Muthukumarasamy</surname>
                        <given-names>Chandramouleeswaran</given-names>
                    </name>
                    <aff>Annamalai University</aff>
                </contrib>
            </contrib-group>
                        
            <pub-date pub-type="pub" iso-8601-date="2026-01-05">
                <day>05</day>
                <month>01</month>
                <year>2026</year>
            </pub-date>
                                        <volume>10</volume>
                                        <issue>2</issue>
                                        <fpage>378</fpage>
                                        <lpage>395</lpage>
                        
                        <history>
                    <date date-type="received" iso-8601-date="2025-11-13">
                        <day>13</day>
                        <month>11</month>
                        <year>2025</year>
                    </date>
                    <date date-type="accepted" iso-8601-date="2026-01-04">
                        <day>04</day>
                        <month>01</month>
                        <year>2026</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2017, Turkish Journal of Engineering</copyright-statement>
                    <copyright-year>2017</copyright-year>
                    <copyright-holder>Turkish Journal of Engineering</copyright-holder>
                </permissions>
            
            <abstract><p>Medical image segmentation remains a critical challenge in computer-aided diagnosis systems, particularly for early disease detection where precise boundary delineation can significantly impact patient outcomes. This study presents a comprehensive comparative analysis between Vision Transformer (ViT) based architectures and the conventional U-Net model for multi-organ segmentation tasks using chest CT scans and retinal fundus images. We evaluated both architectures on three distinct datasets comprising 15,420 annotated medical images, focusing on lung nodule detection, liver lesion segmentation, and retinal vessel segmentation for diabetic retinopathy screening. Our experimental results demonstrate that while U-Net achieves superior performance on smaller datasets (Dice coefficient: 0.89 ± 0.03), Vision Transformers exhibit remarkable capabilities with larger training samples (Dice coefficient: 0.93 ± 0.02), showing a 4.5% improvement in segmentation accuracy. The ViT-based approach demonstrated enhanced generalization across diverse imaging modalities, reducing false positive rates by 31% compared to U-Net in cross-dataset validation. Furthermore, computational efficiency analysis revealed that despite requiring 2.3× more training time, ViT models reduced inference time by 18% in clinical deployment scenarios. Evaluation across image quality levels showed that ViT degraded less across signal-to-noise ratios (Dice drop: 4.2% from high to low SNR) than U-Net (8.7% drop), demonstrating transformers' robustness to image degradation in clinical settings where scan quality varies. These findings suggest that the choice between architectures should be guided by dataset size, computational resources, and specific clinical requirements, with hybrid approaches showing promising potential for future development.</p></abstract>
                                                            
            
            <kwd-group>
                <kwd>Medical Image Segmentation</kwd>
                <kwd>Vision Transformers</kwd>
                <kwd>U-Net</kwd>
                <kwd>Deep Learning</kwd>
                <kwd>Attention Mechanisms</kwd>
            </kwd-group>
                            
        </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">Armato III, S. G., McLennan, G., Bidaut, L., et al. (2011). The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">Aydın, V. A. (2024). Comparison of CNN-based methods for yoga pose classification. Turkish Journal of Engineering, 8(1), 65-75. https://doi.org/10.31127/tuje.1348210</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">Azad, R., Aghdam, E. K., Rauland, A., et al. (2022). Medical image segmentation review: The success of U-Net. arXiv preprint arXiv:2211.14830.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Bai, W., Sinclair, M., Tarroni, G., et al. (2018). Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1), 65.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">Bilic, P., Christ, P. F., Vorontsov, E., et al. (2019). The Liver Tumor Segmentation Benchmark (LiTS). arXiv preprint arXiv:1901.04056.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Cao, H., Wang, Y., Chen, J., et al. (2022). Swin-Unet: Unet-like pure transformer for medical image segmentation. European Conference on Computer Vision, 205-218.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., &amp; Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., &amp; Ronneberger, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention, 424-432.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Dirik, M. (2023). Machine learning-based lung cancer diagnosis. Turkish Journal of Engineering, 7(4), 322-330. https://doi.org/10.31127/tuje.1180931</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Esteva, A., Kuprel, B., Novoa, R. A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">Ghesu, F. C., Georgescu, B., Mansoor, A., et al. (2019). Quantifying and leveraging classification uncertainty for chest radiograph assessment. Medical Image Computing and Computer-Assisted Intervention, 676-684.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">Gülgün, O. D., &amp; Erol, H. (2020). Classification performance comparisons of deep learning models in pneumonia diagnosis using chest X-ray images. Turkish Journal of Engineering, 4(3), 129-141. https://doi.org/10.31127/tuje.652358</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22), 2402-2410.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">Hatamizadeh, A., Tang, Y., Nath, V., et al. (2022). UNETR: Transformers for 3D medical image segmentation. IEEE Winter Conference on Applications of Computer Vision, 574-584.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">He, K., Zhang, X., Ren, S., &amp; Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 770-778.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">Huang, H., Lin, L., Tong, R., et al. (2020). UNet 3+: A full-scale connected UNet for medical image segmentation. IEEE International Conference on Acoustics, Speech and Signal Processing, 1055-1059.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">Huang, X., Deng, Z., Li, D., &amp; Yuan, X. (2021). MISSFormer: An effective medical image segmentation Transformer. arXiv preprint arXiv:2109.07162.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">Hyder, U., &amp; Talpur, M. R. H. (2024). Detection of cotton leaf disease with machine learning model. Turkish Journal of Engineering, 8(2), 380-393. https://doi.org/10.31127/tuje.1406755</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., &amp; Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">Jaeger, P. F., Kohl, S. A., Bickelhaupt, S., et al. (2020). Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. Machine Learning for Health Workshop, 171-183.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">Juraev, D. A., Elsayed, E. E., Bulnes, J. J. D., Agarwal, P., &amp; Saeed, R. K. (2023). History of ill-posed problems and their application to solve various mathematical problems. Engineering Applications, 2(3), 279-290. https://publish.mersin.edu.tr/index.php/enap/article/view/1178</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">Karimi, D., Dou, H., Warfield, S. K., &amp; Gholipour, A. (2020). Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 101759.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">Kesikoğlu, H. M., Çiçekli, Y. S., &amp; Kaynak, T. (2020). The identification of seasonal coastline changes from Landsat 8 satellite data using artificial neural networks and k-nearest neighbor. Turkish Journal of Engineering, 4(1), 47-56.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">Litjens, G., Kooi, T., Bejnordi, B. E., et al. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60-88.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">Liu, Z., Lin, Y., Cao, Y., et al. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. International Conference on Computer Vision, 10012-10022.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">Mema, B., &amp; Basholli, F. (2023). Internet of Things in the development of future businesses in Albania. Advanced Engineering Science, 3, 196-205. https://publish.mersin.edu.tr/index.php/ades/article/view/1325</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">Milletari, F., Navab, N., &amp; Ahmadi, S. A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. International Conference on 3D Vision, 565-571.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">Mogaraju, J. K. (2024). Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India. Turkish Journal of Engineering, 8(1), 31-45. https://doi.org/10.31127/tuje.1255863</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="journal">Oktay, O., Schlemper, J., Folgoc, L. L., et al. (2018). Attention U-Net: Learning where to look for the pancreas. Medical Imaging with Deep Learning.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
<mixed-citation publication-type="journal">Othman, M. M. (2023). Modeling of daily groundwater level using deep learning neural networks. Turkish Journal of Engineering, 7(4), 331-337. https://doi.org/10.31127/tuje.1169908</mixed-citation>
                        <mixed-citation publication-type="journal">Polater, S. N., &amp; Sevli, O. (2024). Deep learning based classification for Alzheimer’s disease detection using MRI images. Turkish Journal of Engineering, 8(4), 729-740. https://doi.org/10.31127/tuje.1434866</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">Rajpurkar, P., Lungren, M. P., et al. (2020). AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining. Scientific Reports, 10(1), 3958.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="journal">Ronneberger, O., Fischer, P., &amp; Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>35</label>
                        <mixed-citation publication-type="journal">Shamshad, F., Khan, S., Zamir, S. W., et al. (2023). Transformers in medical imaging: A survey. Medical Image Analysis, 88, 102802.</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>36</label>
                        <mixed-citation publication-type="journal">Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., &amp; Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4), 501-509.</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>37</label>
                        <mixed-citation publication-type="journal">Tan, M., &amp; Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105-6114.</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>38</label>
                        <mixed-citation publication-type="journal">Touvron, H., Cord, M., Douze, M., et al. (2021). Training data-efficient image transformers &amp; distillation through attention. International Conference on Machine Learning, 10347-10357.</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>39</label>
                        <mixed-citation publication-type="journal">Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., &amp; Patel, V. M. (2021). Medical Transformer: Gated axial-attention for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 36-46.</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>40</label>
                        <mixed-citation publication-type="journal">Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &amp; Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>41</label>
                        <mixed-citation publication-type="journal">Wang, W., Chen, C., Ding, M., Yu, H., Zha, S., &amp; Li, J. (2021). TransBTS: Multimodal brain tumor segmentation using Transformer. Medical Image Computing and Computer-Assisted Intervention, 109-119.</mixed-citation>
                    </ref>
                                    <ref id="ref42">
                        <label>42</label>
<mixed-citation publication-type="journal">Xie, Y., Zhang, J., Shen, C., &amp; Xia, Y. (2021). CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 171-180.</mixed-citation>
                        <mixed-citation publication-type="journal">Zhang, Y., Liu, H., &amp; Hu, Q. (2021). TransFuse: Fusing transformers and CNNs for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 14-24.</mixed-citation>
                    </ref>
                                    <ref id="ref43">
                        <label>43</label>
                        <mixed-citation publication-type="journal">Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., &amp; Liang, J. (2018). UNet++: A nested U-Net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3-11.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
