<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4">
    <front>
        <journal-meta>
            <journal-id>tuje</journal-id>
            <journal-title-group>
                <journal-title>Turkish Journal of Engineering</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2587-1366</issn>
            <publisher>
                <publisher-name>Murat YAKAR</publisher-name>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.31127/tuje.1822987</article-id>
            <article-categories>
                <subj-group xml:lang="en">
                    <subject>Wireless Communication Systems and Technologies (Incl. Microwave and Millimetrewave)</subject>
                </subj-group>
                <subj-group xml:lang="tr">
                    <subject>Kablosuz Haberleşme Sistemleri ve Teknolojileri (Mikro Dalga ve Milimetrik Dalga dahil)</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Comparative Analysis of Vision Transformers and U-Net for Medical Image Segmentation in Early Disease Detection: A Deep Learning Approach</article-title>
            </title-group>
            
            <contrib-group content-type="authors">
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4696-326X</contrib-id>
                    <name>
                        <surname>S. P.</surname>
                        <given-names>Senthilkumar</given-names>
                    </name>
                    <aff>Annamalai University</aff>
                </contrib>
                <contrib contrib-type="author">
                    <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-9174-2111</contrib-id>
                    <name>
                        <surname>Muthukumarasamy</surname>
                        <given-names>Chandramouleeswaran</given-names>
                    </name>
                    <aff>Annamalai University</aff>
                </contrib>
            </contrib-group>
                        
            <pub-date pub-type="pub" iso-8601-date="2026-01-05">
                <day>05</day>
                <month>01</month>
                <year>2026</year>
            </pub-date>
                                        <volume>10</volume>
                                        <issue>2</issue>
                                        <fpage>378</fpage>
                                        <lpage>395</lpage>
                        
                        <history>
                    <date date-type="received" iso-8601-date="2025-11-13">
                        <day>13</day>
                        <month>11</month>
                        <year>2025</year>
                    </date>
                    <date date-type="accepted" iso-8601-date="2026-01-04">
                        <day>04</day>
                        <month>01</month>
                        <year>2026</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2017, Turkish Journal of Engineering</copyright-statement>
                    <copyright-year>2017</copyright-year>
                    <copyright-holder>Turkish Journal of Engineering</copyright-holder>
                </permissions>
            
            <abstract><p>Medical image segmentation remains a critical challenge in computer-aided diagnosis systems, particularly for early disease detection where precise boundary delineation can significantly impact patient outcomes. This study presents a comprehensive comparative analysis between Vision Transformer (ViT) based architectures and the conventional U-Net model for multi-organ segmentation tasks using chest CT scans and retinal fundus images. We evaluated both architectures on three distinct datasets comprising 15,420 annotated medical images, focusing on lung nodule detection, liver lesion segmentation, and retinal vessel segmentation for diabetic retinopathy screening. Our experimental results demonstrate that while U-Net achieves superior performance on smaller datasets (Dice coefficient: 0.89 ± 0.03), Vision Transformers exhibit remarkable capabilities with larger training samples (Dice coefficient: 0.93 ± 0.02), showing a 4.5% improvement in segmentation accuracy. The ViT-based approach demonstrated enhanced generalization across diverse imaging modalities, reducing false positive rates by 31% compared to U-Net in cross-dataset validation. Furthermore, computational efficiency analysis revealed that despite requiring 2.3× more training time, ViT models reduced inference time by 18% in clinical deployment scenarios. Evaluation across image quality levels showed that ViT degraded less across signal-to-noise ratios (Dice drop: 4.2% from high to low SNR) than U-Net (8.7% drop), demonstrating transformers' robustness to image degradation in clinical settings where scan quality varies. These findings suggest that the choice between architectures should be guided by dataset size, computational resources, and specific clinical requirements, with hybrid approaches showing promising potential for future development.</p></abstract>
                                                            
            
            <kwd-group>
                <kwd>Medical Image Segmentation</kwd>
                <kwd>Vision Transformers</kwd>
                <kwd>U-Net</kwd>
                <kwd>Deep Learning</kwd>
                <kwd>Attention Mechanisms</kwd>
            </kwd-group>
                            
        </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">Armato III, S. G., McLennan, G., Bidaut, L., et al. (2011). The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics, 38(2), 915-931.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">Aydın, V. A. (2024). Comparison of CNN-based methods for yoga pose classification. Turkish Journal of Engineering, 8(1), 65-75. https://doi.org/10.31127/tuje.1348210</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">Azad, R., Aghdam, E. K., Rauland, A., et al. (2022). Medical image segmentation review: The success of U-Net. arXiv preprint arXiv:2211.14830.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Bai, W., Sinclair, M., Tarroni, G., et al. (2018). Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance, 20(1), 65.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">Bilic, P., Christ, P. F., Vorontsov, E., et al. (2019). The Liver Tumor Segmentation Benchmark (LiTS). arXiv preprint arXiv:1901.04056.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Cao, H., Wang, Y., Chen, J., et al. (2022). Swin-Unet: Unet-like pure transformer for medical image segmentation. European Conference on Computer Vision, 205-218.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., &amp; Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., &amp; Ronneberger, O. (2016). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention, 424-432.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">Dirik, M. (2023). Machine learning-based lung cancer diagnosis. Turkish Journal of Engineering, 7(4), 322-330. https://doi.org/10.31127/tuje.1180931</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Esteva, A., Kuprel, B., Novoa, R. A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115-118.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">Ghesu, F. C., Georgescu, B., Mansoor, A., et al. (2019). Quantifying and leveraging classification uncertainty for chest radiograph assessment. Medical Image Computing and Computer-Assisted Intervention, 676-684.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">Gülgün, O. D., &amp; Erol, H. (2020). Classification performance comparisons of deep learning models in pneumonia diagnosis using chest X-ray images. Turkish Journal of Engineering, 4(3), 129-141. https://doi.org/10.31127/tuje.652358</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">Gulshan, V., Peng, L., Coram, M., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22), 2402-2410.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">Hatamizadeh, A., Tang, Y., Nath, V., et al. (2022). UNETR: Transformers for 3D medical image segmentation. IEEE Winter Conference on Applications of Computer Vision, 574-584.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">He, K., Zhang, X., Ren, S., &amp; Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 770-778.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">Huang, H., Lin, L., Tong, R., et al. (2020). UNet 3+: A full-scale connected UNet for medical image segmentation. IEEE International Conference on Acoustics, Speech and Signal Processing, 1055-1059.</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">Huang, X., Deng, Z., Li, D., &amp; Yuan, X. (2021). MISSFormer: An effective medical image segmentation Transformer. arXiv preprint arXiv:2109.07162.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">Hyder, U., &amp; Talpur, M. R. H. (2024). Detection of cotton leaf disease with machine learning model. Turkish Journal of Engineering, 8(2), 380-393. https://doi.org/10.31127/tuje.1406755</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., &amp; Maier-Hein, K. H. (2021). nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">Jaeger, P. F., Kohl, S. A., Bickelhaupt, S., et al. (2020). Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. Machine Learning for Health Workshop, 171-183.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">Juraev, D. A., Elsayed, E. E., Bulnes, J. J. D., Agarwal, P., &amp; Saeed, R. K. (2023). History of ill-posed problems and their application to solve various mathematical problems. Engineering Applications, 2(3), 279-290. https://publish.mersin.edu.tr/index.php/enap/article/view/1178</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">Karimi, D., Dou, H., Warfield, S. K., &amp; Gholipour, A. (2020). Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis. Medical Image Analysis, 65, 101759.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">Kesikoğlu, H. M., Çiçekli, Y. S., &amp; Kaynak, T. (2020). The identification of seasonal coastline changes from Landsat 8 satellite data using artificial neural networks and k-nearest neighbor. Turkish Journal of Engineering, 4(1), 47-56.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">Litjens, G., Kooi, T., Bejnordi, B. E., et al. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60-88.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">Liu, Z., Lin, Y., Cao, Y., et al. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. International Conference on Computer Vision, 10012-10022.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">Mema, B., &amp; Basholli, F. (2023). Internet of Things in the development of future businesses in Albania. Advanced Engineering Science, 3, 196-205. https://publish.mersin.edu.tr/index.php/ades/article/view/1325</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">Milletari, F., Navab, N., &amp; Ahmadi, S. A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. International Conference on 3D Vision, 565-571.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">Mogaraju, J. K. (2024). Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India. Turkish Journal of Engineering, 8(1), 31-45. https://doi.org/10.31127/tuje.1255863</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="journal">Oktay, O., Schlemper, J., Folgoc, L. L., et al. (2018). Attention U-Net: Learning where to look for the pancreas. Medical Imaging with Deep Learning.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
<mixed-citation publication-type="journal">Othman, M. M. (2023). Modeling of daily groundwater level using deep learning neural networks. Turkish Journal of Engineering, 7(4), 331-337. https://doi.org/10.31127/tuje.1169908</mixed-citation>
                        <mixed-citation publication-type="journal">Polater, S. N., &amp; Sevli, O. (2024). Deep learning based classification for Alzheimer’s disease detection using MRI images. Turkish Journal of Engineering, 8(4), 729-740. https://doi.org/10.31127/tuje.1434866</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">Rajpurkar, P., Lungren, M. P., et al. (2020). AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining. Scientific Reports, 10(1), 3958.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="journal">Ronneberger, O., Fischer, P., &amp; Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>35</label>
                        <mixed-citation publication-type="journal">Shamshad, F., Khan, S., Zamir, S. W., et al. (2023). Transformers in medical imaging: A survey. Medical Image Analysis, 88, 102802.</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>36</label>
                        <mixed-citation publication-type="journal">Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., &amp; Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4), 501-509.</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>37</label>
                        <mixed-citation publication-type="journal">Tan, M., &amp; Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105-6114.</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>38</label>
                        <mixed-citation publication-type="journal">Touvron, H., Cord, M., Douze, M., et al. (2021). Training data-efficient image transformers &amp; distillation through attention. International Conference on Machine Learning, 10347-10357.</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>39</label>
                        <mixed-citation publication-type="journal">Valanarasu, J. M. J., Oza, P., Hacihaliloglu, I., &amp; Patel, V. M. (2021). Medical Transformer: Gated axial-attention for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 36-46.</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>40</label>
                        <mixed-citation publication-type="journal">Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &amp; Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>41</label>
                        <mixed-citation publication-type="journal">Wang, W., Chen, C., Ding, M., Yu, H., Zha, S., &amp; Li, J. (2021). TransBTS: Multimodal brain tumor segmentation using Transformer. Medical Image Computing and Computer-Assisted Intervention, 109-119.</mixed-citation>
                    </ref>
                                    <ref id="ref42">
                        <label>42</label>
<mixed-citation publication-type="journal">Xie, Y., Zhang, J., Shen, C., &amp; Xia, Y. (2021). CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 171-180.</mixed-citation>
                        <mixed-citation publication-type="journal">Zhang, Y., Liu, H., &amp; Hu, Q. (2021). TransFuse: Fusing transformers and CNNs for medical image segmentation. Medical Image Computing and Computer-Assisted Intervention, 14-24.</mixed-citation>
                    </ref>
                                    <ref id="ref43">
                        <label>43</label>
                        <mixed-citation publication-type="journal">Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., &amp; Liang, J. (2018). UNet++: A nested U-Net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 3-11.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
