TY - JOUR
T1 - Yüz Görüntülerinden Otizm Tespiti İçin Transformer Tabanlı Derin Öğrenme
TT - Transformer-Based Deep Learning for Autism Detection from Facial Images
AU - Keskin, Fesih
AU - Cengiz, Faruk
PY - 2025
DA - September
Y2 - 2025
DO - 10.21597/jist.1640353
JF - Journal of the Institute of Science and Technology
JO - J. Inst. Sci. and Tech.
PB - Iğdır Üniversitesi
WT - DergiPark
SN - 2536-4618
SP - 755
EP - 764
VL - 15
IS - 3
LA - tr
AB - This study presents a comparative analysis of four different Transformer-based deep learning architectures (Vision Transformer (ViT), Swin Transformer (Swin-T), Data-efficient Image Transformer (DeiT), and Convolutional Transformer (CoaT)) for Autism Spectrum Disorder (ASD) detection using facial images. In recent years, Transformer architectures have increasingly replaced traditional convolutional neural network-based approaches in ASD detection research. In this context, experimental results demonstrate that the Swin-T model achieved the highest classification performance, with 87.76% accuracy and an AUC of 0.96. The CoaT model followed closely with 86.01% accuracy and an AUC of 0.94, while DeiT (84.27% accuracy) and ViT (82.52% accuracy) exhibited relatively lower performance. Confusion matrix and ROC curve analyses show that the Swin-T model significantly reduced both false positive and false negative rates. These findings highlight the effectiveness of the Swin-T and CoaT models in visual data processing and suggest that, when supported by larger datasets, these architectures could provide valuable contributions to early ASD diagnosis in both clinical and research domains.
KW - Autism spectrum disorder
KW - Vision transformers
KW - Swin transformer
KW - DeiT
KW - CoaT
CR - Ahmed, Z. A., Aldhyani, T. H., Jadhav, M. E., Alzahrani, M. Y., Alzahrani, M. E., Althobaiti, M. M., . . . Al-madani, A. M. (2022, April). Facial Features Detection System To Identify Children With Autism Spectrum Disorder: Deep Learning Models. (D. Koundal, Ed.) Computational and Mathematical Methods in Medicine, 2022, 1–9. doi:10.1155/2022/3941049
CR - Alam, M. S., Rashid, M. M., Roy, R., Faizabadi, A. R., Gupta, K. D., & Ahsan, M. M. (2022, November). Empirical Study of Autism Spectrum Disorder Diagnosis Using Facial Images by Improved Transfer Learning Approach. Bioengineering, 9, 710. doi:10.3390/bioengineering9110710
CR - Alkahtani, H., Aldhyani, T. H., & Alzahrani, M. Y. (2023, April). Deep Learning Algorithms to Identify Autism Spectrum Disorder in Children-Based Facial Landmarks. Applied Sciences, 13, 4855. doi:10.3390/app13084855
CR - Angkustsiri, K., Krakowiak, P., Moghaddam, B., Wardinsky, T., Gardner, J., Kalamkarian, N., . . . Hansen, R. L. (2011, May). Minor physical anomalies in children with autism spectrum disorders. Autism, 15, 746–760. doi:10.1177/1362361310397620
CR - Awaji, B., Senan, E. M., Olayah, F., Alshari, E. A., Alsulami, M., Abosaq, H. A., . . . Janrao, P. (2023, September). Hybrid Techniques of Facial Feature Image Analysis for Early Detection of Autism Spectrum Disorder Based on Combined CNN Features. Diagnostics, 13, 2948. doi:10.3390/diagnostics13182948
CR - Baio, J., Wiggins, L., Christensen, D. L., Maenner, M. J., Daniels, J., Warren, Z., . . . Dowling, N. F. (2018, April). Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years — Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. MMWR. Surveillance Summaries, 67, 1–23. doi:10.15585/mmwr.ss6706a1
CR - Bazi, Y., Bashmal, L., Rahhal, M. M., Dayil, R. A., & Ajlan, N. A. (2021, February). Vision Transformers for Remote Sensing Image Classification. Remote Sensing, 13, 516. doi:10.3390/rs13030516
CR - Beary, M., Hadsell, A., Messersmith, R., & Hosseini, M.-P. (2020). Diagnosis of Autism in Children using Facial Analysis and Deep Learning. arXiv. doi:10.48550/ARXIV.2008.02890
CR - Das, S., Chaudhury, P., & Tripathy, H. K. (2024, January). Classification of Autism Spectrum Disorder using CNN & Transfer Learning. 2024 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC) (pp. 1–7). IEEE. doi:10.1109/assic60049.2024.10507982
CR - Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. doi:10.1109/cvpr.2009.5206848
CR - Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., . . . Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ArXiv, abs/2010.11929. Retrieved from https://api.semanticscholar.org/CorpusID:225039882
CR - Hosseini, M.-P., Beary, M., Hadsell, A., Messersmith, R., & Soltanian-Zadeh, H. (2022, January). RETRACTED: Deep Learning for Autism Diagnosis and Facial Analysis in Children. Frontiers in Computational Neuroscience, 15. doi:10.3389/fncom.2021.789998
CR - Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., . . . Guo, B. (2021, October). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 9992–10002). IEEE. doi:10.1109/iccv48922.2021.00986
CR - Manfredonia, J., Bangerter, A., Manyakov, N. V., Ness, S., Lewin, D., Skalkin, A., . . . Pandina, G. (2018, October). Automatic Recognition of Posed Facial Expression of Emotion in Individuals with Autism Spectrum Disorder. Journal of Autism and Developmental Disorders, 49, 279–293. doi:10.1007/s10803-018-3757-9
CR - Mujeeb Rahman, K. K., & Subashini, M. M. (2022, January). Identification of Autism in Children Using Static Facial Features and Deep Neural Networks. Brain Sciences, 12, 94. doi:10.3390/brainsci12010094
CR - P, V., & V, U. M. (2024, January). Identification of Autism Spectrum Disorder in Children from Facial Features Using Deep Learning. 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT) (pp. 1–6). IEEE. doi:10.1109/icaect60202.2024.10469379
CR - Parvej, B., Mahbub Alam, S. M., Fahim, F. I., Pathan, M. N., & Rahaman, M. A. (2024, September). Computer Vision-based Interactive Autism Detection System using Deep Learning. 2024 IEEE International Conference on Computing, Applications and Systems (COMPAS) (pp. 1–6). IEEE. doi:10.1109/compas60761.2024.10796046
CR - Pelphrey, K. A., Sasson, N. J., Reznick, J. S., Paul, G., Goldman, B. D., & Piven, J. (2002). Journal of Autism and Developmental Disorders, 32, 249–261. doi:10.1023/a:1016374617369
CR - Piosenka, G. (2021). Detect Autism from a Facial Image. Retrieved from https://www.kaggle.com/cihan063/autism-image-data
CR - Rashid, A., & Shaker, S. (2023, March). Autism spectrum Disorder detection Using Face Features based on Deep Neural network. Wasit Journal of Computer and Mathematics Science, 2, 74–83. doi:10.31185/wjcm.100
CR - Rezaee, K., Attar, H., & Khosravi, M. (2023, December). A review of machine learning-based methods for automatically detecting autism spectrum disorder in children’s faces. 2023 2nd International Engineering Conference on Electrical, Energy, and Artificial Intelligence (EICEEAI) (pp. 1–5). IEEE. doi:10.1109/eiceeai60672.2023.10590257
CR - Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jegou, H. (2021). Training data-efficient image transformers & distillation through attention. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, 139, pp. 10347–10357. PMLR. Retrieved from https://proceedings.mlr.press/v139/touvron21a.html
CR - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention Is All You Need. arXiv. doi:10.48550/ARXIV.1706.03762
CR - Xu, W., Xu, Y., Chang, T., & Tu, Z. (2021, October). Co-Scale Conv-Attentional Image Transformers. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. doi:10.1109/iccv48922.2021.00983
UR - https://doi.org/10.21597/jist.1640353
L1 - https://dergipark.org.tr/tr/download/article-file/4611776
ER -