Research Article

Transformer-Based Otitis Media Classification: A Comparative Study of ViT, DeiT, and PVT Architectures in Otoscopic Image Analysis

Volume: 16 Number: 1 June 30, 2026

Abstract

Otitis media (OM) is one of the most prevalent middle ear diseases worldwide and imposes a major clinical, public health, and socio-economic burden, particularly among pediatric populations. Conventional diagnostic tools, such as otoscopy and tympanometry, are constrained by subjectivity, operator dependency, and low specificity, which underscores the need for reliable, scalable, and automated diagnostic solutions. Recent advances in deep learning have improved image-based diagnosis; however, traditional Convolutional Neural Networks (CNNs) remain limited in their ability to capture the long-range dependencies needed to distinguish subtle tympanic membrane pathologies. To address these limitations, this study systematically benchmarks three transformer-based architectures, Vision Transformer (ViT), Data-Efficient Image Transformer (DeiT), and Pyramid Vision Transformer (PVT), for the automated classification of otoscopic images into Acute Otitis Media (AOM), Chronic Otitis Media (COM), and normal cases. A balanced dataset of 1,800 images was curated and augmented to ensure a fair evaluation under standardized training conditions. The experimental results show that ViT performs well for AOM, achieving an accuracy and F1-score of 0.98, but its performance declines for COM and normal cases. In contrast, DeiT produces the most consistent results across all categories, achieving an accuracy of 1.00 for AOM and 0.96 for COM and normal cases. PVT also performs strongly, achieving an accuracy of 1.00 for OM and 0.99 for normal cases. These findings indicate that DeiT and PVT are more robust than ViT and suggest their suitability for real-world clinical applications.
Beyond delivering a reproducible benchmarking framework, this work helps bridge the gap between algorithmic innovation and clinical translation, supporting reliable, interpretable, and scalable AI-assisted diagnosis of otitis media.
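
The per-class accuracy and F1-score figures quoted above can be derived from a confusion matrix. The following is a minimal sketch of one common way to do this, not the author's evaluation code. It assumes that per-class accuracy is computed one-vs-rest (the abstract does not state the definition used), and the function names and the toy labels are hypothetical and are not the study's data.

```python
from typing import Dict, List, Sequence

# Class names taken from the abstract; the index order is an assumption.
CLASSES = ["AOM", "COM", "Normal"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest accuracy, precision, recall, and F1 for each class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                         # class k predicted as another class
        fp = sum(matrix[r][k] for r in range(n)) - tp    # another class predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total if total else 0.0
        results.append({"accuracy": accuracy, "precision": precision,
                        "recall": recall, "f1": f1})
    return results


if __name__ == "__main__":
    # Hypothetical toy labels (0 = AOM, 1 = COM, 2 = Normal), for illustration only.
    y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 1]
    cm = confusion_matrix(y_true, y_pred, len(CLASSES))
    for name, m in zip(CLASSES, per_class_metrics(cm)):
        print(f"{name}: accuracy={m['accuracy']:.2f} f1={m['f1']:.2f}")
```

On a balanced test set, one-vs-rest accuracy can stay high even when recall for a class is modest, because true negatives dominate the count. Reporting F1 alongside accuracy, as the abstract does for ViT, helps expose that case.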

Keywords

Ethical Statement

The author declares that this study complies with research and publication ethics. This study does not require ethics committee approval, as it does not involve human or animal subjects, surveys, or clinical data.

Details

Primary Language

English

Subjects

Image Processing

Journal Section

Research Article

Publication Date

June 30, 2026

Submission Date

March 31, 2026

Acceptance Date

June 3, 2026

Published in Issue

Year 2026 Volume: 16 Number: 1

APA
Örenç, S. (2026). Transformer-Based Otitis Media Classification: A Comparative Study of ViT, DeiT, and PVT Architectures in Otoscopic Image Analysis. Bitlis Eren University Journal of Science and Technology, 16(1), 53-73. https://doi.org/10.17678/beuscitech.1920012