Research Article

A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING

Year 2025, Volume: 11 Issue: 2, 321 - 333, 30.12.2025
https://doi.org/10.51477/mejs.1821720

Abstract

The accurate assessment of tympanic membrane (TM) conditions—such as effusion, normal membrane appearance, or the presence of ventilation tubes—is essential for timely diagnosis and treatment of otitis media (OM), one of the leading causes of preventable hearing loss in children. However, visual otoscopic examination remains highly subjective, and diagnostic accuracy varies widely across clinicians due to inconsistent image quality, limited training, and subtle variations in membrane morphology. Recent deep learning–based approaches have shown strong performance, yet most rely on large-scale pretraining or heavyweight convolutional networks that are impractical for point-of-care deployment. Vision Transformers (ViTs) have emerged as powerful feature extractors, but their reliance on fixed patch tokenization and a global class token results in large parameter counts and suboptimal performance on small medical datasets.
In this study, we propose a lightweight Compact Convolutional Transformer (CCT) for TM image classification, trained from scratch on a clinical otoscopy dataset. Unlike standard ViT architectures, CCT applies convolutional tokenization to extract local visual patterns before self-attention and replaces the class token with sequence pooling, reducing model complexity while preserving global reasoning. We conduct a structured ablation study varying both the convolutional kernel size (3×3, 5×5, 7×7) and the transformer encoder depth (3, 5, 7 layers), resulting in nine model configurations. Across these experiments, the best configuration (7×7 kernel, 3 transformer layers) achieved 91.21% accuracy and 90.65% macro F1 on the test set with only 3.26M parameters, outperforming the deeper configurations and offering the most favorable efficiency–performance trade-off. The results suggest that wider convolutional tokenizers capture broader visual patterns of the TM more effectively, while excessive transformer depth may introduce overfitting on small datasets.
These findings indicate that compact transformer architectures can deliver high diagnostic performance without transfer learning or large data requirements, supporting their potential for real-time clinical decision support and integration into low-resource or mobile otoscopy systems.
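As an illustration of two ideas named in the abstract, the sketch below shows sequence pooling (a softmax-weighted average of the encoder's output tokens, used in place of a class token) and the 3 × 3 ablation grid of kernel sizes and encoder depths. It is a minimal pure-Python sketch and not the author's implementation: the function names, the projection weights `w`, and the bias `b` are hypothetical, and a real model would learn the projection and operate on batched tensors.

```python
import math
from itertools import product


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_pool(tokens, w, b=0.0):
    """Pool n token vectors (each of dimension d) into one d-dimensional vector.

    tokens: list of n lists of length d (encoder outputs).
    w, b:   hypothetical projection mapping each token to a scalar score
            (learned in a real model; fixed here for illustration).
    """
    # One importance score per token.
    scores = [sum(wi * xi for wi, xi in zip(w, tok)) + b for tok in tokens]
    # Normalize scores across the sequence so the weights sum to 1.
    weights = softmax(scores)
    d = len(tokens[0])
    # Weighted sum of tokens: replaces reading out a dedicated class token.
    return [sum(a * tok[j] for a, tok in zip(weights, tokens)) for j in range(d)]


# Ablation grid from the abstract: 3 kernel sizes x 3 encoder depths.
KERNEL_SIZES = (3, 5, 7)
ENCODER_DEPTHS = (3, 5, 7)
ABLATION_GRID = [
    {"kernel_size": k, "num_layers": n}
    for k, n in product(KERNEL_SIZES, ENCODER_DEPTHS)
]

if __name__ == "__main__":
    # With a zero projection every token gets equal weight (plain mean).
    pooled = sequence_pool([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0])
    print(pooled)
    print(len(ABLATION_GRID), "configurations")
```

Because the pooling weights are computed from the tokens themselves, the classifier can attend to whichever image regions are most informative without the extra parameters of a class token; the grid enumerates the nine configurations compared in the ablation study, including the reported best one (`kernel_size=7`, `num_layers=3`).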

Ethical Statement

This study did not involve human participants or the collection of new data. All analyses were performed on publicly available, de-identified datasets, used in accordance with their licenses and terms of use. As this constitutes secondary analysis of open data with no intervention or interaction with individuals, institutional ethics approval and informed consent were not required. The work adheres to applicable ethical standards for research using publicly accessible datasets.

Thanks

The author would like to thank the creators of the publicly available tympanic membrane dataset on Zenodo (https://zenodo.org/records/3595567) for making their data accessible to the research community.

Details

Primary Language: English
Subjects: Biomedical Diagnosis
Journal Section: Research Article
Authors:

Mesut Şeker (ORCID: 0000-0001-9245-6790)

Submission Date: November 11, 2025
Acceptance Date: December 22, 2025
Publication Date: December 30, 2025
Published in Issue: Year 2025, Volume: 11 Issue: 2

Cite

IEEE: M. Şeker, “A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING”, MEJS, vol. 11, no. 2, pp. 321–333, 2025, doi: 10.51477/mejs.1821720.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
