Research Article

A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING

Volume: 11, Number: 2, December 30, 2025

Abstract

Accurate assessment of tympanic membrane (TM) conditions, such as effusion, a normal membrane appearance, or the presence of ventilation tubes, is essential for the timely diagnosis and treatment of otitis media (OM), one of the leading causes of preventable hearing loss in children. However, visual otoscopic examination remains highly subjective, and diagnostic accuracy varies widely across clinicians because of inconsistent image quality, limited training, and subtle variations in membrane morphology. Recent deep learning–based approaches have shown strong performance, yet most rely on large-scale pretraining or heavyweight convolutional networks that are impractical for point-of-care deployment. Vision Transformers (ViTs) have emerged as powerful feature extractors, but their reliance on fixed patch tokenization and a global class token leads to large parameter counts and suboptimal performance on small medical datasets. In this study, we propose a lightweight Compact Convolutional Transformer (CCT) for TM image classification, trained from scratch on a clinical otoscopy dataset. Unlike standard ViT architectures, CCT uses convolutional tokenization to extract local visual patterns before self-attention and replaces the class token with sequence pooling, reducing model complexity while preserving global reasoning. We conduct a structured ablation study that varies both the convolutional kernel size (3×3, 5×5, 7×7) and the transformer encoder depth (3, 5, 7 layers), yielding nine model configurations. Across these experiments, the best configuration (7×7 kernel, 3 transformer layers) achieved 91.21% accuracy and a 90.65% macro F1 score on the test set with only 3.26M parameters, outperforming deeper models and offering a better efficiency–performance trade-off. The results show that tokenizers with larger convolutional kernels capture broader visual patterns of the TM, whereas excessive transformer depth may lead to overfitting on small datasets.
These findings indicate that compact transformer architectures can deliver high diagnostic performance without transfer learning or large data requirements, supporting their potential for real-time clinical decision support and for integration into low-resource or mobile otoscopy systems.
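The abstract names two changes relative to a standard ViT: a convolutional tokenizer in place of fixed patch embedding, and sequence pooling in place of the class token. The NumPy sketch below illustrates both steps as a forward pass only. It is not the author's implementation: the 3×32×32 input, the embedding dimension of 16, stride-1 "same" padding, the 3×3 stride-2 max-pooling, the three output classes, and the random weights are all illustrative assumptions, and the transformer encoder layers that would sit between tokenization and pooling are omitted.

```python
import numpy as np


def conv2d(x, w, stride=1, padding=0):
    """Naive 2-D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    k = w.shape[-1]
    x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.empty((w.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k): one value per output channel
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def max_pool2d(x, k=3, stride=2, padding=1):
    """Channel-wise max-pooling; padded cells are -inf so they never win."""
    x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)),
               constant_values=-np.inf)
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.empty((x.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = window.max(axis=(1, 2))
    return out


def conv_tokenizer(img, w):
    """Conv -> ReLU -> max-pool, then flatten the spatial grid into tokens."""
    k = w.shape[-1]
    feat = np.maximum(conv2d(img, w, stride=1, padding=k // 2), 0.0)
    feat = max_pool2d(feat)
    # (d, h, w) -> (h*w, d): one d-dimensional token per spatial position
    return feat.reshape(feat.shape[0], -1).T


def seq_pool(tokens, g_w, g_b):
    """Sequence pooling: softmax-weighted average of tokens (no class token)."""
    scores = tokens @ g_w + g_b        # (N,) one scalar score per token
    scores = scores - scores.max()     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ tokens            # (d,) pooled representation


rng = np.random.default_rng(0)
embed_dim, num_classes = 16, 3             # hypothetical sizes
img = rng.standard_normal((3, 32, 32))     # stand-in for a resized otoscopy image
for k in (3, 5, 7):                        # kernel sizes from the ablation
    w = rng.standard_normal((embed_dim, 3, k, k)) * 0.1
    tokens = conv_tokenizer(img, w)
    # Transformer encoder layers would transform `tokens` here (omitted).
    pooled = seq_pool(tokens, rng.standard_normal(embed_dim), 0.0)
    logits = pooled @ rng.standard_normal((embed_dim, num_classes))
    print(k, tokens.shape, pooled.shape, logits.shape)
```

Under these assumptions, same-padding keeps the token count fixed (256 tokens for a 32×32 input) for all three kernel sizes; a larger kernel only enlarges each token's receptive field and the tokenizer's weight count (proportional to k²). That is one plausible reading of the abstract's observation that larger tokenizer kernels capture broader TM patterns.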

Ethical Statement

This study did not involve human participants or the collection of new data. All analyses were performed on publicly available, de-identified datasets, used in accordance with their licenses and terms of use. As this constitutes secondary analysis of open data with no intervention or interaction with individuals, institutional ethics approval and informed consent were not required. The work adheres to applicable ethical standards for research using publicly accessible datasets.

Thanks

The author would like to thank the creators of the publicly available tympanic membrane dataset on Zenodo (https://zenodo.org/records/3595567) for making their data accessible to the research community.

Details

Primary Language: English
Subjects: Biomedical Diagnosis
Journal Section: Research Article
Publication Date: December 30, 2025
Submission Date: November 11, 2025
Acceptance Date: December 22, 2025
Published in Issue: Year 2025, Volume: 11, Number: 2

APA
Şeker, M. (2025). A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING. Middle East Journal of Science, 11(2), 321-333. https://doi.org/10.51477/mejs.1821720
AMA
1. Şeker M. A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING. MEJS. 2025;11(2):321-333. doi:10.51477/mejs.1821720
Chicago
Şeker, Mesut. 2025. “A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING”. Middle East Journal of Science 11 (2): 321-33. https://doi.org/10.51477/mejs.1821720.
EndNote
Şeker M (December 1, 2025) A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING. Middle East Journal of Science 11 2 321–333.
IEEE
[1] M. Şeker, “A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING”, MEJS, vol. 11, no. 2, pp. 321–333, Dec. 2025, doi: 10.51477/mejs.1821720.
ISNAD
Şeker, Mesut. “A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING”. Middle East Journal of Science 11/2 (December 1, 2025): 321-333. https://doi.org/10.51477/mejs.1821720.
JAMA
1. Şeker M. A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING. MEJS. 2025;11:321–333.
MLA
Şeker, Mesut. “A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING”. Middle East Journal of Science, vol. 11, no. 2, Dec. 2025, pp. 321-33, doi:10.51477/mejs.1821720.
Vancouver
1. Mesut Şeker. A LIGHTWEIGHT COMPACT CONVOLUTIONAL TRANSFORMER FOR TYMPANIC MEMBRANE CLASSIFICATION IN OTOSCOPY IMAGING. MEJS. 2025 Dec. 1;11(2):321-33. doi:10.51477/mejs.1821720

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
