Research Article

Report Generation from X-ray Images: An Evaluation with Transformer Architectures

Volume: 5 Number: 2 October 1, 2025

Abstract

The automatic generation of medical reports from chest X-ray images has attracted increasing attention because of its potential to improve diagnostic accuracy and reduce workload in clinical decision support. Recent advances in medical report generation, particularly with encoder-decoder models, highlight their ability to integrate visual information with textual reports. However, these models still tend to generate generic statements, miss detailed pathological findings, and produce inconsistent reports. In this study, the effectiveness of Vision Transformer and Convolutional Vision Transformer encoders combined with GPT2-based (Generative Pre-trained Transformer) decoders is investigated for the task of chest X-ray report generation. Their ability to capture radiological findings and generate clinically meaningful reports is evaluated through comparative analyses conducted under diverse experimental configurations on the IU X-Ray dataset.
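
As a rough illustration of how a Vision Transformer encoder can turn an X-ray into a token sequence that a text decoder attends to, the sketch below splits a toy grayscale image into non-overlapping patches and flattens each patch into one vector. The function name `split_into_patches`, the 4x4 toy image, and the 2x2 patch size are assumptions made for illustration and are not taken from the study. A real encoder would additionally apply a learned linear projection and position embeddings to each patch.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles (row-major)
    and flatten each tile into a single vector, one vector per token."""
    height = len(image)
    width = len(image[0]) if height else 0
    if patch <= 0 or height % patch or width % patch:
        raise ValueError("image sides must be multiples of a positive patch size")

    tokens: List[List[float]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the tile row by row.
            tokens.append(
                [
                    image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                ]
            )
    return tokens


# Hypothetical 4x4 image with pixel values 0..15 (not real X-ray data).
toy_image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy_image, patch=2)
print(len(tokens), len(tokens[0]))
```

With these toy sizes the image yields four tokens of four values each; in an encoder-decoder setup, the decoder would condition on the encoded form of such a sequence while generating the report text.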

Keywords

Supporting Institution

Scientific Research Projects Coordination Unit of Izmir Katip Celebi University

Project Number

2025-TYL-FEBE-0006

Details

Primary Language

English

Subjects

Image Processing, Deep Learning, Natural Language Processing

Journal Section

Research Article

Publication Date

October 1, 2025

Submission Date

August 4, 2025

Acceptance Date

September 24, 2025

Published in Issue

Year 2025 Volume: 5 Number: 2

APA
Fetiler, B., Koca, Ö. A., & Kılıç, V. (2025). Report Generation from X-ray Images: An Evaluation with Transformer Architectures. Artificial Intelligence Theory and Applications, 5(2), 1-10. https://izlik.org/JA85PZ28XG
AMA
1. Fetiler B, Koca ÖA, Kılıç V. Report Generation from X-ray Images: An Evaluation with Transformer Architectures. AITA. 2025;5(2):1-10. https://izlik.org/JA85PZ28XG
Chicago
Fetiler, Bengü, Ömer Atılım Koca, and Volkan Kılıç. 2025. “Report Generation from X-Ray Images: An Evaluation With Transformer Architectures”. Artificial Intelligence Theory and Applications 5 (2): 1-10. https://izlik.org/JA85PZ28XG.
EndNote
Fetiler B, Koca ÖA, Kılıç V (October 1, 2025) Report Generation from X-ray Images: An Evaluation with Transformer Architectures. Artificial Intelligence Theory and Applications 5 2 1–10.
IEEE
[1] B. Fetiler, Ö. A. Koca, and V. Kılıç, “Report Generation from X-ray Images: An Evaluation with Transformer Architectures”, AITA, vol. 5, no. 2, pp. 1–10, Oct. 2025, [Online]. Available: https://izlik.org/JA85PZ28XG
ISNAD
Fetiler, Bengü - Koca, Ömer Atılım - Kılıç, Volkan. “Report Generation from X-Ray Images: An Evaluation With Transformer Architectures”. Artificial Intelligence Theory and Applications 5/2 (October 1, 2025): 1-10. https://izlik.org/JA85PZ28XG.
JAMA
1. Fetiler B, Koca ÖA, Kılıç V. Report Generation from X-ray Images: An Evaluation with Transformer Architectures. AITA. 2025;5:1–10.
MLA
Fetiler, Bengü, et al. “Report Generation from X-Ray Images: An Evaluation With Transformer Architectures”. Artificial Intelligence Theory and Applications, vol. 5, no. 2, Oct. 2025, pp. 1-10, https://izlik.org/JA85PZ28XG.
Vancouver
1. Bengü Fetiler, Ömer Atılım Koca, Volkan Kılıç. Report Generation from X-ray Images: An Evaluation with Transformer Architectures. AITA [Internet]. 2025 Oct. 1;5(2):1-10. Available from: https://izlik.org/JA85PZ28XG