Generating informative chest X-ray captions with LSTM architecture

Ömer Faruk Güzel; Harun Tanrıverdi; Mehmet Gökhan Bakal

doi:10.61112/jiens.1529215

Research Article

Year 2025, Volume: 5 Issue: 2, 477 - 489, 31.07.2025

Abstract

Biomedical imaging is the most effective medical screening procedure available to medical specialists. X-ray images in particular are used extensively as a reference for medical diagnosis. However, interpreting the underlying findings in X-ray images requires substantial radiological knowledge. In this study, a deep learning model that employs the DenseNet121 neural network architecture as an encoder module and represents the textual data (captions) through word embedding layers is trained to predict the caption corresponding to a given X-ray image. The resulting model is a typical sequence-to-sequence model of the kind used particularly for neural machine translation. In the experiments, the Open-i database curated by Indiana University is used for the training and testing phases. The dataset consists of 7,470 X-ray images and 3,955 patient reports written by a domain expert and stored in XML format. Each textual report contains four sections: impressions, findings, comparisons, and indications. During model development, the text under the impression section was used in the training and testing steps. To measure the model's performance, the Bilingual Evaluation Understudy (BLEU) score was calculated and used as the primary evaluation metric. The best result, a BLEU score of 0.38368, was obtained for four-word sequences (4-grams), compared with the other n-gram settings (n = 1, 2, and 3). This work demonstrates the power of sequence-to-sequence models for text generation on medical image datasets for automated diagnosis.
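To illustrate how n-gram BLEU scores such as those reported in the abstract are typically obtained, the following is a minimal pure-Python sketch of sentence-level BLEU with clipped n-gram precision, uniform weights, and a brevity penalty. It is not the authors' evaluation code: the abstract does not state which implementation or smoothing was used, and the function names (`ngrams`, `modified_precision`, `bleu`) and the example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of the candidate against a single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    ref_counts = ngrams(reference, n)
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights over n = 1..max_n (no smoothing)."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(reference, candidate, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # assumption: unsmoothed, so one empty n-gram order zeroes the score
    # Geometric mean of the n-gram precisions.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates that are not longer than the reference.
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical impression text and model output, tokenized on whitespace.
    reference = "no acute cardiopulmonary abnormality".split()
    candidate = "no acute cardiopulmonary disease".split()
    for n in range(1, 5):
        print(f"BLEU-{n}: {bleu(reference, candidate, n):.5f}")
```

Without smoothing, the score drops to zero as soon as one n-gram order has no match, so short captions are usually scored with a smoothed variant or at the corpus level.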

Keywords

biomedical imaging, text mining, deep learning, medical informatics

There are 38 citations in total.

Details

Primary Language	English
Subjects	Image Processing, Human-Computer Interaction, Deep Learning, Bioinformatics, Natural Language Processing
Journal Section	Research Articles
Authors	Ömer Faruk Güzel (ORCID 0000-0002-3975-1659); Harun Tanrıverdi (ORCID 0000-0003-3835-830X); Mehmet Gökhan Bakal (ORCID 0000-0003-2897-3894)
Publication Date	July 31, 2025
Submission Date	August 7, 2024
Acceptance Date	March 13, 2025
Published in Issue	Year 2025 Volume: 5 Issue: 2

Cite

APA	Güzel, Ö. F., Tanrıverdi, H., & Bakal, M. G. (2025). Generating informative chest X-ray captions with LSTM architecture. Journal of Innovative Engineering and Natural Science, 5(2), 477-489. https://doi.org/10.61112/jiens.1529215

Journal of Innovative Engineering and Natural Science by İdris Karagöz is licensed under CC BY 4.0