Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods

Uçman Ergün; Sedanur Orcin; Sezin Barın

doi:10.55546/jmm.1571384

Research Article

Year 2025, Volume: 6 Issue: 1, 1 - 14, 19.06.2025

Abstract

Radiology reports are essential for clinical decision-making and diagnosis, and they contain complex, detailed information. However, their unstructured nature makes efficient processing and analysis challenging, which increases the workload of healthcare professionals and slows clinical workflows. Natural Language Processing (NLP) techniques address this by extracting meaningful information from such texts, reducing expert workload and speeding up decision-making. This study focuses on Named Entity Recognition (NER) in chest radiology reports using the RadGraph dataset, which is annotated with four tag types. The objective is to compare the performance of two NLP models, BERT (Bidirectional Encoder Representations from Transformers) and LSTM (Long Short-Term Memory), to identify the more suitable approach for clinical data. Various training parameters, including learning rate, optimization algorithm, and input size, were tuned to improve model performance. To address the class imbalance in the dataset, data augmentation techniques were applied, and both models were fine-tuned. The results showed that BERT, leveraging its attention mechanism, was better at identifying complex terms and entities and outperformed LSTM in accuracy, precision, recall, and F1 score. Although LSTM effectively captured long-term dependencies, it required longer training times. This research highlights the potential of NLP for automating the extraction of clinical entities from radiology reports. It provides insights for optimizing models and developing clinical decision support systems, with the ultimate aim of making healthcare workflows more efficient.
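
The comparison above rests on accuracy, precision, recall, and F1 score over a four-tag scheme. As a rough illustration of how such per-tag scores can be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes token-level scoring, RadGraph-style tag names (ANAT-DP, OBS-DP, OBS-U, OBS-DA) and an "O" tag for tokens outside any entity; the abstract itself only states that there are four tag types, and the function names and example sequences are hypothetical.

```python
from collections import Counter

# Assumed tag names for illustration; "O" marks tokens outside any entity.
LABELS = ["ANAT-DP", "OBS-DP", "OBS-U", "OBS-DA"]


def per_label_scores(gold, pred, labels=LABELS):
    """Token-level precision, recall and F1 for each entity label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for label g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-label F1, so rare tags count as much as common ones."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Hypothetical gold and predicted tags for a six-token sentence.
    gold = ["OBS-DA", "ANAT-DP", "OBS-DP", "O", "OBS-U", "ANAT-DP"]
    pred = ["OBS-DA", "ANAT-DP", "OBS-U", "O", "OBS-U", "O"]
    scores = per_label_scores(gold, pred)
    for label, s in scores.items():
        print(label, s)
    print("macro F1:", macro_f1(scores))
    print("accuracy:", accuracy(gold, pred))
```

Reporting macro-averaged F1 alongside accuracy is one common choice when tags are imbalanced, because accuracy alone can be dominated by the most frequent tag.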

Keywords

Deep Learning, Natural Language Processing, Named Entity Recognition, Radiological Report, BERT

Supporting Institution

TÜBİTAK

Project Number

1649B022405236

Acknowledgments

This project was supported under application number 1649B022405236 within the scope of the TÜBİTAK 2210-C Priority Areas scholarship program.

Turkish title: NLP Yöntemleri Kullanılarak Göğüs Radyolojisi Raporlarından Klinik Varlıkların Çıkarılması

Abstract (translated from Turkish)

Radiology reports, which are of great importance in clinical decision-making and diagnosis, are rich but complex, and this makes it difficult to process and analyze the data efficiently. This situation increases the workload of healthcare professionals and slows clinical workflows. Natural Language Processing (NLP) techniques offer valuable solutions by extracting and processing meaningful information from such texts, thereby reducing expert workload and speeding up decision-making.
In this study, we focus on Named Entity Recognition (NER) in chest radiology reports using the RadGraph dataset, which is annotated with four different tag types. The aim is to compare the performance of two models: BERT (Bidirectional Encoder Representations from Transformers) and LSTM (Long Short-Term Memory). Various training parameters, such as learning rate, optimization algorithm, and input size, were tested to find the most effective combination for clinical data.
The imbalance in the dataset's label distribution was addressed through data augmentation, and both models were fine-tuned. The results showed that BERT, with its attention mechanism, excels at identifying complex terms and entities and outperforms LSTM in terms of accuracy, precision, recall, and F1 score. Although LSTM handled long-term dependencies well, it had longer training times.
This research underlines the effectiveness of NLP in automating entity extraction from medical texts and offers valuable insights for future model optimization and the development of clinical decision support systems.

Details

Primary Language	English
Subjects	Deep Learning, Machine Learning (Other), Biomedical Sciences and Technologies
Section	Research Articles
Authors	Uçman Ergün 0000-0002-9218-2192 Sedanur Orcin 0009-0007-4345-4984 Sezin Barın 0000-0002-0394-2779
Project Number	1649B022405236
Early View Date	June 15, 2025
Publication Date	June 19, 2025
Submission Date	October 21, 2024
Acceptance Date	December 20, 2024
Published in Issue	Year 2025 Volume: 6 Issue: 1

Cite

APA	Ergün, U., Orcin, S., & Barın, S. (2025). Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods. Journal of Materials and Mechatronics: A, 6(1), 1-14. https://doi.org/10.55546/jmm.1571384
AMA	Ergün U, Orcin S, Barın S. Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods. J. Mater. Mechat. A. June 2025;6(1):1-14. doi:10.55546/jmm.1571384
Chicago	Ergün, Uçman, Sedanur Orcin, and Sezin Barın. “Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods”. Journal of Materials and Mechatronics: A 6, no. 1 (June 2025): 1-14. https://doi.org/10.55546/jmm.1571384.
EndNote	Ergün U, Orcin S, Barın S (June 1, 2025) Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods. Journal of Materials and Mechatronics: A 6 1 1–14.
IEEE	U. Ergün, S. Orcin, and S. Barın, “Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods”, J. Mater. Mechat. A, vol. 6, no. 1, pp. 1–14, 2025, doi: 10.55546/jmm.1571384.
ISNAD	Ergün, Uçman et al. “Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods”. Journal of Materials and Mechatronics: A 6/1 (June 2025), 1-14. https://doi.org/10.55546/jmm.1571384.
JAMA	Ergün U, Orcin S, Barın S. Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods. J. Mater. Mechat. A. 2025;6:1–14.
MLA	Ergün, Uçman et al. “Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods”. Journal of Materials and Mechatronics: A, vol. 6, no. 1, 2025, pp. 1-14, doi:10.55546/jmm.1571384.
Vancouver	Ergün U, Orcin S, Barın S. Extraction Of Clinical Entities from Chest Radiology Reports Using NLP Methods. J. Mater. Mechat. A. 2025;6(1):1-14.
