COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS

İbrahim Sarbay; Göksu Bozdereli Berikol; İbrahim Ulaş Özturan; Keith Grimes

doi:10.24938/kutfd.1369468

Abstract

Objective: Next-generation chatbots are publicly available, easy to use, and continuously evolving, which gives them potential for use in triage, one of the most critical functions of an Emergency Department. This study aimed to assess the performance of Generative Pre-trained Transformer 4 (GPT-4), Bard, and Claude in Emergency Department triage decision-making.

Material and Methods: This was a preliminary cross-sectional study conducted with 50 case scenarios. Emergency Medicine specialists determined the reference Emergency Severity Index (ESI) triage category of each scenario, and each scenario was then submitted to the three chatbots. Chatbot classifications that disagreed with the reference were defined as over-triage (false positive) or under-triage (false negative). The primary outcome was the predictive performance of the chatbots; the secondary outcome was the difference between them in predicting high-acuity triage.

Results: The F1 scores of GPT-4, Bard, and Claude for predicting ESI 1 and 2 were 0.899, 0.791, and 0.865, respectively. In the ROC analysis of high-acuity predictions, the area under the curve (AUC) was 0.911 for GPT-4 (95% CI: 0.814-1; p<0.001), 0.819 for Bard (95% CI: 0.692-0.945; p<0.001), and 0.881 for Claude (95% CI: 0.768-0.994; p<0.001).

Conclusion: GPT-4, in its current form, was able to detect high-acuity ESI scores in our case set and agreed closely with Emergency Medicine specialists. Claude followed, while Bard's agreement was relatively lower. GPT-4 and Claude also gave better case management recommendations than Bard. We believe that studies evaluating the effectiveness and limitations of chatbots in triage are important because of their future potential.
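The outcome measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that ESI 1 and 2 form the positive (high-acuity) class, that a lower ESI number means higher urgency, and that each chatbot returns a single category per case, so the ROC curve has one operating point. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple

# Assumption: ESI 1 and 2 are treated as the positive (high-acuity) class.
HIGH_ACUITY = {1, 2}


def confusion_counts(reference: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for high-acuity detection."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        ref_high = ref in HIGH_ACUITY
        pred_high = pred in HIGH_ACUITY
        if ref_high and pred_high:
            tp += 1
        elif pred_high:
            fp += 1  # chatbot called it high acuity, reference did not
        elif ref_high:
            fn += 1  # chatbot missed a high-acuity case
        else:
            tn += 1
    return tp, fp, fn, tn


def triage_direction(reference: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[int, int]:
    """Return (over_triage, under_triage) counts on the five-level scale."""
    # A lower ESI number is more urgent, so predicted < reference is over-triage.
    over = sum(1 for r, p in zip(reference, predicted) if p < r)
    under = sum(1 for r, p in zip(reference, predicted) if p > r)
    return over, under


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def single_point_auc(tp: int, fp: int, fn: int, tn: int) -> float:
    """AUC of a one-point ROC curve: (sensitivity + specificity) / 2."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy data: eight cases, not the study's 50 scenarios.
reference = [1, 2, 2, 3, 4, 5, 3, 2]
predicted = [1, 2, 3, 2, 4, 5, 3, 1]

tp, fp, fn, tn = confusion_counts(reference, predicted)
print("F1:", f1_score(tp, fp, fn))
print("AUC:", single_point_auc(tp, fp, fn, tn))
print("over/under-triage:", triage_direction(reference, predicted))
```

In this sketch a case can count as over-triage on the five-level scale without being a false positive for high acuity (for example, reference ESI 2 and prediction ESI 1), which is why the two tallies are kept separate.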

Ethical Statement

Institutional review board approval was obtained for this study on 06.04.2023 (Kocaeli University Non-Interventional Clinical Research Ethics Committee - GOKAEK-2023/07.10).

Thanks

The authors would like to thank Prof. Elif Yaka for her valuable insights.

Details

Primary Language

English

Subjects

Health Services and Systems (Other)

Journal Section

Research Article

Authors

İbrahim Sarbay*
0000-0001-8804-2501
Türkiye

Göksu Bozdereli Berikol
0000-0002-4529-3578
Türkiye

İbrahim Ulaş Özturan
0000-0002-1364-5292
Türkiye

Keith Grimes
0000-0002-4906-6612
United Kingdom

Publication Date

December 26, 2023

Submission Date

October 1, 2023

Acceptance Date

October 12, 2023

Published in Issue

Year 2023 Volume: 25 Number: 3

DOI

https://doi.org/10.24938/kutfd.1369468

IZ

https://izlik.org/JA95WT53FU

Cite

APA

Sarbay, İ., Bozdereli Berikol, G., Özturan, İ. U., & Grimes, K. (2023). COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. The Journal of Kırıkkale University Faculty of Medicine, 25(3), 482-521. https://doi.org/10.24938/kutfd.1369468

AMA

1. Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. 2023;25(3):482-521. doi:10.24938/kutfd.1369468

Chicago

Sarbay, İbrahim, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan, and Keith Grimes. 2023. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine 25 (3): 482-521. https://doi.org/10.24938/kutfd.1369468.

EndNote

Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K (December 1, 2023) COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. The Journal of Kırıkkale University Faculty of Medicine 25 3 482–521.

IEEE

[1] İ. Sarbay, G. Bozdereli Berikol, İ. U. Özturan, and K. Grimes, “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”, Kırıkkale Uni Med J, vol. 25, no. 3, pp. 482–521, Dec. 2023, doi: 10.24938/kutfd.1369468.

ISNAD

Sarbay, İbrahim - Bozdereli Berikol, Göksu - Özturan, İbrahim Ulaş - Grimes, Keith. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine 25/3 (December 1, 2023): 482-521. https://doi.org/10.24938/kutfd.1369468.

JAMA

1. Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. 2023;25:482–521.

MLA

Sarbay, İbrahim, et al. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine, vol. 25, no. 3, Dec. 2023, pp. 482-521, doi:10.24938/kutfd.1369468.

Vancouver

1. İbrahim Sarbay, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan, Keith Grimes. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. 2023 Dec. 1;25(3):482-521. doi:10.24938/kutfd.1369468

Cited By

The Role of Language in Remote Healthcare Triage: A Meta‐Aggregative Review

Journal of Advanced Nursing

https://doi.org/10.1111/jan.16528