Research Article

Evaluation of Artificial Intelligence Chatbots in the Management of Primary Tooth Traumas: A Comparative Analysis

Year 2025, Volume 11, Issue 1, pp. 22-31, 27.03.2025
https://doi.org/10.21306/dishekimligi.1639393

Abstract

Aim: This study aimed to evaluate the reliability and consistency of four artificial intelligence (AI) chatbots—ChatGPT 3.5, Google Gemini, Bing, and Claude AI—as public sources of information on the management of primary tooth trauma.
Materials and Methods: Thirty-one dichotomous (yes/no) questions were developed based on common issues and concerns related to dental trauma, particularly those frequently raised by parents. Each question was presented sequentially to the four AI chatbots three times daily, at one-hour intervals, over a five-day period, to assess the reliability and reproducibility of the responses. Accuracy was determined as the proportion of correct responses, with 95% confidence intervals estimated using the Wald binomial method. Reliability was assessed using Fleiss’ kappa coefficient.
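The accuracy calculation described above (a proportion with a Wald 95% confidence interval) can be sketched in a few lines of Python. This is a minimal illustration, not the author's analysis code. It assumes each chatbot produced 31 × 3 × 5 = 465 responses; the correct-answer counts (448 for Bing, 410 for Claude) are not reported in the abstract and are back-calculated here only because they reproduce the reported 96.34% and 88.17%. The function name `wald_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(correct: int, total: int, level: float = 0.95) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wald normal approximation."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Assumption: 31 questions x 3 repetitions per day x 5 days per chatbot.
N_RESPONSES = 31 * 3 * 5

# Hypothetical counts back-calculated from the reported accuracy rates.
for name, correct in [("Bing", 448), ("Claude", 410)]:
    p, lo, hi = wald_ci(correct, N_RESPONSES)
    print(f"{name}: {p:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

The Wald interval is the textbook normal approximation; near 100% accuracy it can extend past 1, which is one reason score-based intervals such as Wilson's are often preferred for proportions close to the boundary.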
Results: All AI chatbots demonstrated high accuracy. Bing was the most accurate model, with an accuracy rate of 96.34%, while Claude had the lowest accuracy, at 88.17%. Consistency was classified as “almost perfect” for ChatGPT, Bing, and Gemini, whereas Claude showed a “substantial” level of agreement. These findings highlight differences in the relative performance of the AI models on tasks requiring high accuracy and reliability.
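The consistency labels above follow the usual practice of computing Fleiss’ kappa and reading it against the Landis and Koch (1977) bands (0.61 to 0.80 “substantial”, 0.81 to 1.00 “almost perfect”). The Python sketch below assumes each question is a subject and each of its 15 repeated answers acts as a rater; that is one plausible reading of the design, not the author's stated setup. The function names and the toy data are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    Assumption: counts[i][j] is how many of the repeated answers to question i
    fell into category j (e.g. "yes"/"no"); every row has the same total.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # Per-subject agreement P_i: proportion of agreeing rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Marginal category proportions p_j give the chance agreement P_e.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) agreement labels."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: 4 questions x 15 repeated answers, columns = ["yes", "no"].
toy = [[15, 0], [0, 15], [14, 1], [15, 0]]
k = fleiss_kappa(toy)
print(f"kappa = {k:.3f} ({landis_koch(k)})")  # prints: kappa = 0.915 (almost perfect)
```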
Conclusion: These results emphasize the importance of critically evaluating AI-based systems for their potential use in clinical applications. Continuous improvements and updates are essential to enhance their reliability and ensure their effectiveness as public information tools.

Turkish title: Süt Dişi Travmalarının Yönetiminde Yapay Zekâ Sohbet Botlarının Değerlendirilmesi: Karşılaştırmalı Bir Analiz


Abstract

Aim: This study aimed to evaluate the reliability and consistency of four artificial intelligence chatbots (ChatGPT 3.5, Google Gemini, Bing, and Claude AI) as publicly available sources of information on the management of primary tooth trauma.
Methods: Thirty-one questions that can be answered “Yes” or “No” were prepared based on the questions parents most frequently ask about dental trauma. Each question was put to the four AI chatbots in turn and, to assess the reliability and reproducibility of the responses, was repeated three times a day, one hour apart, over five days. Accuracy was determined by calculating the proportion of correct responses, and 95% confidence intervals were estimated using the Wald binomial method. Reliability was assessed with Fleiss’ kappa coefficient.
Results: All AI chatbots showed high accuracy. Bing stood out as the most accurate model, with an accuracy rate of 96.34%, while Claude showed the lowest performance, with an accuracy rate of 88.17%. In terms of consistency, ChatGPT, Bing, and Gemini showed “almost perfect” agreement, whereas Claude showed “substantial” agreement. These findings highlight the relative performance of AI models in tasks requiring high accuracy and reliability.
Conclusion: These results show the importance of critically evaluating AI-based systems with respect to their potential use in clinical practice. Continuous improvements and updates are needed to increase their reliability and ensure their effectiveness as public information tools.

Details

Primary Language English
Subjects Paedodontics
Journal Section Research Articles
Authors

Mihriban Gökcek Taraç (ORCID: 0000-0003-3960-8518)

Publication Date March 27, 2025
Submission Date February 14, 2025
Acceptance Date March 11, 2025
Published in Issue Year 2025 Volume: 11 Issue: 1

Cite

APA Gökcek Taraç, M. (2025). Evaluation of Artificial Intelligence Chatbots in the Management of Primary Tooth Traumas: A Comparative Analysis. Journal of International Dental Sciences (Uluslararası Diş Hekimliği Bilimleri Dergisi), 11(1), 22-31. https://doi.org/10.21306/dishekimligi.1639393
