İntraoral Maksillofasiyal Protez Sorularında Yapay Zeka Tabanlı Sohbet Robotlarının Doğruluk ve Tutarlılığının Değerlendirilmesi

Mustafa Ayata; Haydar Albayrak

doi:10.62268/add.1835285

Research Article

Evaluation of the Accuracy and Consistency of Artificial Intelligence-Based Chatbots in Intraoral Maxillofacial Prosthodontics Questions

Year 2025, Volume: 4 Issue: 3, 204 - 211, 30.12.2025

https://izlik.org/JA45LN32NU

Abstract

Objectives
The aim of this study was to compare the accuracy and temporal consistency of the responses given by four artificial intelligence-based chatbots to multiple-choice questions on intraoral maxillofacial prostheses.
Material and Methods
Forty single-best-answer multiple-choice questions were prepared on topics including maxillectomy obturators, palatopharyngeal obturators and palatal lift prostheses, mandibular guidance flange prostheses, and implant-retained obturators. Each chatbot was asked the same set of questions three times (morning, noon, and evening) on the same day. Responses were compared with the answer key, and each was recorded as correct or incorrect to calculate accuracy rates. A generalized linear mixed model was constructed to examine the effects of chatbot model and time on accuracy. Temporal consistency was assessed by determining the proportion of questions that received identical answers across the three repetitions and by calculating Fleiss’ kappa coefficients.
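The two consistency measures described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the five-option answer format (A to E), and the four-question example data are all hypothetical, and the abstract does not state whether kappa was computed over the selected options or over correct/incorrect scores. Here each repetition is treated as one "rater" of each question.

```python
from collections import Counter

# Hypothetical answer options; the abstract does not state how many options each question had.
OPTIONS = ("A", "B", "C", "D", "E")

def full_stability_rate(answers):
    """Proportion of questions whose repeated answers are all identical."""
    stable = sum(1 for reps in answers if len(set(reps)) == 1)
    return stable / len(answers)

def fleiss_kappa(answers, categories):
    """Fleiss' kappa with each repetition treated as one rater of each question."""
    n_items = len(answers)
    n_raters = len(answers[0])
    # counts[i][c] = number of repetitions that chose option c for question i
    counts = [Counter(reps) for reps in answers]
    # Observed agreement for each question
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items
    # Share of all responses that fall on each option
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)  # agreement expected by chance
    # Note: undefined (division by zero) if every response is the same option.
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical data: 4 questions x 3 repetitions (morning, noon, evening).
example_answers = [
    ("A", "A", "A"),
    ("B", "B", "B"),
    ("C", "C", "D"),  # the only question whose answer changed
    ("D", "D", "D"),
]

print(full_stability_rate(example_answers))              # 0.75
print(round(fleiss_kappa(example_answers, OPTIONS), 4))  # 0.7736
```

In this toy data, three of four questions are fully stable, and the single changed answer is enough to pull kappa down to about 0.77, which shows why the two measures are reported side by side.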
Results
Overall accuracy rates were 95% for ChatGPT, 92.5% for Claude, 88.3% for Gemini, and 88.3% for Copilot. The generalized linear mixed model revealed no statistically significant differences in accuracy among the chatbots (p = 0.084) or across time points (p = 0.760). The random effect of question identity was significant, indicating that the questions differed in difficulty. Full temporal stability rates (identical answers in all three repetitions) were 92.5% for ChatGPT, 92.5% for Claude, 85% for Gemini, and 95% for Copilot. Fleiss’ kappa coefficients ranged from 0.84 to 0.95, indicating a high level of agreement.
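As a rough arithmetic check, the reported percentages can be converted back into counts. The sketch below assumes that accuracy was computed over all 120 responses per chatbot (40 questions × 3 repetitions) and stability over the 40 questions; the counts it prints are implied by the percentages, not reported in the abstract. The agreement labels follow the Landis and Koch (1977) bands, on the assumption that this scale underlies the "high level of agreement" wording. All names are illustrative.

```python
N_QUESTIONS = 40
N_REPETITIONS = 3
N_RESPONSES = N_QUESTIONS * N_REPETITIONS  # 120 responses per chatbot

# Percentages as reported in the abstract.
reported_accuracy = {"ChatGPT": 95.0, "Claude": 92.5, "Gemini": 88.3, "Copilot": 88.3}
reported_stability = {"ChatGPT": 92.5, "Copilot": 95.0, "Gemini": 85.0, "Claude": 92.5}

def implied_count(percent, total):
    """Nearest whole count consistent with a reported percentage (an inference)."""
    return round(percent / 100 * total)

def landis_koch_label(kappa):
    """Agreement label using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

implied_correct = {bot: implied_count(p, N_RESPONSES) for bot, p in reported_accuracy.items()}
implied_stable = {bot: implied_count(p, N_QUESTIONS) for bot, p in reported_stability.items()}

for bot, n in implied_correct.items():
    # Recompute the percentage from the implied count to confirm it rounds back.
    print(f"{bot}: {n}/{N_RESPONSES} correct = {100 * n / N_RESPONSES:.1f}%")
for bot, n in implied_stable.items():
    print(f"{bot}: {n}/{N_QUESTIONS} questions fully stable = {100 * n / N_QUESTIONS:.1f}%")

# Both ends of the reported kappa range fall in the same band.
print(landis_koch_label(0.84), "/", landis_koch_label(0.95))
```

Under these assumptions, ChatGPT and Gemini are separated by only 8 responses out of 120 (114 versus 106), which is consistent with the non-significant difference among chatbots.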
Conclusion
All four chatbots demonstrated high accuracy and high short-term consistency on intraoral maxillofacial prosthodontics questions. However, instances of repeated incorrect answers suggest that these tools should serve as complementary educational aids rather than as replacements for expert judgment and current scientific evidence.

Keywords

Artificial Intelligence, Chatbots, Maxillofacial Prostheses, Reliability, Fleiss Kappa

References

Dholam KP, Bachher G, Gurav SV. Changes in the quality of life and acoustic speech parameters of patients in various stages of prosthetic rehabilitation with an obturator after maxillectomy. J Prosthet Dent. 2020; 123: 355-63.
Kalaignan SP, Ahmed SE. Oral health-related quality of life (OHRQoL) in patients with definitive maxillary obturator prostheses: a prospective study. J Adv Oral Res. 2021; 12: 1-8.
Buurman DJ, Speksnijder CM, de Groot RJ, et al. Mastication in maxillectomy patients: a comparison between reconstructed maxillae and implant-supported obturators: a cross‐sectional study. J Oral Rehabil. 2020; 47: 1171-7.
Prasad S. Maxillofacial prosthesis: a review of treatment concepts for better prosthesis prognosis. Bengal J Otolaryngol Head Neck Surg. 2017; 25: 95-9.
Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023; 11: 887.
Li J, Dada A, Puladi B, et al. ChatGPT and healthcare: a taxonomy and systematic review. Comput Methods Programs Biomed. 2024; 241: 107720.
Umer F, Batool I, Naved N. Innovation and application of large language models (LLMs) in dentistry - a scoping review. BDJ Open. 2024; 10: 90.
Puleio F, Lo Giudice G, Bellocchio AM, et al. Clinical, research, and educational applications of ChatGPT in dentistry. Appl Sci (Basel). 2024; 14: 10802.
Esmailpour H, Rasaie V, Babaee Hemmati Y, et al. Performance of artificial intelligence chatbots in responding to the frequently asked questions of patients regarding dental prostheses. BMC Oral Health. 2025; 25: 574.
Freire Y, Laorden AS, Pérez JO, et al. ChatGPT performance in prosthodontics: assessment of accuracy and repeatability in answer generation. J Prosthet Dent. 2024; 131: 659.
Yay Kuşçu HY, Görüş Z. Performance of large language models on prosthodontics questions of the Dentistry Specialization Examination (DSE). Anatolian Curr Med J. 2025; 7: 893-99.
Eraslan R, Ayata M, Yagci F, et al. Exploring the potential of artificial intelligence chatbots in prosthodontics education. BMC Med Educ. 2025; 25: 321.
Gheisarifar M, Shembesh M, Koseoglu M, et al. Evaluating the validity and consistency of artificial intelligence chatbots in responding to patients’ frequently asked questions in prosthodontics. J Prosthet Dent. 2025; 134: 199-206.
Prasad S, Koseoglu M, Antonopoulou S, et al. Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics. J Prosthet Dent. 2025; 26: S0022-3913(25)00737-1.
Özyemişci N, Bal BT, Güngör MB, et al. Evaluation of information provided by artificial intelligence chatbots on extraoral maxillofacial prostheses. J Prosthet Dent. 2025; 8: S0022-3913(25)00684-5.
Özcivelek T, Özcan B. Comparative evaluation of responses from DeepSeek-R1, ChatGPT-o1, ChatGPT-4, and dental GPT chatbots to patient inquiries about dental and maxillofacial prostheses. BMC Oral Health. 2025; 25: 871.
Aradya A, Sravani K, Ravi MB, et al. Artificial intelligence for maxillofacial prosthodontics: a technological shift in craniofacial rehabilitation – a scoping review. J Oral Biol Craniofac Res. 2025; 15: 1749-66.
Liang EN, Pei S, Staibano P, et al. Clinical applications of large language models in medicine and surgery: a scoping review. J Int Med Res. 2025; 53: 3000605251347556.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33: 159-74.

İntraoral Maksillofasiyal Protez Sorularında Yapay Zeka Tabanlı Sohbet Robotlarının Doğruluk ve Tutarlılığının Değerlendirilmesi

Abstract

Objectives
The aim of this study was to comparatively evaluate the accuracy and temporal consistency of the responses given by four artificial intelligence (AI)-based chatbots to multiple-choice questions on intraoral maxillofacial prostheses.
Material and Methods
Forty multiple-choice questions with a single correct option were prepared, covering topics such as maxillectomy obturators, palatopharyngeal obturators and palatal lift prostheses, mandibular guidance flange prostheses, and implant-retained obturators. The questions were put to each chatbot three times on the same day: in the morning, at noon, and in the evening. Responses were compared with the answer key, each answer was recorded as correct or incorrect, and accuracy rates were calculated. A generalized linear mixed model was built to examine the effects of AI model and time on accuracy. Temporal consistency was assessed by the proportion of questions for which the same option was given in all three repetitions and by Fleiss' kappa coefficients.
Results
Overall accuracy rates were 95% for ChatGPT, 92.5% for Claude, 88.3% for Gemini, and 88.3% for Copilot. In the generalized linear mixed model analysis, no statistically significant difference was found among the chatbots (p = 0.084) or across time points (p = 0.760). The random effect of question identity was significant, and difficulty differed among questions. Full temporal stability rates were calculated as 92.5% for ChatGPT, 95% for Copilot, 85% for Gemini, and 92.5% for Claude. Fleiss' kappa coefficients ranged from 0.84 to 0.95, indicating a high degree of agreement.
Conclusions
The four chatbots showed high accuracy and high temporal consistency on intraoral maxillofacial prosthesis questions. Nevertheless, the consistently incorrect answers given to some questions indicate that these tools cannot replace expert assessment and the current literature, but may be useful as a complementary resource in education.

Keywords

Chatbot, Accuracy, Fleiss Kappa, Maxillofacial Prostheses, Artificial Intelligence

Ethical Statement

Ethics committee approval is not required for this study.

Supporting Institution

This study was not supported by any sponsor or commercial organization.

There are 19 citations in total.

Details

Primary Language	Turkish
Subjects	Prosthodontics
Journal Section	Research Article
Authors	Mustafa Ayata (ORCID 0000-0001-6102-9729); Haydar Albayrak (ORCID 0000-0002-2833-1317)
Submission Date	December 3, 2025
Acceptance Date	December 9, 2025
Publication Date	December 30, 2025
DOI	https://doi.org/10.62268/add.1835285
IZ	https://izlik.org/JA45LN32NU
Published in Issue	Year 2025 Volume: 4 Issue: 3

Cite

Vancouver	1. Mustafa Ayata, Haydar Albayrak. İntraoral Maksillofasiyal Protez Sorularında Yapay Zeka Tabanlı Sohbet Robotlarının Doğruluk ve Tutarlılığının Değerlendirilmesi. Akd Dent J. 2025 Dec. 1;4(3):204-11. doi:10.62268/add.1835285

Founded: 2022

Period: 3 Issues Per Year

Publisher: Akdeniz University