Research Article

Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study

Volume: 23 Number: 1 April 25, 2025

An Erratum to this article was published on August 9, 2025. https://dergipark.org.tr/en/pub/tjph/article/1759338

Abstract

Objective: Artificial intelligence (AI) tools such as ChatGPT, developed by OpenAI, generate human-like answers to questions across many domains and have the potential to transform medical education. However, their reliability in providing accurate clinical information remains highly uncertain. This study aimed to evaluate the accuracy and reliability of ChatGPT in answering multiple-choice questions (MCQs) and protocol-based questions in medicine.

Methods: This cross-sectional, mixed-methods study was conducted in April 2024 at MVJ Medical College and Research Hospital, Hoskote, India. MCQs (n=228) and protocol-based questions (n=10), drawn from standard medical literature and covering all 19 MBBS subjects, were used to test ChatGPT. Subject experts checked the responses for accuracy. Statistical analysis was performed with the chi-square test using IBM SPSS version 20.0 for Windows.

Results: ChatGPT showed good accuracy on easy, straightforward MCQs, but its performance declined as question complexity increased; overall, it answered 57.02% of MCQs correctly. Answers to protocol-based questions received average scores of 6.35/10 for textbook-accurate knowledge and 5.75/10 for real-life application.

Conclusion: ChatGPT shows potential as a tool for medical education, especially for recalling basic facts. However, it should not be relied upon as a sole source of information; instead, it should be used alongside traditional methods to ensure a comprehensive understanding of medical concepts.
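As a hedged illustration of the arithmetic behind the reported results, the Python sketch below checks which whole number of correct answers out of 228 MCQs rounds to 57.02% (only 130 does), and computes a Pearson chi-square statistic for a contingency table. The function names and the 2x2 table of counts are hypothetical and are not the study's data; the authors ran their chi-square test in IBM SPSS, and this sketch does not reproduce their analysis.

```python
def percent_correct(correct: int, total: int) -> float:
    """Share of correct answers as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)


def implied_correct_count(percent: float, total: int) -> int:
    """Return the first whole-number count of correct answers that rounds to `percent`."""
    for count in range(total + 1):
        if percent_correct(count, total) == percent:
            return count
    raise ValueError("no whole-number count reproduces that percentage")


def chi_square_statistic(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Overall MCQ accuracy reported in the abstract: 57.02% of 228 questions.
    print(implied_correct_count(57.02, 228))  # 130
    print(percent_correct(130, 228))          # 57.02

    # HYPOTHETICAL counts (not from the study): rows = easy / complex questions,
    # columns = correct / incorrect answers.
    stat, dof = chi_square_statistic([[10, 20], [20, 10]])
    print(round(stat, 2), dof)                # 6.67 1
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value, which is the step a package such as SPSS reports automatically.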

Keywords

Application, ChatGPT, Knowledge, MCQs, Reliability

APA
Vishal, A. R., Harshitha, A. S., Sindhu, A. V., R, A., Mb, P., & Madhukumar, S. (2025). Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. Turkish Journal of Public Health, 23(1), 11-17. https://doi.org/10.20518/tjph.1498611