Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study

A Ra Vishal; A S Harshitha; A V Sindhu; Abhivanth R; Pavithra Mb; Suwarna Madhukumar

doi:10.20518/tjph.1498611

An Erratum to this article was published on August 9, 2025. https://dergipark.org.tr/en/pub/tjph/article/1759338

Abstract

Objective: ChatGPT, an artificial intelligence (AI) chatbot developed by OpenAI, gives human-like answers to questions across many domains and has the potential to transform medical education. However, its reliability in providing accurate clinical information is highly uncertain. This study aimed to evaluate the accuracy and reliability of ChatGPT in answering multiple-choice questions (MCQs) and protocol-based questions in medicine. Methods: This mixed-methods cross-sectional study was conducted at MVJ Medical College and Research Hospital, Hoskote, India, in April 2024. ChatGPT was tested with MCQs (n=228) and protocol-based questions (n=10) drawn from standard medical literature and covering all 19 MBBS subjects. Subject experts checked the responses for accuracy. Statistical analysis used the chi-square test and was performed in IBM SPSS Version 20.0 for Windows. Results: ChatGPT showed good accuracy on easy MCQs, but its performance declined on more complex questions; overall, it answered 57.02% of MCQs correctly. Responses to protocol-based questions received average scores of 6.35/10 for textbook-accurate knowledge and 5.75/10 for real-life application. Conclusion: ChatGPT shows potential as a tool for medical education, especially for recalling basic facts, but it should not be relied upon as a sole source of information. Instead, it should be used alongside traditional methods to ensure a comprehensive understanding of medical concepts.

Details

Primary Language

English

Subjects

Health Services and Systems (Other)

Journal Section

Research Article

Authors

A Ra Vishal
0009-0001-0497-529X
India

A S Harshitha
0009-0005-6896-1478
India

A V Sindhu
0009-0005-4477-415X
India

Abhivanth R
0009-0004-9689-5554
India

Pavithra Mb*
0000-0002-5382-6653
India

Suwarna Madhukumar
0000-0003-3814-5224
India

Early Pub Date

April 20, 2025

Publication Date

April 25, 2025

Submission Date

September 2, 2024

Acceptance Date

March 29, 2025

Published in Issue

Year 2025 Volume: 23 Number: 1

DOI

https://doi.org/10.20518/tjph.1498611

IZ

https://izlik.org/JA48DW26CU

Cite

APA

Vishal, A. R., Harshitha, A. S., Sindhu, A. V., R, A., Mb, P., & Madhukumar, S. (2025). Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. Turkish Journal of Public Health, 23(1), 11-17. https://doi.org/10.20518/tjph.1498611

AMA

1. Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. 2025;23(1):11-17. doi:10.20518/tjph.1498611

Chicago

Vishal, A Ra, A S Harshitha, A V Sindhu, Abhivanth R, Pavithra Mb, and Suwarna Madhukumar. 2025. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health 23 (1): 11-17. https://doi.org/10.20518/tjph.1498611.

EndNote

Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S (April 1, 2025) Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. Turkish Journal of Public Health 23 1 11–17.

IEEE

[1] A. R. Vishal, A. S. Harshitha, A. V. Sindhu, A. R, P. Mb, and S. Madhukumar, “Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study”, TJPH, vol. 23, no. 1, pp. 11–17, Apr. 2025, doi: 10.20518/tjph.1498611.

ISNAD

Vishal, A Ra - Harshitha, A S - Sindhu, A V - R, Abhivanth - Mb, Pavithra - Madhukumar, Suwarna. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health 23/1 (April 1, 2025): 11-17. https://doi.org/10.20518/tjph.1498611.

JAMA

1. Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. 2025;23:11–17.

MLA

Vishal, A Ra, et al. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health, vol. 23, no. 1, Apr. 2025, pp. 11-17, doi:10.20518/tjph.1498611.

Vancouver

1. A Ra Vishal, A S Harshitha, A V Sindhu, Abhivanth R, Pavithra Mb, Suwarna Madhukumar. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. 2025 Apr. 1;23(1):11-7. doi:10.20518/tjph.1498611