Research Article

Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study

Year 2025, Volume: 23 Issue: 1, 11 - 17, 25.04.2025

Abstract

Objective: Artificial intelligence (AI), specifically ChatGPT, developed by OpenAI, provides human-like understanding of, and answers to, questions across a wide range of domains and has the potential to transform medical education. However, its reliability in providing accurate clinical information remains highly uncertain. This study aimed to evaluate the accuracy and reliability of ChatGPT in answering multiple-choice questions (MCQs) and protocol-based questions in the field of medicine.
Methods: This cross-sectional, mixed-methods study was conducted at MVJ Medical College and Research Hospital, Hoskote, India, in April 2024. MCQs (n=228) and protocol-based questions (n=10) covering all 19 MBBS subjects were drawn from standard medical literature and used to test ChatGPT. Subject experts checked the responses for accuracy. Statistical analysis was performed with the chi-square test using IBM SPSS Version 20.0 for Windows.
Results: ChatGPT showed good accuracy on easy and straightforward MCQs, but its performance declined on more complex questions; overall, it answered 57.02% of the MCQs correctly. Responses to the protocol-based questions received average scores of 6.35/10 for textbook-accurate knowledge and 5.75/10 for real-life application.
Conclusion: ChatGPT shows potential as a tool for medical education, especially for recalling basic facts, but it should not be relied upon as a sole source of information; rather, it should be used in conjunction with traditional methods to ensure a comprehensive understanding of medical concepts.
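
As context for the Methods and Results above, the following is a minimal sketch of the kind of analysis described: computing overall MCQ accuracy and testing whether correctness depends on question difficulty with a chi-square test of independence. The study itself used IBM SPSS Version 20.0; this Python version is illustrative only, and the per-difficulty counts are hypothetical placeholders (only the totals, 228 MCQs with 130 correct, reproduce the reported 57.02% overall accuracy).

    # Illustrative sketch only; the counts below are hypothetical, not the study's data.
    from scipy.stats import chi2_contingency

    # Hypothetical contingency table: rows = difficulty level, columns = (correct, incorrect)
    observed = [
        [60, 20],  # easy MCQs (placeholder counts)
        [45, 35],  # moderate MCQs (placeholder counts)
        [25, 43],  # difficult MCQs (placeholder counts)
    ]

    total_correct = sum(row[0] for row in observed)
    total_questions = sum(sum(row) for row in observed)
    print(f"Overall MCQ accuracy: {100 * total_correct / total_questions:.2f}%")  # prints 57.02%

    # Chi-square test of independence: is correctness associated with difficulty?
    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")

A p-value below 0.05 in such a test would indicate that accuracy differs significantly across difficulty levels, which is the pattern the Results describe.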


Details

Primary Language English
Subjects Health Services and Systems (Other)
Journal Section Original Research
Authors

A Ra Vishal 0009-0001-0497-529X

A S Harshitha 0009-0005-6896-1478

A V Sindhu 0009-0005-4477-415X

Abhivanth R 0009-0004-9689-5554

Pavithra Mb 0000-0002-5382-6653

Suwarna Madhukumar 0000-0003-3814-5224

Early Pub Date April 20, 2025
Publication Date April 25, 2025
Submission Date September 2, 2024
Acceptance Date March 29, 2025
Published in Issue Year 2025 Volume: 23 Issue: 1

Cite

APA Vishal, A. R., Harshitha, A. S., Sindhu, A. V., R, A., et al. (2025). Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. Turkish Journal of Public Health, 23(1), 11-17. https://doi.org/10.20518/tjph.1498611
AMA Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. April 2025;23(1):11-17. doi:10.20518/tjph.1498611
Chicago Vishal, A Ra, A S Harshitha, A V Sindhu, Abhivanth R, Pavithra Mb, and Suwarna Madhukumar. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health 23, no. 1 (April 2025): 11-17. https://doi.org/10.20518/tjph.1498611.
EndNote Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S (April 1, 2025) Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. Turkish Journal of Public Health 23 1 11–17.
IEEE A. R. Vishal, A. S. Harshitha, A. V. Sindhu, A. R, P. Mb, and S. Madhukumar, “Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study”, TJPH, vol. 23, no. 1, pp. 11–17, 2025, doi: 10.20518/tjph.1498611.
ISNAD Vishal, A Ra et al. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health 23/1 (April 2025), 11-17. https://doi.org/10.20518/tjph.1498611.
JAMA Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. 2025;23:11–17.
MLA Vishal, A Ra et al. “Assessing ChatGPT’s Accuracy and Reliability in Medical Education: A Cross-Sectional Study”. Turkish Journal of Public Health, vol. 23, no. 1, 2025, pp. 11-17, doi:10.20518/tjph.1498611.
Vancouver Vishal AR, Harshitha AS, Sindhu AV, R A, Mb P, Madhukumar S. Assessing ChatGPT’s accuracy and reliability in medical education: a cross-sectional study. TJPH. 2025;23(1):11-7.


TURKISH JOURNAL OF PUBLIC HEALTH - TURK J PUBLIC HEALTH. online-ISSN: 1304-1096 

Copyright holder: Turkish Journal of Public Health. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.