Systematic Reviews and Meta-Analysis

The Use of Large Language Models in Question-Answering Systems in Education: Potential, Challenges, and Future Directions

Year 2025, Volume: 1 Issue: 1, 82-108, 31.03.2025

Abstract

This article examines the use of large language models (LLMs) in question-answering systems in education, together with their potential, challenges, and future directions. With the rapid advancement of artificial intelligence technologies, large language models such as GPT, LLaMA, and Gemini carry the potential to transform learning experiences in education. The primary aim of this study is to analyze the role of LLMs in education, to lay out the advantages and limitations of these models, and to shed light on future research areas. The article begins by detailing the fundamental principles of LLMs and their areas of application in education. In particular, it highlights the advantages LLMs provide over traditional methods in question-answering systems, such as efficiency, personalized learning support, and multilingual access. It also examines the impact of these models in areas such as giving students instant feedback, summarizing complex topics, and developing language skills. The study further addresses the challenges that integrating LLMs into education brings: issues such as answer accuracy, model biases, privacy concerns, and effects on student motivation stand out as important considerations for the responsible use of these technologies in education. The article evaluates the effectiveness of LLMs in education by reviewing current applications and pilot studies. It also proposes future research areas, such as language models customized for education, interactive question-answering systems, and the development of ethical standards. This study aims to provide a road map for understanding the potential of large language models in education and for using these technologies more effectively and reliably.

References

  • Abbas, A., Rehman, M. S., & Rehman, S. S. (2024). Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions. https://doi.org/10.7759/cureus.55991
  • Abd-Alrazaq, A., AlSaad, R., Alhuwail, D., Ahmed, A., Healy, P. M., Latifi, S., Aziz, S., Damseh, R., Alrazak, S. A., & Sheikh, J. (2023). Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions. JMIR Medical Education, 9. https://doi.org/10.2196/48291
  • Agarwal, M., Sharma, P., & Goswami, A. (2023). Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology. https://doi.org/10.7759/cureus.40977
  • Al-Ashwal, F. Y., Zawiah, M., Gharaibeh, L., Abu-Farha, R., & Bitar, A. N. (2023). Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools. Volume 15, 137–147. https://doi.org/10.2147/dhps.s425858
  • Al-Worafi, Y. M., Chooi, W. H., Tan, C. S., Lua, P. L., Farrukh, M. J., Zulkifly, H. H., & Ming, L. C. (2024). ChatGPT’s Success in the Board-Certified Pharmacotherapy Specialist (BCPS) Exam. Journal of Research in Pharmacy, 28(3), 674–678. https://doi.org/10.29228/jrp.729
  • Al Qurashi, A. A., Albalawi, I. A. S., Halawani, I. R., Asaad, A. H., Al Dwehji, A. M. O., Almusa, H. A., Alharbi, R. I., Alobaidi, H. A., Alarki, S. M. K. Z., & Aljindan, F. K. (2023). Can a Machine Ace the Test? Assessing GPT-4.0’s Precision in Plastic Surgery Board Examinations. 11(12), e5448. https://doi.org/10.1097/gox.0000000000005448
  • Alasker, A., Alsalamah, S., Alshathri, N., Almansour, N., Alsalamah, F., Alghafees, M., AlKhamees, M., & Alsaikhan, B. (2023). Performance of Large Language Models (LLMs) in Providing Prostate Cancer Information. https://doi.org/10.21203/rs.3.rs-3499451/v1
  • Ali, R., Tang, O. Y., Connolly, I. D., Fridley, J. S., Shin, J. H., Zadnik Sullivan, P. L., Cielo, D., Oyelese, A. A., Doberstein, C. E., Telfeian, A. E., Gokaslan, Z. L., & Asaad, W. F. (2023). Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank. 93(5), 1090–1098. https://doi.org/10.1227/neu.0000000000002551
  • Ali, R., Tang, O. Y., Connolly, I. D., Zadnik Sullivan, P. L., Shin, J. H., Fridley, J. S., Asaad, W. F., Cielo, D., Oyelese, A. A., Doberstein, C. E., Gokaslan, Z. L., & Telfeian, A. E. (2023). Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations. https://doi.org/10.1101/2023.03.25.23287743
  • Aljindan, F. K., Al Qurashi, A. A., Albalawi, I. A. S., Alanazi, A. M. M., Aljuhani, H. A. M., Falah Almutairi, F., Aldamigh, O. A., Halawani, I. R., & K. Zino Alarki, S. M. (2023). ChatGPT Conquers the Saudi Medical Licensing Exam: Exploring the Accuracy of Artificial Intelligence in Medical Knowledge Assessment and Implications for Modern Medical Education. Cureus. https://doi.org/10.7759/cureus.45043
  • Althnian, A., AlSaeed, D., Al-Baity, H., Samha, A., Dris, A. Bin, Alzakari, N., Abou Elwafa, A., & Kurdi, H. (2021). Impact of dataset size on classification performance: An empirical evaluation in the medical domain. Applied Sciences (Switzerland), 11(2), 1–18. https://doi.org/10.3390/app11020796
  • Artsi, Y., Sorin, V., Konen, E., Glicksberg, B. S., Nadkarni, G., & Klang, E. (2024). Large language models in simplifying radiological reports: systematic review. MedRxiv, 2024.01.05.24300884. http://medrxiv.org/content/early/2024/01/09/2024.01.05.24300884.abstract
  • B.T, B., & Chen, J.-M. (2024). Performance Assessment of ChatGPT Versus Bard in Detecting Alzheimer’s Dementia. 14(8), 817. https://doi.org/10.3390/diagnostics14080817
  • Beam, K., Sharma, P., Kumar, B., Wang, C., Brodsky, D., Martin, C. R., & Beam, A. (2023). Performance of a Large Language Model on Practice Questions for the Neonatal Board Examination. 177(9), 977. https://doi.org/10.1001/jamapediatrics.2023.2373
  • Bharatha, A., Ojeh, N., Fazle Rabbi, A. M., Campbell, M., Krishnamurthy, K., Layne-Yarde, R., Kumar, A., Springer, D., Connell, K., & Majumder, M. A. (2024). Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy. Volume 15, 393–400. https://doi.org/10.2147/amep.s457408
  • Biri, S. K., Kumar, S., Panigrahi, M., Mondal, S., Behera, J. K., & Mondal, H. (2023). Assessing the Utilization of Large Language Models in Medical Education: Insights From Undergraduate Medical Students. Cureus. https://doi.org/10.7759/cureus.47468
  • Brin, D., Sorin, V., Konen, E., Nadkarni, G., Glicksberg, B. S., & Klang, E. (2023). How Large Language Models Perform on the United States Medical Licensing Examination: A Systematic Review. https://doi.org/10.1101/2023.09.03.23294842
  • Busch, F., Hoffmann, L., Rueger, C., van Dijk, E. H. C., Kader, R., Ortiz-Prado, E., Makowski, M. R., Saba, L., Hadamitzky, M., Kather, J. N., Truhn, D., Cuocolo, R., Adams, L. C., & Bressem, K. K. (2024). Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges. https://doi.org/10.1101/2024.03.04.24303733
  • Chen, C. J., Bilolikar, V. K., VanNest, D., Raphael, J., & Shaffer, G. (2024). Artificial Intelligence in Orthopaedic Education: A Comparative Analysis of ChatGPT and Bing AI’s Orthopaedic In‐Training Examination Performance. 2(3), 284–290. https://doi.org/10.1002/med4.77
  • Cowling, M., Crawford, J., Allen, K.-A., & Wehmeyer, M. (2023). Using Leadership to Leverage ChatGPT and Artificial Intelligence for Undergraduate and Postgraduate Research Supervision. 39(4), 89–103. https://doi.org/10.14742/ajet.8598
  • Davidian, M., Lahav, A., Joshua, B. Z., Wand, O., Lurie, Y., & Mark, S. (2024). Exploring the Interplay of Dataset Size and Imbalance on CNN Performance in Healthcare: Using X-rays to Identify COVID-19 Patients. Diagnostics, 14(16). https://doi.org/10.3390/diagnostics14161727
  • de Winter, J. C. F. (2023). Can ChatGPT Pass High School Exams on English Language Comprehension? 34(3), 915–930. https://doi.org/10.1007/s40593-023-00372-z
  • Deb Roy, A., Bharat Jaiswal, I., Nath Tiu, D., Das, D., Mondal, S., Behera, J. K., & Mondal, H. (2024). Assessing the Utilization of Large Language Model Chatbots for Educational Purposes by Medical Teachers: A Nationwide Survey From India. Cureus. https://doi.org/10.7759/cureus.73484
  • Dhanvijay, A. K. D., Pinjar, M. J., Dhokane, N., Sorte, S. R., Kumari, A., & Mondal, H. (2023). Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology. https://doi.org/10.7759/cureus.42972
  • Du, W., Jin, X., Harris, J. C., Brunetti, A., Johnson, E., Leung, O., Li, X., Walle, S., Yu, Q., Zhou, X., Bian, F., McKenzie, K., Kanathanavanich, M., Ozcelik, Y., El-Sharkawy, F., & Koga, S. (2024). Large Language Models in Pathology: A Comparative Study of ChatGPT and Bard With Pathology Trainees on Multiple-Choice Questions. https://doi.org/10.1101/2024.07.10.24310093
  • Fagbohun, O., Iduwe, N. P., Abdullahi, M., Ifaturoti, A., & Nwanna, O. M. (2024). Beyond Traditional Assessment: Exploring the Impact of Large Language Models on Grading Practices. Journal of Artificial Intelligence, Machine Learning and Data Science, 2(1), 1–8. https://doi.org/10.51219/jaimld/oluwole-fagbohun/19
  • Fang, C., Ling, J., Zhou, J., Wang, Y., Liu, X., Jiang, Y., Wu, Y., Chen, Y., Zhu, Z., Ma, J., Yan, Z., Yu, P., & Liu, X. (2023). How Does ChatGPT4 Preform on Non-English National Medical Licensing Examination? An Evaluation in Chinese Language. https://doi.org/10.1101/2023.05.03.23289443
  • Farhat, F., Chaudhry, B. M., Nadeem, M., Sohail, S. S., & Madsen, D. Ø. (2024). Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard. 10, e51523. https://doi.org/10.2196/51523
  • Fostier, J., Leemans, E., Meeussen, L., Wulleman, A., Van Doren, S., De Coninck, D., & Toelen, J. (2024). Dialogues With AI: Comparing ChatGPT, Bard, and Human Participants’ Responses in in-Depth Interviews on Adolescent Health Care. 2(1), 30–45. https://doi.org/10.3390/future2010003
  • Gandhi, A. P., Joesph, F. K., Rajagopal, V., Aparnavi, P., Katkuri, S., Dayama, S., Satapathy, P., Khatib, M. N., Gaidhane, S., Zahiruddin, Q. S., & Behera, A. (2024). Performance of ChatGPT on the India Undergraduate Community Medicine Examination: Cross-Sectional Study. 8, e49964. https://doi.org/10.2196/49964
  • Garabet, R., Mackey, B. P., Cross, M. B. A. J., & Weingarten, N. (2023). ChatGPT-4 Performance on USMLE Step 1 Questions and Its Implications for Medical Education: A Comparative Study Across Systems and Disciplines. https://doi.org/10.21203/rs.3.rs-3240108/v1
  • Geetha, S. D., Khan, A., Khan, A., Kannadath, B. S., & Vitkovski, T. (2023). Evaluation of ChatGPT’s Pathology Knowledge Using Board-Style Questions. https://doi.org/10.1101/2023.10.01.23296400
  • Ghanem, D., Covarrubias, O., Raad, M., LaPorte, D., & Shafiq, B. (2023). ChatGPT Performs at the Level of a Third-Year Orthopaedic Surgery Resident on the Orthopaedic in-Training Examination. 8(4). https://doi.org/10.2106/jbjs.oa.23.00103
  • Giannos, P., & Delardas, O. (2023). Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations. 9, e47737. https://doi.org/10.2196/47737
  • Gobira, M., Nakayama, L. F., Moreira, R., Andrade, E., Regatieri, C. V. S., & Belfort Jr., R. (2023). Performance of ChatGPT-4 in Answering Questions From the Brazilian National Examination for Medical Degree Revalidation. 69(10). https://doi.org/10.1590/1806-9282.20230848
  • Grewal, H., Dhillon, G., Monga, V., Sharma, P., Buddhavarapu, V. S., Sidhu, G., & Kashyap, R. (2023). Radiology Gets Chatty: The ChatGPT Saga Unfolds. https://doi.org/10.7759/cureus.40135
  • Guastafierro, V., Corbitt, D. N., Bressan, A., Fernandes, B., Mintemur, Ö., Magnoli, F., Ronchi, S., Rosa, S. La, Uccella, S., & Renne, S. L. (2024). Evaluation of ChatGPT’s Usefulness and Accuracy in Diagnostic Surgical Pathology. https://doi.org/10.1101/2024.03.12.24304153
  • Heston, T., & Khun, C. (2023). Prompt Engineering in Medical Education. 2(3), 198–205. https://doi.org/10.3390/ime2030019
  • Huang, H., Zheng, O., Wang, D., Yin, J., Wang, Z., Ding, S., Yin, H., Xu, C., Yang, R., Zheng, Q., & Shi, B. (2023). ChatGPT for Shaping the Future of Dentistry: The Potential of Multi-Modal Large Language Model. 15(1). https://doi.org/10.1038/s41368-023-00239-y
  • Iannantuono, G. M., Bracken-Clarke, D., Floudas, C. S., Roselli, M., Gulley, J. L., & Karzai, F. (2023). Applications of Large Language Models in Cancer Care: Current Evidence and Future Perspectives. 13. https://doi.org/10.3389/fonc.2023.1268915
  • Iannantuono, G. M., Bracken-Clarke, D., Karzai, F., Choo-Wosoba, H., Gulley, J. L., & Floudas, C. S. (2023). Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study. https://doi.org/10.1101/2023.10.31.23297825
  • Igarashi, Y., Nakahara, K., Norii, T., Miyake, N., Tagami, T., & Yokobori, S. (2024). Performance of a Large Language Model on Japanese Emergency Medicine Board Certification Examinations. 91(2), 155–161. https://doi.org/10.1272/jnms.jnms.2024_91-205
  • Jain, N., Gottlich, C., Fisher, J., Campano, D., & Winston, T. (2024). Assessing ChatGPT’s Orthopedic in-Service Training Exam Performance and Applicability in the Field. 19(1). https://doi.org/10.1186/s13018-023-04467-0
  • Keshtkar, A., Hayat, A.-A., Atighi, F., Ayare, N., Keshtkar, M., Yazdanpanahi, P., Sadeghi, E., Deilami, N., Reihani, H., Karimi, A., Mokhtari, H., & Hashempur, M. H. (2023). ChatGPT’s Performance on Iran’s Medical Licensing Exams. https://doi.org/10.21203/rs.3.rs-3253417/v1
  • Kim, S. E., Lee, J. H., Choi, B. S., Han, H.-S., Lee, M. C., & Ro, D. H. (2024). Performance of ChatGPT on Solving Orthopedic Board-Style Questions: A Comparative Analysis of ChatGPT 3.5 and ChatGPT 4. 16(4), 669. https://doi.org/10.4055/cios23179
  • Kılıç, M. E. (2023). AI in Medical Education: A Comparative Analysis of GPT-4 and GPT-3.5 on Turkish Medical Specialization Exam Performance. https://doi.org/10.1101/2023.07.12.23292564
  • Koga, S. (2023). Exploring the Pitfalls of Large Language Models: Inconsistency and Inaccuracy in Answering Pathology Board Examination‐style Questions. 73(12), 618–620. https://doi.org/10.1111/pin.13382
  • Kondo, T., Okamoto, M., & Kondo, Y. (2024). Pilot Study on Using Large Language Models for Educational Resource Development in Japanese Radiological Technologist Exams. https://doi.org/10.21203/rs.3.rs-4233784/v1
  • Kumari, A., Kumari, A., Singh, A., Singh, S. K., Juhi, A., Dhanvijay, A. K. D., Pinjar, M. J., & Mondal, H. (2023). Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing. https://doi.org/10.7759/cureus.43861
  • Kung, J. E., Marshall, C., Gauthier, C., Gonzalez, T. A., & Jackson, J. B. (2023). Evaluating ChatGPT Performance on the Orthopaedic in-Training Examination. 8(3). https://doi.org/10.2106/jbjs.oa.23.00056
  • Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., Leon, L. D., Elepaño, C., Madriaga, M., Aggabao, R., Diaz-Candido, G., Maningo, J., & Tseng, V. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted Medical Education Using Large Language Models. Plos Digital Health, 2(2), e0000198. https://doi.org/10.1371/journal.pdig.0000198
  • Lai, U. H., Wu, K. S., Hsu, T.-Y., & Kan, J. K. C. (2023). Evaluating the Performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment. 10. https://doi.org/10.3389/fmed.2023.1240915
  • Li, Q., Tang, K., Li, S., Zhang, K., Li, Z., Chang, L., Li, W., Shen, B., Ding, J., & Min, X. (2023). Unleashing the Power of Language Models in Clinical Settings: A Trailblazing Evaluation Unveiling Novel Test Design. https://doi.org/10.1101/2023.07.11.23292512
  • Liu, J., Zheng, J., Cai, X., Wu, D., & Yin, C. (2023). A Descriptive Study Based on the Comparison of ChatGPT and Evidence-Based Neurosurgeons. 26(9), 107590. https://doi.org/10.1016/j.isci.2023.107590
  • Longwell, J. B., Hirsch, I., Binder, F., Gonzalez Conchas, G. A., Mau, D., Jang, R., Krishnan, R. G., & Grant, R. C. (2024). Performance of Large Language Models on Medical Oncology Examination Questions. 7(6), e2417641. https://doi.org/10.1001/jamanetworkopen.2024.17641
  • Lum, Z. C. (2023). Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT. 481(8), 1623–1630. https://doi.org/10.1097/corr.0000000000002704
  • Madrid-García, A., Rosales-Rosado, Z., Freites-Nuñez, D., Pérez-Sancristobal, I., Pato-Cour, E., Plasencia-Rodríguez, C., Cabeza-Osorio, L., León-Mateos, L., Abasolo-Alcázar, L., Fernández-Gutiérrez, B., & Rodríguez-Rodríguez, L. (2023). Harnessing ChatGPT and GPT-4 for Evaluating the Rheumatology Questions of the Spanish Access Exam to Specialized Medical Training. https://doi.org/10.1101/2023.07.21.23292821
  • Maitland, A., Fowkes, R., & Maitland, S. (2024). Can ChatGPT Pass the MRCP (UK) Written Examinations? Analysis of Performance and Errors Using a Clinical Decision-Reasoning Framework. 14(3), e080558. https://doi.org/10.1136/bmjopen-2023-080558
  • Mangul, S., Sarwal, V., Munteanu, V., Suhodolschi, T., Ciorba, D., Eskin, E., & Wang, W. (2024). BioBDMBench: A Comprehensive Benchmarking of Large Language Models in Bioinformatics. https://doi.org/10.21203/rs.3.rs-3780193/v1
  • Mavrych, V., & Bolgova, O. (2023). Evaluating AI Performance in Answering Questions Related to Thoracic Anatomy. 10(1), 55–59. https://doi.org/10.15406/mojap.2023.10.00339
  • May, M., Körner-Riffard, K., Kollitsch, L., Burger, M., Brookman-May, S. D., Rauchenwald, M., Marszalek, M., & Eredics, K. (2024). Evaluating the Efficacy of AI Chatbots as Tutors in Urology: A Comparative Analysis of Responses to the 2022 In-Service Assessment of the European Board of Urology. Urologia Internationalis, 108(4), 359–366. https://doi.org/10.1159/000537854
  • Mediboina, A., Badam, R. K., & Chodavarapu, S. (2024). Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI. https://doi.org/10.7759/cureus.51544
  • Mistretta, S. (2023). The Singularity Is Emerging: Large Language Models and the Impact of Artificial Intelligence on Education. https://doi.org/10.5772/intechopen.1002650
  • Mohsen, M. (2024). Artificial Intelligence in Academic Translation: A Comparative Study of Large Language Models and Google Translate. 35(2), 134–156. https://doi.org/10.31470/2309-1797-2024-35-2-134-156
  • Nicikowski, J., Szczepański, M., Miedziaszczyk, M., & Kudliński, B. (2024). The Potential of ChatGPT in Medicine: An Example Analysis of Nephrology Specialty Exams in Poland. 17(8). https://doi.org/10.1093/ckj/sfae193
  • Noda, R., Izaki, Y., Kitano, F., Komatsu, J., Ichikawa, D., & Shibagaki, Y. (2023). Performance of ChatGPT and Bard in Self-Assessment Questions for Nephrology Board Renewal. https://doi.org/10.1101/2023.06.06.23291070
  • Oh, N., Choi, G. S., & Lee, W. Y. (2023). ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models. Annals of Surgical Treatment and Research, 104(5), 269–273. https://doi.org/10.4174/astr.2023.104.5.269
  • Ohta, K., & Ohta, S. (2023). The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study. https://doi.org/10.7759/cureus.50369
  • Onder, C. E., Koc, G., Gokbulut, P., Taskaldiran, I., & Kuskonmaz, S. M. (2024). Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy. Scientific Reports, 14(1), 243. https://doi.org/10.1038/s41598-023-50884-w
  • Pan, W., & Yang, Y. (2024). Integrating LLMs and Software-Defined Resources for Enhanced Demonstrative Cloud Computing Education in University Curricula. https://doi.org/10.31219/osf.io/c7kgy
  • Park, J. (2023). Medical Students’ Patterns of Using ChatGPT as a Feedback Tool and Perceptions of ChatGPT in a Leadership and Communication Course in Korea: A Cross-Sectional Study. 20, 29. https://doi.org/10.3352/jeehp.2023.20.29
  • Park, Y. J., Pillai, A., Deng, J., Guo, E., Gupta, M., Paget, M., & Naugler, C. (2024). Assessing the research landscape and clinical utility of large language models: a scoping review. BMC Medical Informatics and Decision Making, 24(1), 1–14. https://doi.org/10.1186/s12911-024-02459-6
  • Pelletier, E. D., Jeffries, S. D., Song, K., & Hemmerling, T. M. (2024). Comparative Analysis of Machine-Learning Model Performance in Image Analysis: The Impact of Dataset Diversity and Size. Anesthesia & Analgesia, 139(6), 1332–1339. https://doi.org/10.1213/ANE.0000000000007088
  • Perkins, M. (2023). Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching and Learning Practice, 20(2). https://doi.org/10.53761/1.20.02.07
  • Preiksaitis, C., & Rose, C. (2023). Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review. 9, e48785. https://doi.org/10.2196/48785
  • Quttainah, M., Mishra, V., Madakam, S., Lurie, Y., & Mark, S. (2024). Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study. JMIR AI, 3, e51834. https://doi.org/10.2196/51834
  • Raimondi, R., Tzoumas, N., Salisbury, T., Di Simplicio, S., Romano, M. R., Bommireddy, T., Chawla, H., Chen, Y., Connolly, S., El Omda, S., Gough, M., Kishikova, L., McNally, T., Sadiq, S. N., Simpson, S., Teh, B. L., Toh, S., Vohra, V., & Al-Zubaidy, M. (2023). Comparative Analysis of Large Language Models in the Royal College of Ophthalmologists Fellowship Exams. 37(17), 3530–3533. https://doi.org/10.1038/s41433-023-02563-3
  • Rasul, T., Nair, S., Kalendra, D., Robin, M., Santini, F. de O., Ladeira, W. J., Sun, M., Day, I., Rather, R. A., & Heathcote, L. (2023). The role of ChatGPT in higher education: Benefits, challenges, and future research directions. Journal of Applied Learning and Teaching, 6(1), 41–56. https://doi.org/10.37074/jalt.2023.6.1.29
  • Rodrigues Alessi, M., Gomes, H. A., Lopes de Castro, M., & Terumy Okamoto, C. (2024). Performance of ChatGPT in Solving Questions From the Progress Test (Brazilian National Medical Exam): A Potential Artificial Intelligence Tool in Medical Practice. https://doi.org/10.7759/cureus.64924
  • Rosoł, M., Gąsior, J. S., Łaba, J., Korzeniewski, K., & Młyńczak, M. (2023). Evaluation of the Performance of GPT-3.5 and GPT-4 on the Medical Final Examination. https://doi.org/10.1101/2023.06.04.23290939
  • Rossettini, G., Rodeghiero, L., Corradi, F., Cook, C., Pillastrini, P., Turolla, A., Castellini, G., Chiappinotto, S., Gianola, S., & Palese, A. (2024). Comparative Accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian Entrance Test for Healthcare Sciences Degrees: A Cross-Sectional Study. 24(1). https://doi.org/10.1186/s12909-024-05630-9
  • Sallam, M. (2023a). ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare, 11(6), 887. https://doi.org/10.3390/healthcare11060887
  • Sallam, M. (2023b). The Utility of ChatGPT as an Example of Large Language Models in Healthcare Education, Research and Practice: Systematic Review on the Future Perspectives and Potential Limitations. https://doi.org/10.1101/2023.02.19.23286155
  • Sallam, M., & Al-Salahat, K. (2023). Below Average ChatGPT Performance in Medical Microbiology Exam Compared to University Students. 8. https://doi.org/10.3389/feduc.2023.1333415
  • Sallam, M., Al-Salahat, K., Eid, H., Egger, J., & Puladi, B. (2024). Human Versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions. https://doi.org/10.21203/rs.3.rs-3880412/v1
  • Samala, A. D., Zhai, X., Aoki, K., Bojic, L., & Zikic, S. (2024). An in-Depth Review of ChatGPT’s Pros and Cons for Learning and Teaching in Education. 18(02), 96–117. https://doi.org/10.3991/ijim.v18i02.46509
  • Sawamura, S., Bito, T., Ando, T., Masuda, K., Kameyama, S., & Ishida, H. (2024). Evaluation of the Accuracy of ChatGPT’s Responses to and References for Clinical Questions in Physical Therapy. 36(5), 234–239. https://doi.org/10.1589/jpts.36.234
  • Schubert, M. C., Wick, W., & Venkataramani, V. (2023a). Evaluating the Performance of Large Language Models on a Neurology Board-Style Examination. https://doi.org/10.1101/2023.07.13.23292598
  • Schubert, M. C., Wick, W., & Venkataramani, V. (2023b). Performance of Large Language Models on a Neurology Board–Style Examination. 6(12), e2346721. https://doi.org/10.1001/jamanetworkopen.2023.46721
  • Şensoy, E., & Çıtırık, M. (2024). Exploring the competence of artificial intelligence programs in the field of oculofacial plastic and orbital surgery. Ankyra Medical Journal, 3(3), 63–65. https://doi.org/10.51271/ANKMJ-0014
  • Shahab, O., El Kurdi, B., Shaukat, A., Nadkarni, G., & Soroush, A. (2024). Large language models: a primer and gastroenterology applications. Therapeutic Advances in Gastroenterology, 17. https://doi.org/10.1177/17562848241227031
  • Shin, E., Yu, Y., Bies, R. R., & Ramanathan, M. (2024). Evaluation of ChatGPT and Gemini Large Language Models for Pharmacometrics With NONMEM. https://doi.org/10.21203/rs.3.rs-4189234/v1
  • Song, H., Xia, Y., Luo, Z., Liu, H., Song, Y., Zeng, X., Li, T., Zhong, G., Li, J., Chen, M., Zhang, G., & Xiao, B. (2023). Evaluating the Performance of Different Large Language Models on Health Consultation and Patient Education in Urolithiasis. 47(1). https://doi.org/10.1007/s10916-023-02021-3
  • Sorin, V., Klang, E., Sobeh, T., Konen, E., Shrot, S., Livne, A., Weissbuch, Y., Hoffmann, C., & Barash, Y. (2024). Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology. Quantitative Imaging in Medicine and Surgery, 14(10), 7551–7560. https://doi.org/10.21037/qims-24-200
  • Sosnowski, W., Wroblewska, A., & Gawrysiak, P. (2022). Applying SoftTriple Loss for Supervised Language Model Fine Tuning. Proceedings of the 17th Conference on Computer Science and Intelligence Systems, FedCSIS 2022, 141–147. https://doi.org/10.15439/2022F185
  • Taira, K., Itaya, T., & Hanada, A. (2023). Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study (Preprint). https://doi.org/10.2196/preprints.47305
  • Takagi, S., Watari, T., Erabi, A., & Sakaguchi, K. (2023). Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study (Preprint). https://doi.org/10.2196/preprints.48002
  • Tasar, D. E., & Öcal Taşar, C. (2023). Bridging History With AI: A Comparative Evaluation of GPT- 3.5, GPT-4, and Google-Bard in Predictive Accuracy and Fact- Checking. https://doi.org/10.20944/preprints202305.1047.v1
  • Tepe, M., & Emekli, E. (2024). Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy. https://doi.org/10.7759/cureus.59960
  • Tong, W., Guan, Y., Chen, J., Huang, X., Zhong, Y., Zhang, C., & Zhang, H. (2023). Artificial Intelligence in Global Health Equity: An Evaluation and Discussion on the Application of ChatGPT, in the Chinese National Medical Licensing Examination. 10. https://doi.org/10.3389/fmed.2023.1237432
  • Toyama, Y., Harigai, A., Abe, M., Nagano, M., Kawabata, M., Seki, Y., & Takase, K. (2023). Performance Evaluation of ChatGPT, GPT-4, and Bard on the Official Board Examination of the Japan Radiology Society. 42(2), 201–207. https://doi.org/10.1007/s11604-023-01491-2
  • van Nuland, M., Lobbezoo, A.-F. H., van de Garde, E. M. W., Herbrink, M., van Heijl, I., Bognàr, T., Houwen, J. P. A., Dekens, M., Wannet, D., Egberts, T., & van der Linden, P. D. (2024). Assessing Accuracy of ChatGPT in Response to Questions From Day to Day Pharmaceutical Care in Hospitals. 15, 100464. https://doi.org/10.1016/j.rcsop.2024.100464
  • Wang, X., & Reynolds, B. L. (2024). Beyond the Books: Exploring Factors Shaping Chinese English Learners’ Engagement with Large Language Models for Vocabulary Learning. Education Sciences, 14(5). https://doi.org/10.3390/educsci14050496
  • Williams, C. Y. K., Zack, T., Miao, B. Y., Sushil, M., Wang, M., Kornblith, A. E., & Butte, A. J. (2024). Use of a Large Language Model to Assess Clinical Acuity of Adults in the Emergency Department. 7(5), e248895. https://doi.org/10.1001/jamanetworkopen.2024.8895
  • Xu, X., Chen, Y., & Miao, J. (2024). Opportunities, Challenges, and Future Directions of Large Language Models, Including ChatGPT in Medical Education: A Systematic Scoping Review. 21, 6. https://doi.org/10.3352/jeehp.2024.21.6
  • Yan, L., Sha, L., Zhao, L., Li, Y., Martinez-Maldonado, R., Chen, G., Li, X., Jin, Y., & Gašević, D. (2024). Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology, 55(1), 90–112. https://doi.org/10.1111/bjet.13370
  • Yang, Z., Yao, Z., Tasmin, M., Vashisht, P., Jang, W. S., Ouyang, F., Wang, B., Berlowitz, D., & Yu, H. (2023). Performance of Multimodal GPT-4V on USMLE With Image: Potential for Imaging Diagnostic Support With Explanations. https://doi.org/10.1101/2023.10.26.23297629
  • Mahmood, Y. M., Mohammed, R. O., Habibullah, I. J., Rahim, H. M., & Salih, A. M. (2024). Comparing ChatGPT and Google Bard: Assessing AI-Powered Information Retrieval in Nursing. https://doi.org/10.58742/hsn32c73
  • Yu, P., Fang, C., Liu, X., Fu, W., Ling, J., Yan, Z., Jiang, Y., Cao, Z., Wu, M., Chen, Z., Zhu, W., Zhang, Y., Abudukeremu, A., Wang, Y., Liu, X., & Wang, J. (2024). Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study. 10, e48514. https://doi.org/10.2196/48514
  • Zhu, L., Mou, W., & Chen, R. (2023). Can the ChatGPT and Other Large Language Models With Internet-Connected Database Solve the Questions and Concerns of Patient With Prostate Cancer? https://doi.org/10.1101/2023.03.06.23286827
  • Zhu, Y., Brettin, T., Evrard, Y. A., Partin, A., Xia, F., Shukla, M., Yoo, H., Doroshow, J. H., & Stevens, R. L. (2020). Ensemble transfer learning for the prediction of anti-cancer drug response. Scientific Reports, 10(1), 1–11. https://doi.org/10.1038/s41598-020-74921-0
  • Zhui, L., Fenghe, L., Xuehu, W., Qining, F., & Wei, R. (2024). Ethical Considerations and Fundamental Principles of Large Language Models in Medical Education: Viewpoint. Journal of Medical Internet Research, 26, e60083. https://doi.org/10.2196/60083
  • Zong, H., Li, J., Wu, E., Wu, R., Lu, J., & Shen, B. (2024). Performance of ChatGPT on Chinese National Medical Licensing Examinations: A Five-Year Examination Evaluation Study for Physicians, Pharmacists and Nurses. 24(1). https://doi.org/10.1186/s12909-024-05125-7
There are 113 references in total.

Details

Primary Language: Turkish
Subjects: Computing Education
Section: Reviews
Authors

Pınar Çetin 0000-0002-6480-4986

Aytuğ Onan 0000-0002-9434-5880

Publication Date: 31 March 2025
Submission Date: 1 March 2025
Acceptance Date: 30 March 2025
Published Issue: Year 2025, Volume: 1 Issue: 1

Cite

APA Çetin, P., & Onan, A. (2025). Büyük Dil Modellerinin Eğitimde Soru Yanıtlama Sistemlerinde Kullanımı: Potansiyel, Zorluklar ve Gelecek Yönelimleri. Fen, Matematik Ve Bilgisayar Eğitiminde Yenilikler Dergisi, 1(1), 82-108.





This work is licensed under a Creative Commons Attribution-Non Commercial 4.0 International License.