Comparison of the Psychometric Properties of 8th Grade Inequality Math Questions Prepared with Artificial Intelligence Support and Past LGS Questions
Year: 2025, Volume: 7, Issue: Special Issue (Özel Sayı), pp. 185–210, 29.11.2025
Hatice Dökmeci
Şeyma Uyar
Abstract
The study examined the functionality of AI-supported automatic item generation in educational measurement and evaluation. Questions on the topic of “inequalities” for 8th-grade mathematics, generated with the Claude AI model via the Python programming language, were psychometrically compared with questions previously administered by the Ministry of National Education as part of the High School Transition System (LGS). The study group consisted of 500 eighth-grade students attending four public middle schools in Uşak province. The AI-generated questions, whose content validity was established through expert review, and the past LGS questions were administered to the students as a single 18-item test form with a 40-minute response time. The data were analyzed using TAP and JAMOVI. The findings revealed that the AI-generated questions were similar to the LGS questions in terms of difficulty, discrimination, and reliability. In conclusion, AI-supported automatic item generation offers time and cost advantages in measurement and evaluation processes, but it must be supported by expert review to ensure validity and pedagogical appropriateness.
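The abstract does not reproduce the authors' Claude-based generation pipeline. As a purely illustrative sketch of template-based automatic item generation for the same topic (first-degree inequalities), one can vary the parameters of a fixed item model and derive distractors from common student errors; the function name, item model, and distractor rules below are hypothetical, not the study's actual generator:

```python
import random

def generate_inequality_item(seed=None):
    """Generate one multiple-choice item from a simple item model:
    solve a*x + b < c for x, with distractors built from common errors."""
    rng = random.Random(seed)
    a = rng.choice([2, 3, 4, 5])          # coefficient of x
    b = rng.randint(1, 9)                 # constant term
    diff = rng.choice([d for d in range(-9, 10) if d != 0])
    c = b + diff                          # guarantees c != b
    boundary = diff / a                   # a*x + b < c  ->  x < (c - b)/a
    stem = f"Which values of x satisfy the inequality {a}x + {b} < {c}?"
    key = f"x < {boundary}"
    distractors = [
        f"x > {boundary}",   # reversed inequality direction
        f"x < {diff}",       # forgot to divide by a
        f"x > {c + b}",      # added instead of subtracting, reversed direction
    ]
    options = [key] + distractors
    rng.shuffle(options)
    return {"stem": stem, "options": options, "key": key}
```

In a template-based design like this, every generated item is correct by construction; an LLM-based pipeline, by contrast, requires the kind of expert content review the study describes.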
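The reported comparison rests on classical test theory statistics (item difficulty, discrimination, reliability) computed with TAP and JAMOVI. As a minimal sketch of what those indices are — not the authors' analysis scripts — they can be computed from a 0/1-scored response matrix as follows, using the corrected (rest-score) point-biserial for discrimination and one common formulation of KR-20 for reliability:

```python
import statistics

def _pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def item_analysis(responses):
    """Classical test theory statistics for a 0/1 response matrix
    (rows = students, columns = items): difficulty p, corrected
    point-biserial discrimination, and KR-20 reliability."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        p = sum(item) / n_students                               # difficulty index
        rest = [totals[i] - item[i] for i in range(n_students)]  # rest score
        stats.append({"p": p, "r_pbis": _pearson(item, rest)})
    # KR-20: (k/(k-1)) * (1 - sum(p*q) / var(total score))
    k = n_items
    pq = sum(s["p"] * (1 - s["p"]) for s in stats)
    var_total = statistics.pvariance(totals)
    kr20 = (k / (k - 1)) * (1 - pq / var_total)
    return stats, kr20
```

Comparing the two question sets then amounts to comparing the distributions of p and r_pbis, and the reliabilities, across the AI-generated and past LGS subsets of the 18-item form.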
Ethical Statement
This study was conducted with ethics committee approval (decision no. GO 2025/1527) obtained from the MAKÜ Non-Interventional Clinical Research Ethics Committee.
Thanks
Part of this study was presented as an oral paper at the 3rd International 21st Century Educational Research Congress (INER-2025, 25–27 April).
References
-
Adiguzel, T., Kaya, M. H., & Cansu, F. K. (2023). Revolutionizing education with AI: Exploring the transformative potential of ChatGPT. Contemporary Educational Technology, 15(3). https://doi.org/10.30935/cedtech/13152
Agarwal, M., Sharma, P., & Wani, P. (2025). Evaluating the accuracy and reliability of large language models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in answering item-analyzed multiple-choice questions on blood physiology. Cureus 17(4): e81871. https://doi.org/10.7759/cureus.81871
-
Antropic, (2024). Claude pro. https://www.anthropic.com/news/claude-pro
-
Ayre, C., & Scally, A. J. (2014). Critical values for Lawshe’s content validity ratio: Revisiting the original methods of calculation. Measurement and Evaluation in Counseling and Development, 47(1), 79–86. https://doi.org/10.1177/0748175613513808
-
Baykul, Y. (2010). Eğitimde ve psikolojide ölçme: Klasik test teorisi ve uygulaması. Pegem Akademi.
-
Bhandari, S., Liu, Y., & Pardos, Z. A. (2023). Evaluating ChatGPT-generated textbook questions using IRT. NeurIPS 2023 Workshop on Generative AI for Education (GAIED), University of California, Berkeley. https://doi.org/10.1109/TLT.2022.3224232
-
Bozkurt, A. (2023). Chatgpt, üretken yapay zeka ve algoritmik paradigma değişikliği. Alanyazın, 4(1), 63-72. https://doi.org/10.59320/alanyazin.1283282
-
Boztepe, C. (2025). Eğitimde yapay zekâ uygulamaları: Fırsatlar, sınırlılıklar ve etik tartışmalar [Review article]. Dumlupınar Üniversitesi Eğitim Bilimleri Enstitüsü Dergisi, 9(1). https://doi.org/10.71272/debder.1706141
-
Bulut, G., & Akyıldız, M. (2024). Yapay zekâ ile üretilen soruların ve madde parametrelerinin MST test koşullarında karşılaştırılması. Dijital Teknolojiler ve Eğitim Dergisi, 3(1), 1-12. https://doi.org/10.5281/zenodo.12637347
-
Bulut, O., & Yıldırım-Erbaşlı, S. N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9(Special Issue), 72-87. https://doi.org/10.21449/ijate.1124382
-
Büyüköztürk, Ş. (2016). Sınavlar üzerine düşünceler. Kalem Eğitim ve İnsan Bilimleri Dergisi, 6 (2), 345 – 356
-
Büyüköztürk, Ş. (2018). Sosyal bilimler için veri analizi el kitabı: İstatistik, araştırma deseni, SPSS uygulamaları ve yorum (24. baskı). Pegem Akademi.
-
Büyüköztürk, Ş., Kılıç Çakmak, E., Akgün, Ö. E., Karadeniz, Ş., & Demirel, F. (2020). Bilimsel araştırma yöntemleri (27. baskı). Pegem Akademi.
-
Büyükuslu, A. R. (2020). Koronavirüs sonrası yenidünya düzeni ekonomi-devlet-yapay zekâ. Der Yayınları.
-
Can, A. (2014). SPSS ile bilimsel araştırma sürecinde nicel veri analizi (2. Baskı). Pegem A Yayıncılık.
-
Chalasani, S. H., Syed, J., Ramesh, M., Patil, V., & Kumar, T. P. (2023). Artificial intelligence in the field of pharmacy practice: A literature review. Exploratory research in clinical and social pharmacy, 12, 100346. https://doi.org/10.1016/j.rcsop.2023.100346
-
Chen, L., Chen, P., & Lin, Z. (2020). Artificial intelligence in education: A review. IEEE Access, 8, 75264-75278. https://doi.org/10.1109/ACCESS.2020.2988510.
-
Chen, B., Zilles, C., West, M., & Bretl, T. (2019). Effect of discrete and continuous parameter variation on difficulty in automatic item generation. Artificial Intelligence in Education. 20th International Conference, AIED 2019, Chicago, IL, USA, June 25-29, Proceedings, Part I 20,
-
Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). SAGE Publications.
-
Crocker, L., & Algina, J. (2006). Introduction to classical and modern test theory. Cengage Learning.
-
Çavuş, M. N. (2024). Eğitimde yapay zekâ tabanlı ölçme ve değerlendirme üzerine bir derleme. Uluslararası Özel Amaçlar için İngilizce Dergisi, 2(1), 39-54. https://dergipark.org.tr/en/pub/joinesp/issue/85082/1492424
-
Çetin, M. ve Baklavacı, G. Y. (2024). Endüstri̇ 4.0 perspekti̇fi̇nde yapay zekanin eği̇ti̇mde uygulanabi̇li̇rli̇ği̇ i̇le i̇lgi̇li̇ öğretmen görüşleri̇ni̇n i̇ncelenmesi̇. İstanbul Ticaret Üniversitesi Girişimcilik Dergisi, 7(14), 1-21. https://doi.org/10.55830/tje.1404165
-
Derici, S. (2025). Karar vermede yapay zeka tabanlı derin öğrenme ve makine öğrenmesi algoritmaları: Yapay zeka ile bütünleşme. E. Atalay ve S. Derici (Ed.) Yapay Zekâ: Dijital Çağın Anahtarı içinde, s. 83-97. Akademisyen Kitapevi
-
Doğan, S., & Oktay, Y. (2022). Liselere geçiş sınavı (LGS) hazırlık sürecinin değerlendirilmesi. İnsan ve Toplum Bilimleri Araştırmaları Dergisi, 11(2), 963-992. https://doi.org/10.15869/itobiad.1054829
-
Dorsey, D. W., & Michaels, H. R. (2022). Validity arguments meet artificial intelligence in innovative educational assessment: A discussion and look forward. Journal of Educational Measurement, 59(3), 389-394.
-
Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133-43. https://doi.org/10.1007/s10459-004-4019-5.
-
Ebel, R. L. & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.
-
Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Psychology Press.
-
Ertel, W. (2024). Introduction to artificial intelligence. Springer Nature
-
Feng, W., Lee, J., McNichols, H., Scarlatos, A., Smith, D., Woodhead, S., ... & Lan, A. (2024). Exploring automated distractor generation for math multiple-choice questions via large language models. Findings of the Association for Computational Linguistics: NAACL 2024, 3067–3082. https://doi.org/10.48550/arxiv.2404.02124 .
-
Feuerriegel, S., Hartmann, J., Janiesch, C., & Zschech, P. (2024). Generative AI. Business & Information Systems Engineering, 66 (1), 111–126. https://doi.org/10.1007/s12599-023-00834-7
-
Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). SAGE Publications.
-
Gierl, M.J., & Haladyna, T.M. (2012). Automatic item generation: Theory and practice. Routledge.
Gierl, M.J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12(3), 273-298.
-
Göksün, D. O. (2025). Öğretmen adaylarının üst düzey düşünme becerilerini geliştirecek soru etkinliklerinde yapay zekâ araçlarının kullanımına yönelik swot analizleri. Erzincan Üniversitesi Eğitim Fakültesi Dergisi, 27(2), 278-293. https://doi.org/10.17556/erziefd.1648080
-
Gündeğer Kılcı, C. (2025). ChatGPT vs. DeepSeek: A comparative psychometric evaluation of AI tools in generating multiple-choice questions. International Journal of Assessment Tools in Education, 12(4), 1055-1079. https://doi.org/10.21449/ijate.1674995
-
Haladyna, T. M. & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
-
Halaweh, M. (2023). ChatGPT in education: Strategies for responsible implementation. Contemporary Educational Technology, 15(2), ep421. https://doi.org/10.30935/cedtech/13036
-
Hambleton, R. K., Swaminathan, H. & Rogers, H. J. (1991). Fundamentals of item response theory. SAGE.
-
Holmes, W., Bialik, M. & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning. Center for Curriculum Redesign.
-
Hou, J., Li, Z., & Liu, G. (2022). Macro education approach to improve learning interest under the background of artificial intelligence. Wireless Communications and Mobile Computing, 2022, 4295887. https://doi.org/10.1155/2022/4295887
-
Huang, A. Y., Lu, O. H., & Yang, S. J. (2023). Effects of artificial intelligence-enabled personalized recommendations on learners’ learning engagement, motivation, and outcomes in a flipped classroom. Computers & Education, 194, 104684. https://doi.org/10.1016/j.compedu.2022.104684
-
Indran, I. R., Paranthaman, P., Gupta, N., & Mustafa, N. (2024). Twelve tips to leverage AI for efficient and effective medical question generation: A guide for educators using Chat GPT. Medical Teacher, 46(8):1021-1026 https://doi.org/10.1080/0142159X.2023.2294703
-
Jiang, Q., Gao, Z., & Karniadakis, G.E. (2025). DeepSeek vs. ChatGPT vs. Claude: A comparative study for scientific computing and scientific machine learning tasks. Theoretical and Applied Mechanics Letters. https://doi.org/10.1016/j.taml.2025.100583
-
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36.
-
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17–24. https://doi.org/10.1037/h0057123
-
Kıyak, Y. S., Budakoğlu, I. İ., Coşkun Ö. & Koyun E. (2023). The first automatic ıtem generation in Turkish for assessment of clinical reasoning in medical education. World of Medical Education, 22(66), 72-90. https://doi.org/10.25282/ted.1225814
-
Kline, P. (2000). The handbook of psychological testing (2nd ed.). Routledge.
-
Kosh, A. E., Simpson, M. A., Bickel, L., Kellogg, M., & Sanford‐Moore, E. (2019). A cost– benefit analysis of automatic item generation. Educational Measurement: Issues and Practice, 38(1), 48-53. https://doi.org/10.1111/emip.12237
-
Krumm, S., Thiel, A. M., Reznik, N., Freudenstein, J.-P., Schäpers, P., & Mussel, P. (2024). Creating a psychological test in a few seconds: Can ChatGPT develop a psychometrically sound situational judgment test?European Journal of Psychological Assessment. Advance online publication. https://doi.org/10.1027/1015-5759/a000878
-
Kumar, N. S. (2019). Implementation of artificial intelligence in imparting education and evaluating student performance. Journal of Artificial Intelligence, 1(01), 1-9. https://doi.org/10.36548/jaicn.2019.1. 001
-
Laverghetta, A., & Licato, J. (2023). Generating better items for cognitive assessments using large language models. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023) (pp. 414–428). Association for Computational Linguistics.
-
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
-
Lee, K., Park, J., Kim, I., & Choi, Y. (2018). Predicting movie success with machine learning techniques: Ways to improve accuracy. Information Systems Frontiers, 20, 577–588. https://doi.org/10.1007/s10796-016-9689-z
-
Lee, J., Smith, D., Woodhead, S., & Lan, A. (2024). Math multiple choice question generation via human-large language model collaboration. 17th International Conference on Educational Data Mining (EDM 2024). arXiv. https://doi.org/10.48550/arXiv.2405.00864
-
Lin, P. Y., Chai, C. S., Jong, M. S. Y., Dai, Y., Guo, Y., & Qin, J. (2021). Modelling the structural relationship among primary students’ motivation to learn artificial intelligence. Computers and Education: Artificial Intelligence, 2, 100006. https://doi.org/10.1016/j.caeai.2020.100006
-
Lin, Z., & Chen, H. (2024). Investigating the capability of ChatGPT for generating multiple-choice reading comprehension items. System, 123, 103344. https://doi.org/10.1016/j.system.2024.103344
-
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1–35. https://doi.org/10.1145/3560815
-
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence unleashed: An argument for AI in education. Pearson.
-
Luo, Y., Han, X., & Zhang, C. (2022). Prediction of learning outcomes with a machine learning algorithm based on online learning behavior data in blended courses. Asia Pacific Education Review. https://doi.org/10.1007/s12564-022-09749-6
-
Malec, W. (2024). Investigating the quality of AI-generated distractors for a multiple-choice vocabulary test. CSEDU, 1, 836–843. https://doi.org/10.5220/0012762400003693
-
Meehan, K., Wilson, S., & Farrelly, W. (2024). Evaluating alternatives to essay-based assessment in a world where generative AI is pervasive. ICERI2024 Proceedings (pp. 1152–1159). 11–13 November, Spain.
-
Memarian, B., & Doleck, T. (2024). A review of assessment for learning with artificial intelligence. Computers in Human Behavior: Artificial Humans, 2(1), 100040. https://doi.org/10.1016/j.chbah.2023.100040
-
Mena-Guacas, A. F., Urueña Rodríguez, J. A., Santana Trujillo, D. M., Gómez-Galán, J., & López-Meneses, E. (2023). Collaborative learning and skill development for educational growth of artificial intelligence: A systematic review. Contemporary Educational Technology, 15(3), ep428. https://doi.org/10.30935/cedtech/13123
-
Millî Eğitim Bakanlığı [MEB]. (2018–2023). LGS örnek ve çıkmış sorular arşivi. https://odsgm.meb.gov.tr/
-
Millî Eğitim Bakanlığı [MEB]. (2023). LGS Değerlendirme Raporu. Ankara: MEB.
-
Omopekunola, M. O., & Kardanova, E. Y. (2024). Automatic generation of physics items with large language models (LLMs). Research and Evaluation in Education, 10(2). https://doi.org/10.21831/reid.v10i2.76864
-
Owan, V. J., Abang, K. B., Idika, D. O., Etta, E. O., & Bassey, B. A. (2023). Exploring the potential of artificial intelligence tools in educational measurement and assessment. Eurasia Journal of Mathematics, Science and Technology Education, 19(8), em2307. https://doi.org/10.29333/ejmste/13428
-
Özgeldi, M. (2019). Yapay zeka ve insan kaynakları. G. Telli (Ed.), Yapay zeka ve gelecek içinde (s. 198-222). İstanbul: Doğu Kitapevi.
-
Pais, J., Silva, A., Guimarães, B., Povo, A., Coelho, E., Silva-Pereira, F., ... & Severo, M. (2016). Do item-writing flaws reduce examinations psychometric quality? BMC Research Notes, 9, 399. https://doi.org/10.1186/s13104-016-2202-4
-
Peters, U., & Chin‐Yee, B. (2025). Generalization bias in large language model summarization of scientific research. Royal Society Open Science, 12(4). https://doi.org/10.1098/rsos.241776
-
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
-
Pressey, S. L. (1950). Development and appraisal of devices providing immediate automatic scoring of objective tests and concomitant self-instruction. The Journal of Psychology, 29(2), 417–447.
-
Rodrigues, L., Pereira, F. D., Cabral, L., Gašević, D., Ramalho, G., & Mello, R. F. (2024). Assessing the quality of automatic-generated short answers using GPT4. Computers and Education: Artificial Intelligence, 7, 100248. https://doi.org/10.1016/j.caeai.2024.100248
-
Rubinstein, S. M., Muhsin, A., Banerjee, R., Mishra, S., Kwok, M., Yang, P., Warner, J. L., & Cowan, A. W. (2025). Summarizing clinical evidence utilizing large language models: A blinded comparative analysis. Frontiers in Digital Health, 5, 1569554. https://doi.org/10.3389/fdgth.2025.1569554
-
Rudner, L. M. (2009). Implementing the graduate management admission test computerized adaptive test. In Elements of adaptive testing (pp. 151–165). Springer New York. https://doi.org/10.1007/978-0-387-85461-8_8
-
Sayın, A., & Bozdağ, S. (2024). Çeşitli yönleri ile otomatik madde üretimi. EPODDER.
-
Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research, 8(2), 23–74. https://psycnet.apa.org/record/2003-08119-003
-
Schumacker, R. E., & Lomax, R. G. (2010). A beginner's guide to structural equation modeling. New York: Routledge.
-
Shin, D., & Lee, J. H. (2023). Can ChatGPT make reading comprehension testing items on par with human experts? Language Learning & Technology, 27(3), 27–40. https://hdl.handle.net/10125/73530
-
Soylu, A., Coşkun, Ö., Budakoğlu, I. İ., & Peker, T. V. (2024). Discrimination and difficulty indices of ChatGPT-generated multiple-choice questions in anatomy. 24. Ulusal Anatomi Kongresi (pp. 76–77). İstanbul, Turkey.
-
Şad, S. N., & Şahiner, Y. K. (2016). Temel eğitimden ortaöğretime geçiş (TEOG) sistemine ilişkin öğrenci, öğretmen ve veli görüşleri. İlköğretim Online, 15(1). https://doi.org/10.17051/io.2016.78720
-
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Pearson.
-
Tan, B., Armoush, N., Mazzullo, E., Bulut, O., & Gierl, M. J. (2025). A review of automatic item generation techniques leveraging large language models. International Journal of Assessment Tools in Education, 12(2), 317–340. https://doi.org/10.21449/ijate.1602294
-
Taşdelen-Teker, G., Bakan-Kalaycıoğlu, D., Şahin, M. G., & Esenboğa, S. (2024). Yapay zekâ tarafından oluşturulan olgu temelli çoktan seçmeli maddelerin değerlendirilmesi: Pediatri örneği. Uluslararası Ölçme, Seçme ve Yerleştirme Sempozyumu. ÖSYM, Ankara.
-
Tarrant, M., Knierim, A., Hayes, S. K., & Ware, J. (2006). The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Education Today, 26(8), 662–671. https://doi.org/10.1016/j.nedt.2006.07.006
-
Tavşancıl, E. (2010). Tutumların ölçülmesi ve SPSS ile veri analizi (5. baskı). Nobel Yayıncılık.
-
Tekin, H. (1996). Eğitimde ölçme ve değerlendirme. Yargı Yayınları.
-
Turgut, M. F., & Baykul, Y. (2012). Eğitimde ölçme ve değerlendirme. Pegem Akademi Yayıncılık.
-
Ulusoy, B. (2020). 8. sınıf öğrencilerinin liselere geçiş sınavına (LGS) ilişkin algılarının metaforlar aracılığıyla incelenmesi. Necmettin Erbakan Üniversitesi Ereğli Eğitim Fakültesi Dergisi, 2(2), 186–202. https://doi.org/10.51119/ereegf.2020.5
-
Wallen, N. E., & Fraenkel, J. R. (2013). Educational research: A guide to the process. Routledge.
-
Woolf, B. P. (2010). Building intelligent interactive tutors: Student-centered strategies for revolutionizing e-learning. Morgan Kaufmann.
-
Yapıcı Coşkun, Z., Kıyak, Y. S., Coşkun, Ö., Budakoğlu, I. İ., & Özdemir, Ö. (2025). Large language models for generating script concordance test in obstetrics and gynecology: ChatGPT and Claude. Medical Teacher, 47(11), 1767–1771. https://doi.org/10.1080/0142159X.2025.2497888
-
Yeşilyurt, S., Dündar, R., & Aydın, M. (2024). Sosyal bilgiler eğitimi alanında lisansüstü eğitimini sürdüren öğrencilerin yapay zekâ hakkındaki görüşleri. Asya Studies, 8(27), 1–14. https://doi.org/10.31455/asya.1406649
-
Yüzüak, B. N., & Yılmaz, F. N. (2025). Soruları gözden geçirmede farklı yapay zeka uygulamalarının karşılaştırılması: Doğru-yanlış önermeleri üzerine bir uygulama. Dokuz Eylül Üniversitesi Buca Eğitim Fakültesi Dergisi, 63, 500–520. https://doi.org/10.53444/deubefd.1532545
-
Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education, 16(1), 39. https://doi.org/10.1186/s41239-019-0171-0