Research Article



An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups

Year 2023, Volume 43, Issue 2, 825-855, 02.09.2023
https://doi.org/10.17152/gefad.1202751

Abstract

This study aimed to investigate item response theory (IRT) parameter estimation and reliability for dichotomous data obtained from multiple groups drawn from the same population. The April 2017 mathematics subtest of the Transition from Primary to Secondary Education exam (TPSEE, known as TEOG) was used. The analyses were conducted with one subgroup of 7,500 students and two subgroups of 3,750 students each. The IRT assumptions were examined first. Once the assumptions were met, item and ability parameters were estimated with the 1PLM, 2PLM, 3PLM, and 4PLM for the dichotomously scored data. When model-data fit was examined, the best fit was obtained with the 3PLM under every condition. The item parameters did not differ substantially as the sample changed, whereas the a and b parameters differed across the IRT models. The ability parameters showed partial differences as the samples changed and also differed across the models. Minor differences were observed between the ability parameters obtained with the two estimation methods, Expected A Posteriori (EAP) and Maximum A Posteriori (MAP). The marginal reliability coefficients were similar under all conditions. Based on these findings, it is recommended that researchers estimate parameters with whichever of the 3PLM and 4PLM shows the better model-data fit, since these models provide more information in IRT analyses.
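The models compared in the abstract are nested members of the logistic IRT family. As a point of reference (standard notation, not taken from the article itself), the four-parameter logistic model (4PLM) gives the probability of a correct response to item i as

\[
P_i(\theta) = c_i + \frac{d_i - c_i}{1 + e^{-a_i(\theta - b_i)}},
\]

where \(a_i\) is the item discrimination, \(b_i\) the difficulty, \(c_i\) the lower asymptote (pseudo-guessing), and \(d_i\) the upper asymptote. Fixing \(d_i = 1\) yields the 3PLM, additionally fixing \(c_i = 0\) yields the 2PLM, and further constraining all \(a_i\) to a common value yields the 1PLM. Marginal reliability, as compared across conditions in the study, is conventionally the proportion of ability variance not attributable to estimation error, \(\bar{\rho} = (\sigma_\theta^2 - \bar{\sigma}_e^2)/\sigma_\theta^2\).

The sketch below is a minimal, hypothetical illustration of the two ability-estimation methods named in the abstract, EAP (posterior mean, computed by quadrature) and MAP (posterior mode), under a standard-normal prior and the 4PL response function above. It is not the authors' implementation; the item parameters and the response vector are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def prob_correct(theta, a, b, c, d):
    # 4PL item response function; d = 1 reduces it to the 3PL, c = 0 and d = 1 to the 2PL.
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

def log_likelihood(theta, x, a, b, c, d):
    # Log-likelihood of a dichotomous response pattern x (0/1) at ability theta.
    p = prob_correct(theta, a, b, c, d)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1.0 - p))

def eap(x, a, b, c, d, n_quad=61):
    # Expected A Posteriori: posterior mean over an N(0, 1) prior, by quadrature.
    nodes = np.linspace(-4.0, 4.0, n_quad)
    likelihood = np.array([np.exp(log_likelihood(t, x, a, b, c, d)) for t in nodes])
    posterior = likelihood * norm.pdf(nodes)
    return float(np.sum(nodes * posterior) / np.sum(posterior))

def map_est(x, a, b, c, d):
    # Maximum A Posteriori: posterior mode under the same N(0, 1) prior.
    neg_log_post = lambda t: -(log_likelihood(t, x, a, b, c, d) + norm.logpdf(t))
    return float(minimize_scalar(neg_log_post, bounds=(-4.0, 4.0), method="bounded").x)

if __name__ == "__main__":
    # Hypothetical parameters for five items and one examinee's responses.
    a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # discrimination
    b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # difficulty
    c = np.full(5, 0.2)                        # lower asymptote (guessing)
    d = np.full(5, 0.98)                       # upper asymptote
    x = np.array([1, 1, 1, 0, 0])
    print("EAP:", round(eap(x, a, b, c, d), 3), "MAP:", round(map_est(x, a, b, c, d), 3))
```

In practice these estimates would come from a dedicated IRT package after the item parameters have been calibrated; the sketch is only meant to make the EAP/MAP distinction concrete.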


Details

Primary Language: English
Subjects: Measurement Theories and Applications in Education and Psychology
Section: Articles
Authors

Serap Büyükkıdık 0000-0003-4335-2949

Hatice İnal 0000-0002-2813-0873

Publication Date: September 2, 2023
Published in Issue: Year 2023

How to Cite

APA Büyükkıdık, S., & İnal, H. (2023). An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, 43(2), 825-855. https://doi.org/10.17152/gefad.1202751
AMA Büyükkıdık S, İnal H. An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups. GEFAD. September 2023;43(2):825-855. doi:10.17152/gefad.1202751
Chicago Büyükkıdık, Serap, and Hatice İnal. “An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups”. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi 43, no. 2 (September 2023): 825-55. https://doi.org/10.17152/gefad.1202751.
EndNote Büyükkıdık S, İnal H (01 September 2023) An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi 43 2 825–855.
IEEE S. Büyükkıdık and H. İnal, “An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups”, GEFAD, vol. 43, no. 2, pp. 825–855, 2023, doi: 10.17152/gefad.1202751.
ISNAD Büyükkıdık, Serap - İnal, Hatice. “An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups”. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi 43/2 (September 2023), 825-855. https://doi.org/10.17152/gefad.1202751.
JAMA Büyükkıdık S, İnal H. An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups. GEFAD. 2023;43:825–855.
MLA Büyükkıdık, Serap, and Hatice İnal. “An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups”. Gazi Üniversitesi Gazi Eğitim Fakültesi Dergisi, vol. 43, no. 2, 2023, pp. 825-55, doi:10.17152/gefad.1202751.
Vancouver Büyükkıdık S, İnal H. An Investigation of Item Response Theory Parameter Estimations and Reliability in Multiple Groups. GEFAD. 2023;43(2):825-55.