Determination of Type I Error and Power Rate in Differential Item Functioning by Several Methods

Şeyma Erbay Mermer; Yasemin Kuzu; Hülya Kelecioğlu

Research Article

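Turkish title: Değişen Madde Fonksiyonunda Tip I Hata ve Güç Oranının Farklı Yöntemlere Göre Belirlenmesi [Determination of Type I Error and Power Rates in Differential Item Functioning According to Different Methods]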

Year 2023, Volume 7, Issue 3, pp. 902–921, 30.11.2023


Abstract

In this study, methods based on Classical Test Theory (CTT) and Item Response Theory (IRT) were compared for determining Type I error and power rates in Differential Item Functioning (DIF). The analyses used logistic regression, Mantel-Haenszel and Breslow-Day (CTT-based) and Lord's χ² and Raju's area index (IRT-based), and were carried out in R 3.0.1. In general, as the proportion of items containing DIF increased, Type I error increased and power decreased. The IRT-based methods, Lord's χ² and Raju's area index, gave better results than the other methods, with low error and high power.
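The Mantel-Haenszel procedure listed in the abstract compares reference- and focal-group performance on a studied item within matched total-score levels. Below is a minimal, standard-library-only Python sketch of the usual statistics (common odds ratio α_MH, continuity-corrected MH χ² with 1 df, and the ETS MH D-DIF = −2.35 ln α_MH), assuming examinees have already been grouped into one 2×2 table per total-score level. The `Stratum` class and `mantel_haenszel` function are hypothetical illustrations, not the authors' code; the study itself ran its analyses in R 3.0.1.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stratum:
    """Hypothetical 2x2 table for one matched total-score level k."""
    ref_correct: int    # A_k: reference group, studied item correct
    ref_incorrect: int  # B_k: reference group, studied item incorrect
    foc_correct: int    # C_k: focal group, studied item correct
    foc_incorrect: int  # D_k: focal group, studied item incorrect


def mantel_haenszel(strata: Iterable[Stratum]) -> tuple[float, float, float, float]:
    """Return (alpha_MH, MH chi-square, p-value, MH D-DIF) for one studied item."""
    num = den = 0.0                 # numerator / denominator of alpha_MH
    sum_a = sum_ea = sum_var = 0.0  # sums of A_k, E(A_k) and Var(A_k) over k
    for s in strata:
        a, b = s.ref_correct, s.ref_incorrect
        c, d = s.foc_correct, s.foc_incorrect
        n = a + b + c + d
        if n < 2:
            continue  # a level with fewer than two examinees carries no information
        n_ref, n_foc = a + b, c + d  # group sizes at this score level
        m1, m0 = a + c, b + d        # totals correct / incorrect at this level
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_ea += n_ref * m1 / n
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    alpha = num / den  # common odds ratio; > 1 means the item favors the reference group
    # Continuity-corrected Mantel-Haenszel chi-square with 1 degree of freedom
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p-value of chi-square(1)
    mh_d_dif = -2.35 * math.log(alpha)  # ETS delta scale; negative = DIF against focal group
    return alpha, chi2, p, mh_d_dif


if __name__ == "__main__":
    # Illustrative counts only, not data from the study
    tables = [Stratum(30, 10, 10, 30), Stratum(25, 15, 18, 22)]
    alpha, chi2, p, d = mantel_haenszel(tables)
    print(f"alpha_MH={alpha:.3f}  chi2={chi2:.2f}  p={p:.4f}  MH D-DIF={d:.2f}")
```

The sketch assumes at least one score level has a nonzero B_k·C_k product; otherwise α_MH is undefined and the division raises an error. Criterion refinement (purifying the matching score), which many DIF studies apply, is not shown.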

Keywords

IRT, DIF, Type I error, power rate
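The two outcome measures in the keywords are usually estimated from simulation output as rejection proportions: the Type I error rate is the share of DIF-free items a method wrongly flags, and the power rate is the share of items simulated with DIF that it correctly flags. A minimal Python sketch, assuming each replication's decisions are stored as a list of booleans and that the indices of the DIF items are known from the simulation design; `type1_and_power` is a hypothetical helper, and the article's exact aggregation may differ.

```python
from typing import Sequence


def type1_and_power(flags: Sequence[Sequence[bool]], dif_items: set[int]) -> tuple[float, float]:
    """Pool rejection decisions over replications.

    flags[r][j] is True if the DIF method flagged item j in replication r.
    dif_items holds the indices of items that were simulated WITH DIF.
    Returns (Type I error rate, power rate).
    """
    false_pos = n_null = true_pos = n_dif = 0
    for replication in flags:
        for j, flagged in enumerate(replication):
            if j in dif_items:
                n_dif += 1
                true_pos += int(flagged)   # correct detection of a DIF item
            else:
                n_null += 1
                false_pos += int(flagged)  # false alarm on a DIF-free item
    type1 = false_pos / n_null if n_null else float("nan")
    power = true_pos / n_dif if n_dif else float("nan")
    return type1, power


if __name__ == "__main__":
    # Two toy replications of a four-item test in which only item 0 has DIF
    flags = [[True, False, False, True], [True, True, False, False]]
    print(type1_and_power(flags, dif_items={0}))
```

In this toy input, item 0 is flagged in both replications (power 1.0) and 2 of the 6 decisions on DIF-free items are false alarms (Type I error rate 1/3). When every item appears in every replication, pooling like this gives the same value as averaging the per-item rates.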

References

American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Ankenmann, R. D., Witt, E. A., & Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistics in detecting differential item functioning. Journal of Educational Measurement, 36(4), 277–300.
Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe University Journal of Education, 41, 36–47.
Awuor, R. A. (2008). Effect of unequal sample sizes on the power of DIF detection: An IRT-based Monte Carlo study with SIBTEST and Mantel-Haenszel procedures [Unpublished master's thesis]. Virginia Polytechnic Institute and State University.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. SAGE.
Chen, J. H., Chen, C. T., & Shih, C. L. (2014). Improving the control of Type I error rate in assessing differential item functioning for hierarchical generalized linear model when impact is presented. Applied Psychological Measurement, 38(1), 18–36.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning items. Educational Measurement: Issues and Practice, 17, 31–44.
Çepni, Z. (2011). Değişen madde fonksiyonlarının sibtest, Mantel Haenzsel, lojistik regresyon ve madde tepki kuramı yöntemleriyle incelenmesi [Differential item functioning analysis using SIBTEST, Mantel-Haenszel, logistic regression and item response theory methods] [Unpublished master's thesis]. Hacettepe University.
Ellis, B., & Raju, N. (2003). Test and item bias: What they are, what they aren’t, and how to detect them (ED480042). ERIC. https://eric.ed.gov/?id=ED480042
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Psychology Press.
Embretson, S. E. (2007). Construct validity: A universal validity system or just another test evaluation procedure? Educational Researcher, 36(8), 449–455.
Erdem-Keklik, D. (2012). İki kategorili maddelerde tek biçimli değişen madde fonksiyonu belirleme tekniklerinin karşılaştırılması: Bir simülasyon çalışması [Comparison of techniques in detecting uniform differential item functioning in dichotomous items: A simulation study] (Thesis No. 311744) [Doctoral dissertation, Ankara University]. National Thesis Center.
Furlow, C. F., Ross, T. R., & Gagné, P. (2009). The impact of multidimensionality on the detection of differential bundle functioning using simultaneous item bias test. Applied Psychological Measurement, 33(6), 441–464.
Gierl, M. J., Rogers, W. T., & Klinger, D. A. (1999). Using statistical and judgmental reviews to identify and interpret translation differential item functioning. Alberta Journal of Educational Research, 45(4), 353–376.
Gierl, M. J., Jodoin, M. G., & Ackerman, T. A. (2000, April 24–27). Performance of Mantel-Haenszel, simultaneous item bias test, and logistic regression when the proportion of DIF items is large [Paper presentation]. The Annual Meeting of the American Educational Research Association (AERA), New Orleans, Louisiana, USA.
Gök, B., Kabasakal, K. A., & Kelecioğlu, H. (2014). PISA 2009 öğrenci anketi tutum maddelerinin kültüre göre değişen madde fonksiyonu açısından incelenmesi [Analysis of attitude items in PISA 2009 student questionnaire in terms of differential item functioning based on culture]. Journal of Measurement and Evaluation in Education and Psychology, 5(1), 72–87.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. SAGE.
Hou, L., de la Torre, J. D., & Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnostic modeling: Application of the Wald test to investigate DIF in the DINA model. Journal of Educational Measurement, 51(1), 98–125.
Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2013). Modeling differential item functioning using a generalization of the multiple-group bifactor model. Journal of Educational and Behavioral Statistics, 38(1), 32–60.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349.
Kabasakal, K. A., & Kelecioglu, H. (2015). Effect of differential item functioning on test equating. Educational Sciences: Theory and Practice, 15(5), 1229–1246.
Kan, A., Sünbül, Ö., & Ömür, S. (2013). 6.- 8. Sınıf seviye belirleme sınavları alt testlerinin çeşitli yöntemlere göre değişen madde fonksiyonlarının incelenmesi [Investigating the differential item functions of the 6th-8th grade subtests of the Level Assessment Examination according to various methods]. Mersin University Journal of the Faculty of Education, 9(2), 207–222.
Kane, M. (2006). Content-related validity evidence in test development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 131–153). Lawrence Erlbaum Associates.
Karami, H., & Nodoushan, M. A. S. (2011). Differential item functioning (DIF): Current problems and future directions. International Journal of Language Studies, 5(4), 133–142.
Lee, K. (2003). Parametric and nonparametric IRT models for assessing differential item functioning [Unpublished doctoral dissertation]. Wayne State University.
Lee, S., Bulut, O., & Suh, Y. (2017). Multidimensional extension of multiple indicators multiple causes models to detect DIF. Educational and Psychological Measurement, 77(4), 545–569.
Li, H., Qin, Q., & Lei, P.-W. (2017). An examination of the instructional sensitivity of the TIMSS math items: A hierarchical differential item functioning approach. Educational Assessment, 22(1), 1–17.
Mertler, C. A., & Vannatta, R. A. (2005). Advanced and multivariate statistical methods: Practical application and interpretation (3rd ed.). Pyrczak.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). American Council on Education.
Penfield, R. D., & Lam, T. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105–116.
Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33, 215–230.
Sireci, S. G., & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148–166.
Sireci, S. G., & Rios, J. A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170–187.
Sünbül, Ö., & Sünbül, S. Ö. (2016). Type I error rates and power study of several differential item functioning determination methods. Elementary Education Online, 15(3), 882–897.
Şahin, M. G. (2017). Comparison of objective and subjective methods on determination of differential item functioning. Universal Journal of Educational Research, 5(9), 1435–1446.
Turhan, A. (2006). Multilevel 2PL item response model vertical equating with the presence of differential item functioning [Unpublished doctoral dissertation]. The Florida State University.
Vaughn, B. K., & Wang, Q. (2010). DIF trees: Using classification trees to detect differential item functioning. Educational and Psychological Measurement, 70(6), 941–952.
Walker, C. M., & Gocer Sahin, S. (2016). Using a multidimensional IRT framework to better understand differential item functioning (DIF): A tale of three DIF detection procedures. Educational and Psychological Measurement, 77(6), 945–970.
Zheng, Y., Gierl, M. J., & Cui, Y. (2007). Using real data to compare DIF detection and effect size measures among Mantel-Haenszel, SIBTEST and logistic regression procedures [Paper presentation]. NCME, Chicago.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–347). Lawrence Erlbaum Associates.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: bringing the context into picture by investigating sociological community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5(1), 23.
Zumbo, B. D., & Thomas, D. R. (1996). A measure of DIF effect size using logistic regression procedures [Paper presentation]. National Board of Medical Examiners, Philadelphia, PA, United States.
Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement. ETS Research Report Series, 2012(1).

There are 45 references in total.

Details

Primary Language	English
Subjects	Social Work (Other)
Section	Research Article
Authors	Şeyma Erbay Mermer (ORCID 0000-0002-7747-9545), Yasemin Kuzu (ORCID 0000-0003-4301-2645), Hülya Kelecioğlu (ORCID 0000-0002-0741-9934)
Early View Date	31 October 2023
Publication Date	30 November 2023
Issue	Year 2023, Volume 7, Issue 3

How to Cite

APA	Erbay Mermer, Ş., Kuzu, Y., & Kelecioğlu, H. (2023). Determination of Type I Error and Power Rate in Differential Item Functioning by Several Methods. Türk Akademik Yayınlar Dergisi (TAY Journal), 7(3), 902–921.
