Year 2014, Volume: 5 Issue: 2, 12 - 25, 26.04.2014
https://doi.org/10.21031/epod.71099

Abstract

The purpose of this study was to compare the Type I error rates and power of the Mantel-Haenszel (MH) and logistic regression (LR) tests for detecting differential item functioning (DIF). The comparison was made under conditions in which the ability distributions of the reference and focal groups, the sample size, and the sample size ratio varied. WinGen3 was used to simulate ability estimates and to generate response data. The two-parameter logistic item response model was used to generate item parameters for both non-DIF and DIF items. When the two groups had equal ability distributions, the MH and LR techniques produced similar Type I error rates close to the nominal α level. When the group distributions were unequal, both procedures showed inflated Type I error rates. The highest Type I error rates occurred when the difference between the ability distributions was large and the sample size was large. The results also showed that the conditions that inflated Type I error rates also produced artificially inflated power.
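To make the MH procedure and the Type I error criterion more concrete, here is a minimal, standard-library-only sketch. It is not the study's code. It draws dichotomous responses from the two-parameter logistic model for a reference and a focal group and stratifies examinees on total test score, with the studied item included. It then computes the continuity-corrected MH χ² (df = 1) and the MH common odds ratio for one studied item. Both groups share identical item parameters, so every flag counted by `type1_error_rate` is a Type I error. Several choices here are illustrative assumptions rather than the study's design: the function names, the item-parameter distributions (a ~ U(0.5, 2.0), b ~ N(0, 1)), the scaling constant D = 1.7, the sample sizes, and the use of Python's `random` module in place of WinGen3.

```python
import math
import random

CRIT_05_DF1 = 3.841459  # chi-square critical value, alpha = .05, df = 1


def p_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic (2PL) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_2pl(n, mean, items, rng):
    """Simulate n dichotomous response vectors with theta ~ N(mean, 1)."""
    data = []
    for _ in range(n):
        theta = rng.gauss(mean, 1.0)
        data.append([1 if rng.random() < p_2pl(theta, a, b) else 0
                     for a, b in items])
    return data


def mantel_haenszel(ref, foc, item):
    """Continuity-corrected MH chi-square (df = 1) and MH common odds ratio
    for one studied item, with examinees stratified on total test score."""
    strata = {}  # total score -> [A, B, C, D]
    for is_focal, rows in ((False, ref), (True, foc)):
        for r in rows:
            cell = strata.setdefault(sum(r), [0, 0, 0, 0])
            # A/B: reference right/wrong; C/D: focal right/wrong
            cell[(2 if is_focal else 0) + (0 if r[item] else 1)] += 1
    sum_a = sum_ea = sum_var = num = den = 0.0
    for A, B, C, D in strata.values():
        T = A + B + C + D
        n_r, n_f, m1, m0 = A + B, C + D, A + C, B + D
        if min(n_r, n_f, m1, m0) == 0:
            continue  # zero-variance stratum: carries no DIF information
        sum_a += A
        sum_ea += n_r * m1 / T
        sum_var += n_r * n_f * m1 * m0 / (T * T * (T - 1))
        num += A * D / T
        den += B * C / T
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var if sum_var else 0.0
    odds_ratio = num / den if den else float("inf")
    return chi2, odds_ratio


def type1_error_rate(mean_foc, n_ref=500, n_foc=500, n_items=20,
                     reps=50, seed=2014):
    """Share of replications in which non-DIF item 0 is flagged at alpha = .05.
    Both groups use identical item parameters, so every flag is a Type I error."""
    rng = random.Random(seed)
    items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0))
             for _ in range(n_items)]
    flags = 0
    for _ in range(reps):
        ref = simulate_2pl(n_ref, 0.0, items, rng)
        foc = simulate_2pl(n_foc, mean_foc, items, rng)
        chi2, _ = mantel_haenszel(ref, foc, item=0)
        flags += chi2 > CRIT_05_DF1
    return flags / reps


if __name__ == "__main__":
    for shift in (0.0, -1.0):  # equal vs. unequal ability distributions
        rate = type1_error_rate(shift)
        print(f"focal mean {shift:+.1f}: Type I error rate {rate:.3f}")
```

The usage block contrasts a focal group with the same ability distribution as the reference group against one shifted by one standard deviation. No particular rates are claimed here. The estimates depend on the seed, the number of replications, and how the studied item's discrimination compares with the rest of the test.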

References

  • Ankenmann, R. D., Witt, E. A. and Dunbar, S. B. (1999). An Investigation of the Power of the Likelihood Ratio Goodness-of-Fit Statistics in Detecting Differential Item Functioning. Journal of Educational Measurement, 36(4), 277–300.
  • Atar, B. (2007). Differential Item Functioning Analyses For Mixed Response Data Using IRT Likelihood-Ratio Test, Logistic Regression, and GLLAMM Procedures. Unpublished Doctoral Dissertation, Florida State University, Tallahassee.
  • Atar, B. and Kamata, A. (2011). Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 41, 36–47.
  • Camilli, G. (2006). Test Fairness. In R. L. Brennan (Ed.), Educational Measurement (pp. 221–257). Westport, CT: American Council on Education.
  • Camilli, G. and Shepard, L. A. (1994). Methods for Identifying Biased Test Items (Vol. 4). Thousand Oaks, CA: Sage Publications, Inc.
  • Chang, H., Mazzeo, J. and Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of SIBTEST procedure. Journal of Educational Measurement, 33(3), 333–353.
  • Clauser, B. E., Mazor, K. M., and Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269–279.
  • Clauser, B. E., Mazor, K. M., and Hambleton, R. K. (1994). The effects of score group width on the Mantel-Haenszel procedure. Journal of Educational Measurement, 31, 67–78.
  • DeMars, C. E. (2009). Modification of the Mantel-Haenszel and logistic regression DIF procedures to incorporate the SIBTEST regression correction. Journal of Educational and Behavioral Statistics, 34, 149-170.
  • DeMars, C. E. (2010). Type I error inflation for detecting DIF in the presence of impact. Educational and Psychological Measurement, 70, 961–972.
  • Dorans, N. J. and Holland, P. W. (1993). Detection of Differential item functioning using the parameters of item response models. In P. W. Holland and H. Wainer (Eds.), Differential item functioning (pp. 25–66). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Donoghue, J. R., and Allen, N. L. (1993). Thin versus thick matching in the Mantel-Haenszel procedure for detecting DIF. Journal of Educational Statistics, 18, 131–154.
  • Fidalgo, A. M., Mellenbergh, G. J., and Muñiz, J. (2000). Effects of Amount of DIF, Test Length, and Purification Type on Robustness and Power of Mantel-Haenszel Procedures. Methods of Psychological Research Online, 5(3), 43–53.
  • Finch, H. (2005). The MIMIC Model as a Method for Detecting DIF: Comparison With Mantel-Haenszel, SIBTEST, and the IRT Likelihood Ratio. Applied Psychological Measurement, 29, 278–295.
  • Gierl, M. J., Jodoin, M. G., and Ackerman, T. A. (2000). Performance of Mantel-Haenszel, Simultaneous Item Bias Test, and Logistic Regression When the Proportion of DIF Items is Large. Paper Presented at the Annual Meeting of the American Educational Research Association (AERA) New Orleans, Louisiana, USA April 24–27.
  • Gierl, M. J. and McEwen, N. (1998, May). Differential item functioning on the Alberta Education Social Studies 30 diploma examination. Paper presented at the annual meeting of the Canadian Society for the Study of Education, Ottawa, ON, Canada.
  • Güler, N., and Penfield, R. D. (2009). A comparison of logistic regression and contingency table methods for simultaneous detection of uniform and nonuniform DIF. Journal of Educational Measurement, 46, 314-329.
  • Han, K. T. (2007). WinGen: Windows software that generates IRT parameters and item responses. Applied Psychological Measurement, 31(5), 457-459.
  • Hambleton, R. K., Swaminathan, H. and Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications.
  • Holland, P. W. and Thayer, D. T. (1988). Differential Item Functioning Detection and the Mantel-Haenszel procedure. In H. Wainer and H. Braun (Eds.), Test Validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
  • Holland, P. W. and Wainer, H. (Eds.). (1993). Differential Item Functioning. Hillsdale, NJ: Erlbaum.
  • Jodoin, M. G. and Gierl, M. J. (2001). Evaluating Type I Error and Power Using an Effect Size Measure with the Logistic Regression Procedure for DIF Detection. Applied Measurement in Education, 14(4), 329–349.
  • Kristjansson, E. (2001). Detecting DIF in Polytomous Items: An Empirical Comparison of the Ordinal Logistic Regression, Logistic Discriminant Function Analysis, Mantel, and Generalized Mantel Haenszel Procedures. Unpublished Doctoral Dissertation, University of Ottawa, Ottawa.
  • Li, H. and Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677.
  • Li, Y., Brooks, G. P. and Johanson, G. A. (2012). Item Discrimination and Type I Error in the Detection of Differential Item Functioning. Educational and Psychological Measurement, 72(5), 847-861.
  • Magis, D. and De Boeck, P. (2012). A Robust Outlier Approach to Prevent Type I Error Inflation in Differential Item Functioning. Educational and Psychological Measurement, 72(2), 291-311.
  • Mazor, K. M., Clauser, B. E. and Hambleton, R. K. (1994). Identification of nonuniform differential item functioning using a variation of the Mantel-Haenszel procedure. Educational and Psychological Measurement, 54, 284–291.
  • Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–118.
  • Meredith, W. and Millsap, R. E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
  • Millsap, R. E. and Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.
  • Narayanan, P., and Swaminathan, H. (1994). Performance of the Mantel-Haenszel and Simultaneous Item Bias Procedures for Detecting Differential Item Functioning. Applied Psychological Measurement, 18(4), 315–328.
  • Narayanan, P., and Swaminathan, H. (1996). Identification of Items that Show Nonuniform DIF. Applied Psychological Measurement, 20(3), 257–274.
  • Penfield, R. D. (2000). The effects of matching criterion contamination on the Mantel-Haenszel procedure. Unpublished doctoral thesis, University of Toronto.
  • Potenza, M. T. and Dorans, N. J. (1995). DIF Assessment For Polytomously Scored Items: A Framework For Classification and Evaluation. Applied Psychological Measurement, 19, 23–37.
  • Rogers, H. J. and Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105–116.
  • Roussos, L. A., and Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33, 215-230.
  • Swaminathan, H., and Rogers, H. J. (1990). Detecting Differential Item Functioning Using Logistic Regression Procedures. Journal of Educational Measurement, 27, 361–370.
  • Sweeney, K. P. (1996). A Monte Carlo investigation of the likelihood-ratio procedure in the detection of differential item functioning. Unpublished doctoral dissertation, Fordham University, New York, NY.
  • Stark, S., Chernyshenko, O. S., Drasgow, F. and Williams, B. A. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292–1306.
  • Thissen, D., Steinberg, L. and Wainer, H. (1993). Detection of Differential item functioning using the parameters of item response models. In P. W. Holland and H. Wainer (Eds.), Differential item functioning (pp. 67–114). Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Uttaro, T. (1992). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Unpublished doctoral dissertation, Graduate Center, City University of New York.
  • Vaughn, B. K. and Wang, Q. (2010). DIF Trees: Using Classification Trees to Detect Differential Item Functioning. Educational and Psychological Measurement, 70(6), 941–952.
  • Wang, W.-C., and Su, Y. H. (2004). Factors Influencing The Mantel and Generalized Mantel-Haenszel Methods for the Assessment of Differential Item Functioning in Polytomous Items. Applied Psychological Measurement, 28, 450–481.
  • Wang, W. C. and Yeh, L. Y. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
  • Wiberg, M. (2007). Measuring and Detecting Differential Item Functioning in Criterion-Referenced Licensing Test: A Theoretic Comparison of Methods. Umeå University. EM No. 60.
  • Woods, C. M. (2008a). Likelihood-ratio DIF testing: Effects of nonnormality. Applied Psychological Measurement, 32, 511–526.
  • Woods, C. M. (2008b). IRT-LR-DIF with estimation of the focal-group density as an empirical histogram. Educational and Psychological Measurement, 68, 571–586.
  • Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42–57.
  • Woods, C. M. (2011). DIF Testing for Ordinal Items With Poly-SIBTEST, the Mantel and GMH Tests, and IRT-LR-DIF When the Latent Distribution Is Nonnormal for Both Groups. Applied Psychological Measurement, 35(2), 145-164.
  • Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert Type (Ordinal) Item Scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.
  • Zwick, R., Donoghue, J. R., and Grima, A. (1993). Assessment of Differential Item Functioning for Performance Tasks. Journal of Educational Measurement, 30, 233–251.
  • Zwick, R., Thayer, D. T., and Mazzeo, J. (1997). Descriptive and inferential procedures for assessing differential item functioning in polytomous items. Applied Measurement in Education, 10(4), 321-344.

Comparison of Mantel-Haenszel and Logistic Regression Techniques in Detecting Differential Item Functioning

Abstract

This study aimed to compare the Type I error rates and statistical power of the Mantel-Haenszel (MH) and logistic regression (LR) techniques. Both techniques are used to detect differential item functioning (DIF) in dichotomous data. The comparison covered conditions in which the ability distributions of the focal and reference groups, the sample size, and the sample size ratio varied. Ability estimates and examinee responses were obtained with the WinGen3 simulation program. Item parameters for items with and without DIF were generated under the two-parameter logistic model. When the ability distributions of both the reference and focal groups followed the standard normal distribution, the MH and LR techniques yielded similar Type I error rates close to the nominal α level. In contrast, when the reference and focal group ability distributions differed, both the MH and LR techniques showed inflated Type I error rates.
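The logistic regression (LR) procedure compared in this study can be sketched in the same spirit. This is a minimal illustration, not the study's implementation. It follows the model introduced by Swaminathan and Rogers (1990): the item response is regressed on the matching score, group membership, and their interaction. Here, a 2-df likelihood-ratio χ² compares that full model with the score-only model, which tests uniform and nonuniform DIF jointly. The following are illustrative assumptions:

- the matching score is the standardized total score;
- the models are fit with a hand-written Newton-Raphson routine;
- the names `lr_dif`, `fit_logistic`, `solve`, and `CRIT_05_DF2` are hypothetical.

```python
import math

CRIT_05_DF2 = 5.991465  # chi-square critical value, alpha = .05, df = 2


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for m in range(k):
                    info[j][m] += w * xi[j] * xi[m]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - log1pexp(eta)
    return loglik


def lr_dif(ref, foc, item):
    """2-df likelihood-ratio chi-square for uniform plus nonuniform DIF:
    compares logit = b0 + b1*z with logit = b0 + b1*z + b2*g + b3*z*g,
    where z is the standardized total score and g = 1 for the focal group."""
    rows = [(r, 0.0) for r in ref] + [(r, 1.0) for r in foc]
    scores = [sum(r) for r, _ in rows]
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores)) or 1.0
    y = [r[item] for r, _ in rows]
    z = [(s - mu) / sd for s in scores]
    g = [grp for _, grp in rows]
    score_only = [[1.0, zi] for zi in z]
    full = [[1.0, zi, gi, zi * gi] for zi, gi in zip(z, g)]
    return 2.0 * (fit_logistic(full, y) - fit_logistic(score_only, y))
```

In use, an item would be flagged at α = .05 when `lr_dif(ref, foc, item)` exceeds `CRIT_05_DF2`. Here `ref` and `foc` are lists of 0/1 response vectors. Testing each extra term separately with 1-df comparisons is a common variant that this sketch does not show.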

There are 53 citations in total.

Details

Primary Language: Turkish
Journal Section: Articles
Authors: Devrim Erdem Keklik
Publication Date: April 26, 2014
Published in Issue: Year 2014, Volume: 5, Issue: 2

Cite

APA Erdem Keklik, D. (2014). Değişen Madde Fonksiyonunu Belirlemede Mantel-Haenszel ve Lojistik Regresyon Tekniklerinin Karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 5(2), 12-25. https://doi.org/10.21031/epod.71099