Performances of MIMIC and Logistic Regression Procedures in Detecting DIF

Seçil Uğurlu; Burcu Atar

doi:10.21031/epod.531509

Research Article

Year 2020, Volume: 11 Issue: 1, 1 - 12, 24.03.2020

Cited By: 2

Abstract

This study investigated the differential item functioning (DIF) detection performance of the multiple indicators, multiple causes (MIMIC) and logistic regression (LR) methods for dichotomous data. The two methods were compared on Type I error rate and power in each simulation condition. Three factors were manipulated: sample size (2000 and 4000 respondents), ability distribution of the focal group [N(0, 1) and N(-0.5, 1)], and percentage of items with DIF (10% and 20%). Four factors were held constant: ability distribution of the reference group [N(0, 1)], ratio of focal group to reference group (1:1), test length (30 items), and the between-group difference in difficulty parameters for the DIF items (0.6). For Type I error rates, sample size had the larger effect on the MIMIC method, whereas the percentage of items with DIF had the larger effect on LR. For power, sample size was the most influential factor for both methods.
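The design described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: the abstract does not name the item response model, the distribution of item difficulties, which items carry DIF, or the direction of the DIF, so the Rasch model, the N(0, 1) difficulties, the choice of the first items as DIF items, and the shift against the focal group are all assumptions made here. Sample size is also assumed to be the total across both groups. The function names (design_grid, simulate, error_and_power) are hypothetical.

```python
import itertools
import math
import random

# Manipulated factors reported in the abstract.
SAMPLE_SIZES = (2000, 4000)
FOCAL_MEANS = (0.0, -0.5)       # focal group ability ~ N(mean, 1)
DIF_PROPORTIONS = (0.10, 0.20)

# Factors held constant in the abstract.
N_ITEMS = 30
DIF_SHIFT = 0.6                 # between-group difficulty difference on DIF items
REFERENCE_MEAN = 0.0            # reference group ability ~ N(0, 1)


def design_grid():
    """Return the fully crossed simulation conditions as a list of dicts."""
    return [
        {"n": n, "focal_mean": m, "dif_prop": p}
        for n, m, p in itertools.product(SAMPLE_SIZES, FOCAL_MEANS, DIF_PROPORTIONS)
    ]


def simulate(cond, rng):
    """Generate one dichotomous dataset under an ASSUMED Rasch model."""
    n_dif = round(cond["dif_prop"] * N_ITEMS)
    dif_items = set(range(n_dif))           # assumption: the first items carry DIF
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(N_ITEMS)]  # assumption
    n_per_group = cond["n"] // 2            # 1:1 ratio; n assumed to be the total
    rows = []
    for group, mean in ((0, REFERENCE_MEAN), (1, cond["focal_mean"])):
        for _ in range(n_per_group):
            theta = rng.gauss(mean, 1.0)
            responses = []
            for j, b in enumerate(difficulties):
                if group == 1 and j in dif_items:
                    b += DIF_SHIFT          # assumption: item is harder for focal group
                p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
                responses.append(1 if rng.random() < p_correct else 0)
            rows.append((group, responses))
    return rows, dif_items


def error_and_power(flagged_per_rep, dif_items, n_items=N_ITEMS):
    """Type I error: share of DIF-free items flagged; power: share of DIF items flagged."""
    clean = set(range(n_items)) - dif_items
    reps = len(flagged_per_rep)
    false_pos = sum(len(flagged & clean) for flagged in flagged_per_rep)
    hits = sum(len(flagged & dif_items) for flagged in flagged_per_rep)
    return false_pos / (reps * len(clean)), hits / (reps * len(dif_items))


if __name__ == "__main__":
    rng = random.Random(2020)
    for cond in design_grid():
        rows, dif_items = simulate(cond, rng)
        print(cond, len(rows), sorted(dif_items))
```

Crossing the three manipulated factors gives eight conditions. In a full study, each replication's dataset would be passed to a DIF detection procedure (MIMIC or LR), and the sets of flagged items would then be summarized with a function such as error_and_power.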

Keywords

Differential item functioning, MIMIC model, Logistic regression, Uniform DIF, Type I error rate and power

References

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Crane, P. K., van Belle, G., & Larson, E. B. (2004). Test bias in a cognitive test: Differential item functioning in the CASI. Statistics in Medicine, 23(2), 241–256. doi: 10.1002/sim.1713
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278–295. doi: 10.1177/0146621605275728
Finch, W. H., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67(4), 565–582. doi: 10.1177/0013164406296975
Fleishman, J. A., Spector, W. D., & Altman, B. M. (2002). Impact of differential item functioning on age and gender differences in functional disability. Journal of Gerontology: Social Sciences, 57B(5), 275–284.
Holland, P. W., & Wainer, H. (1993). Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer, & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Mazor, K. M., Kanjee, A., & Clauser, B. E. (1995). Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. Journal of Educational Measurement, 32(2), 131–144.
Muthén, B. O. (1988). Some uses of structural equation modeling in validity studies: Extending IRT to external variables. In H. Wainer, & H. I. Braun (Eds.), Test validity (pp. 213-238). Hillsdale, NJ: Lawrence Erlbaum Associates.
Muthén, L. K., & Muthén, B. O. (1998-2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 5(2), 107–124. doi: 10.1080/10705519809540095
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Sari, H. I. & Huggins, A. C. (2014). Differential item functioning detection across two methods of defining group comparisons: Pairwise and composite group comparisons. Educational and Psychological Measurement, 75(4), 648-676. doi: 10.1177/0013164414549764
SAS Institute Inc. (2007). SAS® 9.1.3 qualification tools user’s guide. Cary, NC: SAS Institute Inc.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer, & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum Associates.
Vaughn, B. K., & Wang, Q. (2010). DIF trees: Using classification trees to detect differential item functioning. Educational and Psychological Measurement, 70(6), 941–952. doi: 10.1177/0013164410379326
Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34(3), 166–180. doi: 10.1177/0146621609355279
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713–731. doi: 10.1177/0013164409332228
Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44(1), 1–27. doi: 10.1080/00273170802620121
Woods, C. M., Oltmanns, T. F., & Turkheimer, E. (2009). Illustration of MIMIC-model DIF testing with the Schedule for Nonadaptive and Adaptive Personality. Journal of Psychopathology and Behavioral Assessment, 31, 320–330. doi: 10.1007/s10862-008-9118-9
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.

There are 25 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Seçil Uğurlu (ORCID: 0000-0002-3495-7797); Burcu Atar (ORCID: 0000-0003-3527-686X)
Publication Date	March 24, 2020
Acceptance Date	November 22, 2019
Published in Issue	Year 2020 Volume: 11 Issue: 1

Cite

APA	Uğurlu, S., & Atar, B. (2020). Performances of MIMIC and Logistic Regression Procedures in Detecting DIF. Journal of Measurement and Evaluation in Education and Psychology, 11(1), 1-12. https://doi.org/10.21031/epod.531509

Cited By

Gender-based Differential Item Functioning Analysis of the Medical Specialization Education Entrance Examination

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.998592

The Impact of Missing Data on the Performances of DIF Detection Methods

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.1183617
