A Comparison of Logistic Regression Models for DIF Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality of Ability Distributions

Yasemin Kaya; Walter L. Leite; M. David Miller

doi:10.21449/ijate.239563

Research Article

Year 2015, Volume: 2 Issue: 1, 22 - 39, 11.07.2016

Abstract

This study investigated how effectively logistic regression models detect uniform and non-uniform differential item functioning (DIF) in polytomous items when sample sizes are small and ability distributions are non-normal. A simulation study compared three logistic regression models: the cumulative logits model, the continuation ratio model, and the adjacent categories model. The results showed that logistic regression was a powerful method for detecting DIF in polytomous items but was not useful for distinguishing the type of DIF. The continuation ratio model worked best for detecting uniform DIF, whereas the cumulative logits model gave more acceptable Type I error rates. For the cumulative logits model, Type I error rates increased as sample size increased. Skewness of the ability distributions reduced the power of logistic regression to detect non-uniform DIF, and small sample sizes reduced the power of logistic regression.
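
The three models named in the abstract differ in which log-odds of the ordered response they model. The sketch below is illustrative only: it assumes the common textbook definitions of the three logits (conventions for the continuation ratio in particular vary), and the function name `ordinal_logits` is hypothetical rather than taken from the study. It computes each family of logits from the category probabilities of a single item.

```python
import math


def ordinal_logits(probs):
    """Compute three kinds of logits for one ordered item.

    probs: probabilities of the ordered categories 0..K-1
           (all positive, summing to 1). Hypothetical helper,
           not the study's own code.
    """
    if len(probs) < 2:
        raise ValueError("need at least two ordered categories")
    if any(p <= 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must be positive and sum to 1")

    cumulative, continuation, adjacent = [], [], []
    for j in range(len(probs) - 1):
        at_or_below = sum(probs[: j + 1])  # P(Y <= j)
        above = sum(probs[j + 1:])         # P(Y > j)
        # Cumulative logit: log[P(Y <= j) / P(Y > j)]
        cumulative.append(math.log(at_or_below / above))
        # Continuation ratio logit (one common form): log[P(Y = j) / P(Y > j)]
        continuation.append(math.log(probs[j] / above))
        # Adjacent categories logit: log[P(Y = j + 1) / P(Y = j)]
        adjacent.append(math.log(probs[j + 1] / probs[j]))

    return {
        "cumulative": cumulative,
        "continuation": continuation,
        "adjacent": adjacent,
    }


if __name__ == "__main__":
    # Four equally likely categories, chosen only for illustration.
    for name, values in ordinal_logits([0.25, 0.25, 0.25, 0.25]).items():
        print(name, [round(v, 3) for v in values])
```

In a logistic regression DIF analysis, each of these logits would typically be modeled as a function of a matching ability score, group membership, and their interaction, with the group term corresponding to uniform DIF and the interaction term to non-uniform DIF. That model-fitting step is not shown here.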

Keywords

DIF, logistic regression, polytomous items, non-normality, uniform, non-uniform

Details

Primary Language	English
Subjects	Studies on Education
Other ID	JA43AG87ZU
Journal Section	Articles
Authors	Yasemin Kaya, Walter L. Leite, M. David Miller
Publication Date	July 11, 2016
Submission Date	July 11, 2016
Published in Issue	Year 2015 Volume: 2 Issue: 1

Cite

APA	Kaya, Y., Leite, W. L., & Miller, M. D. (2016). A Comparison of Logistic Regression Models for DIF Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality of Ability Distributions. International Journal of Assessment Tools in Education, 2(1), 22-39. https://doi.org/10.21449/ijate.239563