Differential item functioning across gender with MIMIC modeling: PISA 2018 financial literacy items
Year 2022, 631-653, 30.09.2022
Fatıma Münevver Saatçioğlu
Abstract
The aim of this study is to investigate the presence of differential item functioning (DIF) across the gender variable using a latent class modeling approach. The data come from 953 students in the USA who participated in the PISA 2018 8th-grade financial literacy assessment. A latent class analysis (LCA) approach was used to determine the latent classes, and according to the fit indices the data were best represented by a three-class model. To obtain more information about the characteristics of the emerging classes, sources of uniform and non-uniform DIF were identified using the multiple indicators multiple causes (MIMIC) model. The findings contribute substantially to the interpretation of the latent classes: the gender variable is a potential source of DIF for the latent class indicators. To obtain unbiased estimates of the measurement and structural parameters, it is important to include direct effects in the model; ignoring these effects can lead to incorrect identification of the latent classes. The study also illustrates, with a stepwise approach, how the MIMIC model can be applied within a latent class framework.
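The stepwise workflow summarized above (enumerate latent classes with fit indices, then test whether gender acts directly on the class indicators) can be sketched in a few lines of code. The snippet below is only an illustrative sketch under stated assumptions, not the estimation reported in the study: the item responses and gender variable are simulated stand-ins for the dichotomously scored financial literacy items, the EM routine is a generic binary LCA, and the DIF screen regresses each item on modal class assignment, gender, and their interaction (a gender main effect hints at uniform DIF, a class-by-gender interaction at non-uniform DIF) rather than estimating direct effects jointly within the mixture model.

```python
# Illustrative sketch only: simulated data, a generic binary LCA fitted by EM,
# and a regression-based DIF screen. Not the joint MIMIC estimation of the paper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def fit_lca(Y, k, n_iter=500, tol=1e-6, seed=0):
    """EM for a latent class model with binary indicators.
    Returns log-likelihood, class proportions, item probabilities, posteriors."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = np.full(k, 1.0 / k)                 # class proportions
    p = rng.uniform(0.2, 0.8, size=(k, J))   # P(item correct | class)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent
        logp = (Y[:, None, :] * np.log(p) +
                (1 - Y[:, None, :]) * np.log(1 - p)).sum(axis=2) + np.log(pi)
        m = logp.max(axis=1, keepdims=True)
        w = np.exp(logp - m)
        post = w / w.sum(axis=1, keepdims=True)
        ll = (m[:, 0] + np.log(w.sum(axis=1))).sum()
        # M-step: update class sizes and conditional item probabilities
        nk = post.sum(axis=0)
        pi = np.maximum(nk / n, 1e-6)        # guard against empty classes
        pi /= pi.sum()
        p = np.clip((post.T @ Y) / nk[:, None], 1e-6, 1 - 1e-6)
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return ll, pi, p, post

# Simulated stand-in data (assumption): 953 respondents, 8 binary items.
n, J = 953, 8
true_class = rng.integers(0, 3, size=n)
item_probs = np.array([[0.2] * J, [0.5] * J, [0.8] * J])
Y = (rng.uniform(size=(n, J)) < item_probs[true_class]).astype(int)
gender = rng.integers(0, 2, size=n)          # hypothetical 0/1 gender code

# Step 1: class enumeration -- compare BIC across candidate models.
for k in (1, 2, 3, 4):
    ll, *_ = fit_lca(Y, k)
    n_par = (k - 1) + k * J
    bic = -2 * ll + n_par * np.log(n)
    print(f"{k}-class model: logL = {ll:.1f}, BIC = {bic:.1f}")

# Step 2: crude DIF screen on the retained 3-class solution. Given class,
# a significant gender effect suggests uniform DIF; a significant
# class-by-gender interaction suggests non-uniform DIF.
ll, pi, p, post = fit_lca(Y, 3)
modal = post.argmax(axis=1)
d1, d2 = (modal == 1).astype(float), (modal == 2).astype(float)
for j in range(J):
    X = np.column_stack([np.ones(n), d1, d2, gender,
                         d1 * gender, d2 * gender])
    res = sm.Logit(Y[:, j], X).fit(disp=0)
    print(f"item {j}: gender p = {res.pvalues[3]:.3f}, "
          f"min interaction p = {res.pvalues[4:6].min():.3f}")
```

In the actual analysis, direct effects of gender on the indicators are added within the mixture model itself (typically in dedicated SEM software), so this regression-based screen should be read only as an intuition for what the MIMIC direct effects capture.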
References
- Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91. https://doi.org/10.1111/j.1745-3984.1992.tb00368.x
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. American Educational Research Association.
- Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370. https://doi.org/10.1007/BF02294361
- Cheng, Y., Shao, C., & Lathrop, Q.N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educational and Psychological Measurement, 76(1), 43-63. https://doi.org/10.1177/0013164415576187
- Cho, S.J. (2007). A multilevel mixture IRT model for DIF analysis [Unpublished doctoral dissertation]. University of Georgia: Athens.
- Choi, Y., Alexeev, N., & Cohen, A.S. (2015). Differential item functioning analysis using a mixture 3-parameter logistic model with a covariate on the TIMSS 2007 mathematics test. International Journal of Testing, 15(3), 239-253. https://doi.org/10.1080/15305058.2015.1007241
- Clark, S.L., & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis. Available online at: http://www.statmodel.com/download/relatinglca.pdf
- Cohen, A.S., & Bolt, D.M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133–148. https://doi.org/10.1111/j.1745-3984.2005.00007
- De Ayala, R.J., Kim, S.H., Stapleton, L.M., & Dayton, C.M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243-276. https://doi.org/10.1080/15305058.2002.9669495
- De Ayala, R.J., & Santiago, S.Y. (2017). An introduction to mixture item response theory models. Journal of School Psychology, 60(1), 25-40. https://doi.org/10.1016/j.jsp.2016.01.002
- Educational Testing Service. (2019). Standards for Quality and Fairness. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf.
- Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. Erlbaum.
- Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel–Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278–295. https://doi.org/10.1177/0146621605275728
- Finch, W.H., & French, B.F. (2012). Parameter estimation with mixture item response theory models: A Monte Carlo comparison of maximum likelihood and Bayesian methods. Journal of Modern Applied Statistical Methods, 11(1), 14.
- Gallagher, A., Bennett, R.E., Cahalan, C., & Rock, D.A. (2002). Validity and fairness in technology-based assessment: Detecting construct-irrelevant variance in an open-ended, computerized mathematics task. Educational Assessment, 8(1), 27-41. https://doi.org/10.1207/S15326977EA0801_02
- Glockner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544-565. https://doi.org/10.1207/S15328007SEM1004_4
- Haladyna, T.M., & Downing, S.M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27. https://doi.org/10.1111/j.1745-3992.2004.tb00149.x
- IEA. (2017a). TIMSS 2015 user guide for the international database. Chestnut Hill, MA: Lynch School of Education, Boston College & International Association for the Evaluation of Educational Achievement (IEA).
- Kang, T., & Cohen, A.S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31(1), 331-358. https://doi.org/10.1177/0146621606292213
- Kankaraš, M., Moors, G., & Vermunt, J.K. (2011). Testing for measurement invariance with latent class analysis. In E. Davidov, P. Schmidt, J. Billiet, & B. Mueleman (Eds.), Cross-cultural analysis: Methods and applications (pp. 359–384). Routledge.
- Lee, Y., & Zhang, J. (2017). Effects of differential item functioning on examinees’ test performance and reliability of test. International Journal of Testing, 17(1), 23–54. https://doi.org/10.1080/15305058.2016.1224888
- Lin, P.-Y., & Lin, Y.-C. (2014). Examining student factors in sources of setting accommodation DIF. Educational and Psychological Measurement, 74(1), 759–794. https://doi.org/10.1177/0013164413514053
- Masyn, K. (2013). “Latent class analysis and finite mixture modeling,” in The Oxford handbook of quantitative methods in psychology, Vol. 2, ed. T. D. Little (Oxford University Press), 551–611.
- Masyn, K. (2017). Measurement invariance and differential item functioning in latent class analysis with stepwise multiple indicator multiple cause modeling. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 180-197. https://doi.org/10.1080/10705511.2016.1254049
- Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan.
- Millsap, R.E. (2011). Statistical approaches to measurement invariance. Taylor & Francis.
- Mislevy, R.J., & Verhelst, N.D. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(1), 195-215.
- Nagin, D. (2005). Group-based modeling of development. Harvard University Press.
- Nylund, K.L., Asparouhov, T., & Muthén, B.O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535-569. https://doi.org/10.1080/10705510701575396
- Nylund-Gibson, K., & Masyn, K.E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling, 23(1), 782-797. https://doi.org/10.1080/10705511.2016.1221313
- OECD. (2019a). PISA 2018 assessment and analytical framework. PISA, OECD Publishing.
- OECD. (2019b). Technical report of the Survey of Adult Skills (PIAAC) (3rd Edition). OECD Publishing.
- Oliveri, M.E., & von Davier, M. (2017). Analyzing the invariance of item parameters used to estimate trends in international large-scale assessments. In H. Jiao & R. W. Lissitz (Eds.), Test fairness in the new generation of large‐scale assessment (pp. 121–146). Information Age Publishing, Inc.
- Oliveri, M., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class differential item functioning in international assessments. International Journal of Testing, 13(3), 272–293. https://doi.org/10.1080/15305058.2012.738266
- Oliveri, M.E., & Ercikan, K. (2011). Do different approaches to examining construct comparability in multilanguage assessments lead to similar conclusions? Applied Measurement in Education, 24(4), 349-366. https://doi.org/10.1080/08957347.2011.607063
- Penfield, R.D., & Lam, T.C.M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(1), 5–15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
- Raju, N. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(1), 197–207. https://doi.org/10.1177/014662169001400208
- Rost, J. (1990). Rasch Models in Latent Classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282.
- Samuelsen, K.M. (2008). Examining differential item functioning from a latent mixture perspective. In Hancock, G.R., & Samuelsen, K.M. (Eds.) Advances in latent variable mixture models, Information Age.
- Swaminathan, H., & Rogers, H.J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
- Tsaousis, I., Sideridis, G.D., & AlGhamdi, H.M. (2020). Measurement invariance and differential item functioning across gender within a latent class analysis framework: Evidence from a high-stakes test for university admission in Saudi Arabia. Frontiers in Psychology, 11, 1-13. https://doi.org/10.3389/fpsyg.2020.00622
- Vandenberg, R.J., & Lance, C.E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69. https://doi.org/10.1177/109442810031002
- Yun, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes [Unpublished doctoral dissertation]. University of California, Los Angeles.
- Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement. ETS Research Report Series, 2012(1), i-30. https://doi.org/10.1002/j.2333-8504.2012.tb02290.x