Research Article

Differential item functioning across gender with MIMIC modeling: PISA 2018 financial literacy items

Year 2022, Volume: 9 Issue: 3, 631 - 653, 30.09.2022
https://doi.org/10.21449/ijate.1076464

Abstract

The aim of this study is to investigate the presence of differential item functioning (DIF) across gender within a latent class modeling approach. The data come from 953 students in the USA who participated in the PISA 2018 8th-grade financial literacy assessment. Latent class analysis (LCA) was used to identify the latent classes, and the fit indices indicated that a three-class model fit the data best. To obtain more information about the characteristics of the emerging classes, sources of uniform and non-uniform DIF were identified with the Multiple Indicators Multiple Causes (MIMIC) model. The findings are valuable because they contribute to the interpretation of the latent classes: according to the results, the gender variable is a potential source of DIF for the latent class indicators. To obtain unbiased estimates of the measurement and structural parameters, it is important to include such direct effects in the model; ignoring them can lead to incorrect identification of the latent classes. The study also illustrates the application of the MIMIC model within a latent class framework using a stepwise approach.
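To make the modeling steps concrete, a minimal sketch of the MIMIC formulation in a latent class framework is given below. It follows the stepwise approach described by Masyn (2017); the symbols ($\alpha_k$, $\gamma_k$, $\tau_{jk}$, $\delta_{jk}$) are illustrative notation, not the labels used in the article. With a latent class variable $C$ with $K$ classes, a binary gender covariate $x$, and a binary item response $U_j$, class membership is regressed on the covariate,

\[
P(C = k \mid x) \;=\; \frac{\exp(\alpha_k + \gamma_k x)}{\sum_{m=1}^{K} \exp(\alpha_m + \gamma_m x)},
\]

with one class serving as the reference ($\alpha_K = \gamma_K = 0$), so that $\gamma_k$ captures the structural effect of gender on class membership. DIF is then examined through direct effects of the covariate in the measurement model:

\[
\operatorname{logit} P(U_j = 1 \mid C = k, x) \;=\; \tau_{jk} + \delta_j x \qquad \text{(uniform DIF if } \delta_j \neq 0\text{)},
\]
\[
\operatorname{logit} P(U_j = 1 \mid C = k, x) \;=\; \tau_{jk} + \delta_{jk} x \qquad \text{(non-uniform DIF if } \delta_{jk} \text{ varies across classes)}.
\]

In the measurement-invariant model every $\delta$ term is fixed to zero; freeing these direct effects and comparing the resulting nested models is what allows gender-related DIF to be detected and retained, so that class enumeration and the estimate of $\gamma_k$ remain unbiased.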

References

  • Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91. https://doi.org/10.1111/j.1745-3984.1992.tb00368.x
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370. https://doi.org/10.1007/BF02294361
  • Cheng, Y., Shao, C., & Lathrop, Q.N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educational and Psychological Measurement, 76(1), 43-63. https://doi.org/10.1177/0013164415576187
  • Cho, S.J. (2007). A multilevel mixture IRT model for DIF analysis [Unpublished doctoral dissertation]. University of Georgia, Athens.
  • Choi, Y., Alexeev, N., & Cohen, A.S. (2015). Differential item functioning analysis using a mixture 3-parameter logistic model with a covariate on the TIMSS 2007 mathematics test. International Journal of Testing, 15(3), 239-253. https://doi.org/10.1080/15305058.2015.1007241
  • Clark, S.L., & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis. Available online at: http://www.statmodel.com/download/relatinglca.pdf
  • Cohen, A.S., & Bolt, D.M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133–148. https://doi.org/10.1111/j.1745-3984.2005.00007
  • De Ayala, R.J., Kim, S.H., Stapleton, L.M., & Dayton, C.M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243-276. https://doi.org/10.1080/15305058.2002.9669495
  • De Ayala, R.J., & Santiago, S.Y. (2017). An introduction to mixture item response theory models. Journal of School Psychology, 60(1), 25-40. https://doi.org/10.1016/j.jsp.2016.01.002
  • Educational Testing Service. (2019). Standards for Quality and Fairness. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf.
  • Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. Erlbaum.
  • Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel–Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278–295. https://doi.org/10.1177/0146621605275728
  • Finch, W.H., & French, B.F. (2012). Parameter estimation with mixture item response theory models: A Monte Carlo comparison of maximum likelihood and Bayesian methods. Journal of Modern Applied Statistical Methods, 11(1), 14.
  • Gallagher, A., Bennett, R.E., Cahalan, C., & Rock, D.A. (2002). Validity and fairness in technology-based assessment: Detecting construct-irrelevant variance in an open-ended, computerized mathematics task. Educational Assessment, 8(1), 27-41. https://doi.org/10.1207/S15326977EA0801_02
  • Glockner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10(4), 544-565. https://doi.org/10.1207/S15328007SEM1004_4
  • Haladyna, T.M., & Downing, S.M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17-27. https://doi.org/10.1111/j.1745-3992.2004.tb00149.x
  • IEA. (2017a). TIMSS 2015 user guide for the international database. Chestnut Hill, MA: Lynch School of Education, Boston College & International Association for the Evaluation of Educational Achievement (IEA).
  • Kang, T., & Cohen, A.S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31(1), 331-358. https://doi.org/10.1177/0146621606292213
  • Kankaraš, M., Moors, G., & Vermunt, J.K. (2011). Testing for measurement invariance with latent class analysis. In E. Davidov, P. Schmidt, J. Billiet, & B. Meuleman (Eds.), Cross-cultural analysis: Methods and applications (pp. 359–384). Routledge.
  • Lee, Y., & Zhang, J. (2017). Effects of differential item functioning on examinees’ test performance and reliability of test. International Journal of Testing, 17(1), 23–54. https://doi.org/10.1080/15305058.2016.1224888
  • Lin, P.-Y., & Lin, Y.-C. (2014). Examining student factors in sources of setting accommodation DIF. Educational and Psychological Measurement, 74(1), 759–794. https://doi.org/10.1177/0013164413514053
  • Masyn, K. (2013). Latent class analysis and finite mixture modeling. In T.D. Little (Ed.), The Oxford handbook of quantitative methods in psychology (Vol. 2, pp. 551–611). Oxford University Press.
  • Masyn, K. (2017). Measurement invariance and differential item functioning in latent class analysis with stepwise multiple indicator multiple cause modeling. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 180-197. https://doi.org/10.1080/10705511.2016.1254049
  • Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). American Council on Education & Macmillan.
  • Millsap, R.E. (2011). Statistical approaches to measurement invariance. Taylor & Francis.
  • Mislevy, R.J., & Verhelst, N.D. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(1), 195-215.
  • Nagin, D. (2005). Group-based modeling of development. Harvard University Press.
  • Nylund, K.L., Asparouhov, T., & Muthén, B.O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535-569. https://doi.org/10.1080/10705510701575396
  • Nylund-Gibson, K., & Masyn, K.E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling, 23(1), 782-797. https://doi.org/10.1080/10705511.2016.1221313
  • OECD. (2019a). PISA 2018 assessment and analytical framework. PISA, OECD Publishing.
  • OECD. (2019b). Technical report of the Survey of Adult Skills (PIAAC) (3rd Edition). OECD Publishing.
  • Oliveri, M.E., & von Davier, M. (2017). Analyzing the invariance of item parameters used to estimate trends in international large-scale assessments. In H. Jiao & R. W. Lissitz (Eds.), Test fairness in the new generation of large‐scale assessment (pp. 121–146). Information Age Publishing, Inc.
  • Oliveri, M., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class differential item functioning in international assessments. International Journal of Testing, 13(3), 272–293. https://doi.org/10.1080/15305058.2012.738266
  • Oliveri, M.E., & Ercikan, K. (2011). Do different approaches to examining construct comparability in multilanguage assessments lead to similar conclusions? Applied Measurement in Education, 24(4), 349-366. https://doi.org/10.1080/08957347.2011.607063
  • Penfield, R.D., & Lam, T.C.M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(1), 5–15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
  • Raju, N. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(1), 197–207. https://doi.org/10.1177/014662169001400208
  • Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282.
  • Samuelsen, K.M. (2008). Examining differential item functioning from a latent mixture perspective. In Hancock, G.R., & Samuelsen, K.M. (Eds.) Advances in latent variable mixture models, Information Age.
  • Swaminathan, H., & Rogers, H.J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
  • Tsaousis, I., Sideridis, G.D., & AlGhamdi, H.M. (2020). Measurement invariance and differential item functioning across gender within a latent class analysis framework: Evidence from a high-stakes test for university admission in Saudi Arabia. Frontiers in Psychology, 11, 1-13. https://doi.org/10.3389/fpsyg.2020.00622
  • Vandenberg, R.J., & Lance, C.E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69. https://doi.org/10.1177/109442810031002
  • Yun, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes [Unpublished doctoral dissertation]. University of California, Los Angeles.
  • Zwick, R. (2012). A review of ETS differential item functioning assessment procedures: Flagging rules, minimum sample size requirements, and criterion refinement. ETS Research Report Series, 2012(1), i-30. https://doi.org/10.1002/j.2333-8504.2012.tb02290.x

There are 44 citations in total.

Details

Primary Language English
Subjects Other Fields of Education
Journal Section Articles
Authors

Fatıma Münevver Saatçioğlu (ORCID: 0000-0003-4797-207X)

Early Pub Date August 31, 2022
Publication Date September 30, 2022
Submission Date February 20, 2022
Published in Issue Year 2022 Volume: 9 Issue: 3

Cite

APA Saatçioğlu, F. M. (2022). Differential item functioning across gender with MIMIC modeling: PISA 2018 financial literacy items. International Journal of Assessment Tools in Education, 9(3), 631-653. https://doi.org/10.21449/ijate.1076464
