Research Article
Year 2023, Volume: 14 Issue: 1, 19 - 32, 25.03.2023
https://doi.org/10.21031/epod.1101457

Examining Measurement Invariance in Bayesian Item Response Theory Models: A Simulation Study

Abstract

The aim of this study is to determine a cut-off point for measurement invariance based on item parameter differences in Bayesian item response theory models. To this end, a simulation study was conducted in which the Bayes factor was estimated to test measurement invariance. For each simulation condition, data were generated in R under the one-parameter logistic model for 10 binary (1-0 scored) items. The invariance test was performed for various group sizes (n=500, 1000, 1500 and 2000) and difficulty parameter differences (dk=0, dk=0.1, dk=0.3, dk=0.5 and dk=0.7). The Bayesian analyses were performed in WinBUGS using code written in R. A Bayes factor providing evidence for measurement invariance was calculated as a function of the parameter differences, using the Savage–Dickey density ratio computed from the MCMC samples. The results show that when the item parameter difference is dk=0.3 and the group size is 1500 or larger, measurement invariance does not hold. For smaller samples (n=1000 or less), however, measurement invariance should be interpreted with caution. When dk=0.5, invariant items are found only at n=500. When dk=0.7, the Bayes factor tests provide evidence that measurement invariance does not hold.
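
The two computational steps named in the abstract, generating 1PL response data and forming a Savage–Dickey density ratio, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R and WinBUGS code. Several details are assumptions of the sketch rather than facts from the study: abilities drawn from N(0, 1), evenly spaced reference difficulties, the shift dk applied to the first item only, a N(0, prior_sd^2) prior on the difficulty difference, and a normal approximation to the posterior. The "posterior draws" at the end are stand-in values, not output from a fitted model.

```python
import math
import random
import statistics


def simulate_1pl(n_persons, difficulties, rng):
    """Simulate binary responses under the one-parameter logistic model.

    Abilities are drawn from N(0, 1) (an assumption of this sketch), and
    P(correct) = 1 / (1 + exp(-(theta - b))) for item difficulty b.
    """
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = [
            1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
            for b in difficulties
        ]
        data.append(row)
    return data


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(posterior_draws, prior_sd):
    """Bayes factor BF01 for H0: delta = 0 via the Savage-Dickey ratio.

    BF01 = posterior density at 0 / prior density at 0. The prior is assumed
    to be N(0, prior_sd^2); the posterior density is approximated by a normal
    distribution fitted to the draws (both are assumptions of this sketch).
    """
    post_mean = statistics.fmean(posterior_draws)
    post_sd = statistics.stdev(posterior_draws)
    return normal_pdf(0.0, post_mean, post_sd) / normal_pdf(0.0, 0.0, prior_sd)


rng = random.Random(2023)

# Hypothetical reference-group difficulties for 10 items, evenly spaced.
ref_b = [-1.0 + 2.0 * k / 9 for k in range(10)]
dk = 0.3  # difficulty difference between groups (one of the studied values)
focal_b = [ref_b[0] + dk] + ref_b[1:]  # shift the first item only (assumption)

ref = simulate_1pl(1000, ref_b, rng)
focal = simulate_1pl(1000, focal_b, rng)
p_ref = statistics.fmean(row[0] for row in ref)
p_focal = statistics.fmean(row[0] for row in focal)
print(f"item 1 proportion correct: reference={p_ref:.3f}, focal={p_focal:.3f}")

# Stand-in draws for the posterior of the difficulty difference. In a real
# analysis these would come from the MCMC output of the fitted model.
stand_in_draws = [rng.gauss(dk, 0.08) for _ in range(4000)]
bf01 = savage_dickey_bf01(stand_in_draws, prior_sd=1.0)
print(f"BF01 (evidence for invariance if > 1): {bf01:.4f}")
```

A BF01 above 1 favours invariance (a zero difference) and a value below 1 favours a non-zero difference. The value depends on the chosen prior, so the prior standard deviation used here should be read as a placeholder.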

References

  • Asparouhov, T., & Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 495-508. https://doi.org/10.1080/10705511.2014.919210
  • Bock, R. D., & Zimowski, M. F. (1997). The multiple groups IRT. In Wim J. van der Linden, & Ronald K. Hambleton (Eds.), Handbook of modern item response theory. Springer-Verlag. https://doi.org/10.1007/978-1-4757-2691-6_25
  • Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford Publications.
  • Davidov, E., Meuleman, B., Cieciuch, J., Schmidt, P., & Billiet, J. (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40, 55-75. https://doi.org/10.1146/annurev-soc-071913-043137
  • De Boeck, P. (2008). Random item IRT models. Psychometrika, 73(4), 533-559. https://doi.org/10.1007/s11336-008-9092-x
  • Finch, W. H. (2016). Detection of differential item functioning for more than two groups: A Monte Carlo comparison of methods. Applied Measurement in Education, 29(1), 30-45. https://doi.org/10.1080/08957347.2015.1102916
  • Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. Springer Science & Business Media. https://doi.org/10.1007/978-1-4419-0742-4
  • Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000
  • Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research, 18(3), 117-144. https://doi.org/10.1080/03610739208253916
  • Janssen, R., Tuerlinckx, F., Meulders, M., & De Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285–306. https://doi.org/10.3102/10769986025003285
  • Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.
  • Langer, M. M. (2008). A reexamination of Lord's Wald test for differential item functioning using item response theory and modern error estimation (Doctoral dissertation, The University of North Carolina at Chapel Hill).
  • Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525-543. https://doi.org/10.1007/BF02294825
  • Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge. https://doi.org/10.4324/9780203821961
  • R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
  • Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114(3), 552.
  • Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237. https://doi.org/10.3758/PBR.16.2.225
  • Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: toward a unified strategy. Journal of Applied Psychology, 91(6), 1292. https://doi.org/10.1037/0021-9010.91.6.1292
  • Steenkamp, J. B. E., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25(1), 78-90. https://doi.org/10.1086/209528
  • Thompson, Y. T. (2018). Bayesian and Frequentist Approaches for Factorial Invariance Test (Doctoral dissertation, University of Oklahoma).
  • Van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Komarlu Narendra Gupta, A. R., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E.-J. (2021). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826. https://doi.org/10.3758/s13423-020-01798-5
  • Verhagen, A. J., & Fox, J. P. (2013). Bayesian tests of measurement invariance. British Journal of Mathematical and Statistical Psychology, 66(3), 383-401. https://doi.org/10.1111/j.2044-8317.2012.02059.x
  • Verhagen, J., Levy, R., Millsap, R. E., & Fox, J. P. (2016). Evaluating evidence for invariant items: A Bayes factor applied to testing measurement invariance in IRT models. Journal of Mathematical Psychology, 72, 171-182. https://doi.org/10.1016/j.jmp.2015.06.005
  • Wagenmakers, E. J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60(3), 158-189. https://doi.org/10.1016/j.cogpsych.2009.12.001
  • Wagenmakers, E. J., Verhagen, J., Ly, A., Matzke, D., Steingroever, H., Rouder, J. N., & Morey, R. D. (2017). The need for Bayesian hypothesis testing in psychological science. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological science under scrutiny: Recent challenges and proposed solutions (pp. 123-138). https://doi.org/10.1002/9781119095910.ch8
  • White, H. (2000). A reality check for data snooping. Econometrica, 68(5), 1097-1126. https://doi.org/10.1111/1468-0262.00152
  • Woods, C. M., Cai, L., & Wang, M. (2012). The Langer-improved Wald test for DIF testing with multiple groups: Evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73, 532–547. https://doi.org/10.1177/0013164412464875
  • Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14(2), 97-116. https://www.jstor.org/stable/1434010
There are 28 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

Merve Ayvallı 0000-0002-7301-0096

Hülya Kelecioğlu 0000-0002-0741-9934

Publication Date March 25, 2023
Acceptance Date October 3, 2022
Published in Issue Year 2023 Volume: 14 Issue: 1

Cite

APA Ayvallı, M., & Kelecioğlu, H. (2023). Examining Measurement Invariance in Bayesian Item Response Theory Models: A Simulation Study. Journal of Measurement and Evaluation in Education and Psychology, 14(1), 19-32. https://doi.org/10.21031/epod.1101457