Examining Measurement Invariance in Bayesian Item Response Theory Models: A Simulation Study
The aim of this study is to determine a measurement invariance cut-off point based on item parameter differences in Bayesian item response theory models. To this end, the Bayes factor was estimated for testing measurement invariance in a simulation study. For each simulation condition, data were generated in R under the one-parameter logistic model for 10 binary (1-0 scored) items. The invariance test was performed for various group sizes (n = 500, 1000, 1500, and 2000) and item difficulty parameter differences (dk = 0, 0.1, 0.3, 0.5, and 0.7). The Bayesian analyses were run in WinBUGS using code written in R. A Bayes factor that provides evidence for measurement invariance was calculated as a function of the parameter differences, using the Savage–Dickey density ratio computed from the MCMC samples. The results show that when the item parameter difference is dk = 0.3 and the group size is 1500 or larger, measurement invariance does not hold. For smaller samples (n = 1000 or fewer), however, measurement invariance should be interpreted with caution. When dk = 0.5, invariant items are found only for n = 500. When dk = 0.7, the Bayes factor provides evidence that measurement invariance does not hold.
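The design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R/WinBUGS code. The base difficulties, the choice of which item is shifted, the N(0, 1) ability distribution, and the function names `simulate_1pl` and `savage_dickey_bf01` are all assumptions made for illustration. The Bayes factor is computed under a simple normal-normal approximation with a hypothetical estimate and standard error, rather than from MCMC output as in the study.

```python
import math
import random


def simulate_1pl(n, difficulties, rng):
    """Simulate binary responses for n persons under the 1PL model.

    P(correct) = 1 / (1 + exp(-(theta - b))), with theta ~ N(0, 1) (assumed).
    """
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
               for b in difficulties]
        data.append(row)
    return data


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(d_hat, se, prior_sd):
    """Savage-Dickey BF01 for H0: delta = 0 in a normal-normal model.

    Prior: delta ~ N(0, prior_sd^2); likelihood: d_hat ~ N(delta, se^2).
    BF01 = posterior density at delta = 0 / prior density at delta = 0.
    """
    post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / se ** 2)
    post_mean = post_var * d_hat / se ** 2
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


rng = random.Random(1)

# Hypothetical base difficulties for the 10 binary items.
base = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0]
dk = 0.3                                # difficulty difference between groups
shifted = [base[0] + dk] + base[1:]     # assumption: only item 1 is shifted

reference = simulate_1pl(500, base, rng)
focal = simulate_1pl(500, shifted, rng)

# Hypothetical estimated difference with a wide and a narrow standard error,
# mimicking a smaller and a larger group size.
bf_wide_se = savage_dickey_bf01(d_hat=0.3, se=0.15, prior_sd=1.0)
bf_narrow_se = savage_dickey_bf01(d_hat=0.3, se=0.07, prior_sd=1.0)
print(bf_wide_se, bf_narrow_se)
```

In this sketch, BF01 values above 1 favor invariance (a difference of zero) and values below 1 favor a nonzero difference. With the same estimated difference, a smaller standard error yields a smaller BF01, which is in line with the abstract's finding that larger groups make a given dk easier to detect.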
Ayvallı, M., & Kelecioğlu, H. (2023). Examining Measurement Invariance in Bayesian Item Response Theory Models: A Simulation Study. Journal of Measurement and Evaluation in Education and Psychology, 14(1), 19-32. https://doi.org/10.21031/epod.1101457