Research Article

Cross-Lingual and Cross-Cultural Validity of the TIMSS 2019 Computer Use Questionnaire: Using Measurement Invariance and DIF Methods

Year 2025, Volume: 16 Issue: 4, 202 - 215, 31.12.2025
https://doi.org/10.21031/epod.1772214

Abstract

This study investigates whether the TIMSS 2019 Computer Use Questionnaire functions equivalently across languages and cultures. Using responses from 8th-grade students in Turkey, England, and Qatar, we evaluated cross-group comparability with Multiple-Group Confirmatory Factor Analysis (MGCFA) and examined Differential Item Functioning (DIF) via Ordinal Logistic Regression (OLR) and Poly‑SIBTEST. The instrument comprises 11 Likert-type items organized into two factors—Computer Usage Frequency and Computer Usage Self‑Efficacy—supported by exploratory and confirmatory factor analyses. For the same‑culture/different‑language comparison (Qatar Arabic vs. English), configural and metric invariance were supported, whereas scalar invariance was not. For the different‑culture/different‑language comparison (England vs. Turkey), only configural invariance was obtained, indicating that factor loadings and intercepts were not fully comparable across these countries. DIF findings varied by method: OLR flagged mostly negligible DIF in the frequency items for the same‑culture comparison, while Poly‑SIBTEST identified several items with moderate to large DIF; in the cross‑culture comparison, both methods indicated DIF for most items, particularly within the self‑efficacy factor. The pattern of results suggests that linguistic adaptation, access to technology, and differences in technology‑related experiences contribute to nonequivalence. We propose revising culture‑sensitive terms, clarifying item contexts, and incorporating qualitative evidence to strengthen score comparability in future administrations.
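To make the invariance hierarchy named above concrete: in the MGCFA framework (Meredith, 1993), the response to item i in group g is modeled as

    x_{ig} = \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig}

where configural invariance imposes only a common loading pattern (all \lambda_{ig} and \tau_{ig} free), metric invariance adds \lambda_{ig} = \lambda_{i} for all g, and scalar invariance further adds \tau_{ig} = \tau_{i}, with model comparison typically based on the \Delta CFI \le .01 criterion (Cheung & Rensvold, 2002). On this reading, the Qatar Arabic versus English result (metric but not scalar invariance) licenses comparisons of factor relationships across the two language versions but not of observed scale means, while the England versus Turkey result (configural only) supports comparisons of factor structure alone.

The OLR DIF procedure (Zumbo, 1999) can be sketched in the same spirit. The snippet below is a minimal, hypothetical Python reimplementation built on the statsmodels OrderedModel class; the study's own OLR analyses follow the lordif tradition in R (Choi et al., 2011), so this sketch illustrates the logic rather than reproducing the authors' code, and the data frame and column names (item, total, group, with group coded 0/1) are assumptions.

import numpy as np
import pandas as pd
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

def olr_dif(data, item, total, group):
    """Zumbo-style OLR DIF test for one Likert item.

    Compares the nested ordered-logit models
      M1: item ~ total
      M3: item ~ total + group + total:group
    via a 2-df likelihood-ratio test (uniform and nonuniform DIF
    jointly), with McFadden's delta pseudo-R^2 as the effect size.
    """
    d = data.copy()
    d["inter"] = d[total] * d[group]
    # Ordered categorical response; all categories assumed observed.
    y = pd.Series(pd.Categorical(d[item], ordered=True))

    def loglik(cols):
        # OrderedModel estimates thresholds internally, so no constant
        # is added to the predictor set.
        res = OrderedModel(y, d[cols], distr="logit").fit(
            method="bfgs", disp=False)
        return res.llf

    ll1 = loglik([total])
    ll3 = loglik([total, group, "inter"])

    # Thresholds-only null log-likelihood in closed form: its MLE just
    # reproduces the observed category proportions.
    n_k = y.value_counts().to_numpy()
    ll0 = float(np.sum(n_k * np.log(n_k / n_k.sum())))

    lr = 2.0 * (ll3 - ll1)            # LR chi-square, df = 2
    p_value = chi2.sf(lr, df=2)
    delta_r2 = (ll3 - ll1) / -ll0     # McFadden delta R^2
    return lr, p_value, delta_r2

Under one common convention (Zumbo, 1999), an item is flagged when the likelihood-ratio test is significant, and delta R^2 values below .13 are read as negligible, .13 to .26 as moderate, and above .26 as large; classifications such as "negligible" and "moderate to large" in the abstract typically rest on effect-size conventions of this kind.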

References

  • Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67–91.
  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3–23). Lawrence Erlbaum Associates.
  • Asil, M., & Gelbal, S. (2012). Cross-cultural equivalence of the PISA Student Questionnaire. Education and Science, 37(166), 236–249.
  • Atalay, K. (2010). PISA 2006 öğrenci anketinde yer alan tutum maddelerinin değişen madde fonksiyonu açısından incelenmesi [Examination of the attitude items in the PISA 2006 student questionnaire in terms of differential item functioning] (Master's thesis). Hacettepe Üniversitesi, Ankara.
  • Atar, B., & Kamata, A. (2011). Comparison of IRT likelihood ratio test and logistic regression DIF detection procedures. Hacettepe Üniversitesi Eğitim Bilimleri Dergisi, 41, 36–47.
  • Belzak, W., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25(6), 673.
  • Bialo, J. A., & Li, H. (2022). Fairness and comparability in achievement motivation items: A differential item functioning analysis. Journal of Psychoeducational Assessment, 40(6), 722–743. https://doi.org/10.1177/07342829221090113
  • Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford Press.
  • Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R package for detecting differential item functioning using iterative ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of Statistical Software, 39(8), 1–30.
  • Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Lawrence Erlbaum Associates, Inc.
  • Ercikan, K., & Koh, K. (2005). Examining the construct comparability of the English and French versions of TIMSS. International Journal of Testing, 5(1), 23–35.
  • Fishbein, B., Martin, M. O., Mullis, I. V., & Foy, P. (2018). The TIMSS 2019 item equivalence study: Examining mode effects for computer-based assessment and implications for measuring trends. Large-scale Assessments in Education, 6(1), 1–23.
  • Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3), 286.
  • Gök, B., Kabasakal, K. A., & Kelecioglu, H. (2014). Analysis of attitude items in PISA 2009 Student Questionnaire in terms of differential item functioning based on culture. Journal of Measurement and Evaluation in Education and Psychology, 5(1), 72–87. https://doi.org/10.21031/epod.64124
  • Gören, S., Sayın, A., & Gelbal, S. (2024). An analysis of item bias in the PISA 2018 reading understanding and memorising strategies questionnaire. Kastamonu Education Journal, 32(2), 345–356. doi: 10.24106/kefdergi.X
  • Henze, N., & Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics – Theory and Methods, 19(10), 3595–3617.
  • Huang, X. (2010). Differential item functioning: The consequence of language, curriculum, or culture? (Doctoral dissertation). University of California, Berkeley.
  • Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36.
  • Keengwe, J., & Hussein, F. (2014). Using computer-assisted instruction to enhance achievement of English language learners. Education and Information Technologies, 19, 295–306.
  • Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). New York, NY: Guilford Press.
  • Liu, R. (2019). DIF Among English Language Learners on a Large-Scale
  • Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825
  • Mohorić, T., & Takšić, V. (2016). DIF analiza upitnika emocionalne kompetentnosti Mantel-Haenszel metodom: kros-kulturalna usporedba [DIF analysis of the ESCQ using the Mantel-Haenszel procedure: A cross-cultural comparison]. In XX. Dani psihologije u Zadru.
  • Molenaar, D., & Feskens, R. (2024). Relating violations of measurement invariance to group differences in response times. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000655
  • Mullen, M. R. (1995). Diagnosing measurement equivalence in cross-national research. Journal of International Business Studies, 26, 573–596.
  • Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. Boston College, TIMSS & PIRLS International Study Center.
  • OECD. (2023). PISA 2022 assessment and analytical framework. OECD Publishing. https://doi.org/10.1787/dfe0bf9c-en
  • Oliveri, M. E., Olson, B. F., Ercikan, K., & Zumbo, B. (2012). Methodologies for investigating item- and test-level measurement equivalence in international large-scale assessments. International Journal of Testing, 12(3), 203–223. https://doi.org/10.1080/15305058.2011.617475
  • Opesemowo, O. A. G. (2025). Exploring undue advantage of differential item functioning in high-stakes assessments: Implications on sustainable development goal 4. Social Sciences & Humanities Open, 11, 101257.
  • Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance. Journal of Educational Measurement, 33(2), 215–230. https://doi.org/10.1111/j.1745-3984.1996.tb00490.x
  • Sharma, S. (1996). Applied multivariate techniques. New York, NY: John Wiley & Sons.
  • Shepard, L., Camilli, G., & Williams, D. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22(2), 77–105. https://doi.org/10.1111/j.1745-3984.1985.tb01050.x
  • Smith, P. B. (2004). Acquiescent response bias as an aspect of cultural communication style. Journal of Cross-Cultural Psychology, 35(1), 50–61.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
  • Şekercioğlu, G. (2018). Measurement invariance: Concept and implementation. International Online Journal of Education and Teaching (IOJET), 5(3), 609–634.
  • Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). New York: Allyn and Bacon.
  • Thomas, D. R., & Zumbo, B. D. (1996). Using a measure of variable importance to investigate the standardization of discriminant coefficients. Journal of Educational and Behavioral Statistics, 21(2), 110–130.
  • Uzun, N. B., & Gelbal, S. (2017). PISA fen başarı testinin madde yanlılığının kültür ve dil açısından incelenmesi [Examination of item bias in the PISA science achievement test in terms of culture and language]. Kastamonu Eğitim Dergisi, 25(6), 2427–2446.
  • Valverde-Berrocoso, J., Acevedo-Borrega, J., & Cerezo-Pizarro, M. (2022). Educational technology and student performance: A systematic review. Frontiers in Education, 7, 916502.
  • Van Herk, H., Poortinga, Y. H., & Verhallen, T. M. (2004). Response styles in rating scales: Evidence of method bias in data from six EU countries. Journal of Cross-Cultural Psychology, 35(3), 346–360.
  • Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–69. https://doi.org/10.1177/109442810031002
  • Yandı, A., Köse, İ. A., & Uysal, Ö. (2017). Farklı yöntemlerle ölçme değişmezliğinin incelenmesi: PISA 2012 örneği [Examining measurement invariance with different methods: The case of PISA 2012]. Mersin Üniversitesi Eğitim Fakültesi Dergisi, 13(1), 243–253.
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30(3), 233–251.


Details

Primary Language English
Subjects Measurement Equivalence
Journal Section Research Article
Authors

Zeynep Neveser Kızılçim 0000-0002-0164-5682

Sevda Çetin 0000-0001-5483-595X

Submission Date August 26, 2025
Acceptance Date November 2, 2025
Early Pub Date December 2, 2025
Publication Date December 31, 2025
Published in Issue Year 2025 Volume: 16 Issue: 4

Cite

APA Kızılçim, Z. N., & Çetin, S. (2025). Cross-Lingual and Cross-Cultural Validity of the TIMSS 2019 Computer Use Questionnaire: Using Measurement Invariance and DIF Methods. Journal of Measurement and Evaluation in Education and Psychology, 16(4), 202-215. https://doi.org/10.21031/epod.1772214