Research Article
Year 2023, Volume 14, Issue 1, 76 - 94, 25.03.2023
https://doi.org/10.21031/epod.1218144

Comparison of Methods Used in Detection of DIF in Cognitive Diagnostic Models with Traditional Methods: Applications in TIMSS 2011

Abstract

This study compares the Wald test and the likelihood ratio test (LRT), two differential item functioning (DIF) detection approaches developed for cognitive diagnostic models (CDMs), with DIF detection methods based on Classical Test Theory (CTT) and Item Response Theory (IRT), using the TIMSS 2011 dataset in a retrofitting study. CDMs have significant potential for detecting DIF and thereby contributing to validity, but their results can be trusted only when a strong methodological background is in place. The study is therefore intended to contribute to the literature by supporting the correct use of CDMs and by evaluating how well these newer approaches agree with traditional methods. According to the analysis results, thirty-one items showed differences between the cognitive diagnosis assessments and the traditional methods. The item with the largest DIF was found with Raju's unsigned area measure, an IRT-based technique, whereas the item with the lowest DIF was found with the Wald test developed for CDMs. In general, methods not based on CDMs flagged more items as showing DIF than the CDM-based Wald test and LRT did. The DIF analyses were conducted to determine the test's psychometric properties within the CDM framework rather than to identify the source of bias. Future research can go a step further and make more specific assessments of item bias with respect to test structure, test scope, and subgroups. In addition, the DIF analyses here used only gender as the grouping variable; researchers can use other variables to suit their own purposes.
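As an illustrative aside that is not taken from the study, the idea behind an unsigned area measure can be sketched numerically: it is the area between the item characteristic curves estimated for the reference and focal groups. The sketch below assumes a two-parameter logistic (2PL) item response function with scaling constant D = 1.7; the function names `irf_2pl` and `unsigned_area`, the integration range, and all item parameter values are hypothetical choices for illustration, not values or code from the article.

```python
import math


def irf_2pl(theta, a, b, D=1.7):
    """2PL item response function: P(correct | theta), computed stably."""
    z = D * a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def unsigned_area(a_ref, b_ref, a_foc, b_foc, lo=-30.0, hi=30.0, steps=60000):
    """Trapezoid-rule approximation of the area between two item curves.

    (a_ref, b_ref) are the discrimination and difficulty estimated in the
    reference group; (a_foc, b_foc) are the same item's parameters in the
    focal group. The range [lo, hi] is an assumed stand-in for the whole
    ability axis; the curves are practically identical outside it.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(irf_2pl(theta, a_ref, b_ref) - irf_2pl(theta, a_foc, b_foc))
        weight = 0.5 if i in (0, steps) else 1.0  # endpoints count half
        total += weight * gap
    return total * h


if __name__ == "__main__":
    # Hypothetical item: same discrimination, difficulty shifted by 0.5.
    print(round(unsigned_area(1.0, 0.0, 1.0, 0.5), 4))
```

When the two groups share the same discrimination, the area reduces to the absolute difference between the difficulty parameters, which gives a quick sanity check on the numerical result. A larger area indicates that the item behaves more differently across groups at matched ability levels.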

There are 42 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

Büşra Eren 0000-0001-7565-1025

Tuba Gündüz 0000-0002-0921-9290

Şeref Tan 0000-0002-9892-3369

Publication Date March 25, 2023
Acceptance Date March 6, 2023
Published in Issue Year 2023

Cite

APA Eren, B., Gündüz, T., & Tan, Ş. (2023). Comparison of Methods Used in Detection of DIF in Cognitive Diagnostic Models with Traditional Methods: Applications in TIMSS 2011. Journal of Measurement and Evaluation in Education and Psychology, 14(1), 76-94. https://doi.org/10.21031/epod.1218144