Research Article

Purification procedures used for the detection of gender DIF: Item bias in a foreign language test

Year 2023, Volume: 10 Issue: 4, 765 - 780, 23.12.2023
https://doi.org/10.21449/ijate.1250358

Abstract

In the current study, differential item functioning (DIF) detection was conducted on real data using the Mantel-Haenszel (MH), simultaneous item bias test (SIBTEST), Lord's chi-square, and Raju's area methods, both when item purification was applied and when it was not. After gender-related DIF was detected, expert opinions were obtained for a bias study, since examining gender bias in the English test is important and the relevant literature contains related DIF studies but no closely comparable bias studies. The sample consisted of 7,389 students who took the Transition from Primary to Secondary Education Exam (TPSEE, referred to as "TEOG" in Turkish) administered in April 2017. The gender-related DIF results obtained with the four methods differed partially, and they also differed depending on whether item purification was performed. The detection of DIF was treated as an indication of potential bias. In the second stage of the study, the opinions of seven experts were sought for item 11, for which DIF was flagged at least at the B level by the MH and SIBTEST methods. Based on the expert reviews, none of the items in the English test was judged to be biased with respect to gender. It is recommended that similar bias studies be conducted so that test developers become aware of item characteristics that may lead to bias and can construct unbiased items.
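
Although the article does not reproduce its analysis code, the procedure described above (four DIF methods, each run with and without item purification) can be illustrated with a brief sketch. The sketch below assumes the analysis is carried out in R (the article cites R Core Team, 2021) with the difR package; the package choice, the file and object names, the treatment of female examinees as the focal group, and the 2PL model used for the IRT-based methods are illustrative assumptions, not the author's actual setup.

```r
# Minimal sketch (not the author's code): four DIF methods, with and without purification.
library(difR)

resp   <- read.csv("teog_english_2017.csv")   # hypothetical file: one row per student
items  <- resp[, grep("^item", names(resp))]  # dichotomously scored (0/1) English items
gender <- resp$gender                         # e.g., "F"/"M"; "F" taken as the focal group

run_all <- function(purify) {
  list(
    MH      = difMH(items, group = gender, focal.name = "F", purify = purify),
    SIBTEST = difSIBTEST(items, group = gender, focal.name = "F", purify = purify),
    Lord    = difLord(items, group = gender, focal.name = "F", model = "2PL", purify = purify),
    Raju    = difRaju(items, group = gender, focal.name = "F", model = "2PL", purify = purify)
  )
}

without_purification <- run_all(FALSE)  # DIF detection on the raw matching score
with_purification    <- run_all(TRUE)   # iterative purification of the matching criterion

without_purification$MH  # flagged items, MH chi-square values, ETS delta classification
```

For the MH output, items are commonly classified on the ETS delta scale (Delta_MH = -2.35 ln(alpha_MH)); an absolute value between 1 and 1.5 corresponds to the B (moderate) DIF category referred to in the abstract.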

References

  • Akcan, R., & Atalay Kabasakal, K.A. (2019). An investigation of item bias of English test: The case of 2016 year undergraduate placement exam in Turkey. International Journal of Assessment Tools in Education, 6(1), 48-62. https://doi.org/10.21449/ijate.508581
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
  • Bakan Kalaycıoğlu, D. (2022). Gender-based differential item functioning analysis of the medical specialization education entrance examination. Journal of Measurement and Evaluation in Education and Psychology, 13(1), 1-13. https://doi.org/10.21031/epod.998592
  • Bakan Kalaycıoğlu, D., & Kelecioğlu, H. (2011). Item bias analysis of the university entrance examination. Education and Science, 36(161), 3–13.
  • Camilli, G., & Shepard, L.A. (1994). Methods for identifying biased test items (1st ed.). Sage.
  • Chalmers, R.P. (2018). Improving the crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika, 83(2), 376-386.
  • Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory (1st ed.). Holt, Rinehart and Winston.
  • Çepni, Z., & Kelecioğlu, H. (2021). Detecting differential item functioning using SIBTEST, MH, LR and IRT methods. Journal of Measurement and Evaluation in Education and Psychology, 12(3), 267-285. https://doi.org/10.21031/epod.988879
  • Emily, D., Brooks, G., & Johanson, G. (2021). Detecting differential item functioning: Item response theory methods versus the Mantel-Haenszel procedure. International Journal of Assessment Tools in Education, 8(2), 376-393. https://doi.org/10.21449/ijate.730141
  • Fidalgo, A.M., Mellenbergh, G.J., & Muñiz, J. (2000). Effects of amount of DIF, test length, and purification type on robustness and power of Mantel-Haenszel procedures. Methods of Psychological Research Online, 5(3), 43-53.
  • Freelon, D. (2013). ReCal OIR: Ordinal, interval, and ratio intercoder reliability as a web service. International Journal of Internet Science, 8(1), 10-16.
  • Hambleton, R.K., & Rogers, H.J. (1989). Detecting potentially biased test items: Comparison of IRT area and Mantel-Haenszel methods. Applied Measurement in Education, 2(4), 313-334. https://doi.org/10.1207/s15324818ame0204_4
  • Holland, P.W., & Thayer, D.T. (1986). Differential item functioning and the Mantel‐Haenszel procedure. ETS Research Report Series, (2), i-24. https://doi.org/10.1002/j.2330-8516.1986.tb00186.x
  • Holland, P.W., & Wainer, H. (Eds.) (1993). Differential item functioning (1st ed.). Lawrence Erlbaum.
  • Karakaya, İ. (2012). An investigation of item bias in science and technology subtests and mathematic subtests in Level Determination Exam. Educational Sciences: Theory and Practice, 12(1), 215–229.
  • Karakaya, İ., & Kutlu, Ö. (2012). An investigation of item bias in Turkish subtests in Level Determination Exam. Education and Science, 37(165), 348–362.
  • Khalid, M.N., & Glas, C.A. (2014). A scale purification procedure for evaluation of differential item functioning. Measurement, 50, 186-197. https://doi.org/10.1016/j.measurement.2013.12.019
  • Li, H.H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647–677.
  • Llach, M.P.A., & Gallego, M.T. (2012). Vocabulary knowledge development and gender differences in a second language. Elia, 12(1), 45-75.
  • Lord, F.M. (1980). Applications of item response theory to practical problems (1st edition). Erlbaum.
  • Magis, D., & Facon, B. (2013). Item purification does not always improve DIF detection: A counterexample with Angoff’s delta plot. Educational and Psychological Measurement, 73(2), 293-311. https://doi.org/10.1177/0013164412451903
  • Martinkova, P., & Drabinova, A. (2018). ShinyItemAnalysis for teaching psychometrics and to enforce routine analysis of educational tests. The R Journal, 10(2), 503-515. https://doi.org/10.32614/RJ-2018-074
  • Osterlind, S.J. (1983). Test item bias (1st ed.). Sage.
  • Özdemir, B. (2015). A comparison of IRT-based methods for examining differential item functioning in TIMSS 2011 mathematics subtest. Procedia-Social and Behavioral Sciences, 174, 2075-2083. https://doi.org/10.1016/j.sbspro.2015.02.004
  • R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
  • Raju, N.S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://doi.org/10.1007/BF02294403
  • Raju, N.S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207. https://doi.org/10.1177/014662169001400208
  • Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355-371. https://doi.org/10.1177/014662169602000404
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159–194.
  • Sireci, S.G., & Rios, J.A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170-187. https://doi.org/10.1080/13803611.2013.767621
  • Soysal, S., & Yılmaz Koğar, E.Y. (2021). An investigation of item position effects by means of IRT-based differential item functioning methods. International Journal of Assessment Tools in Education, 8(2), 239-256. https://doi.org/10.21449/ijate.779963
  • Tunc, E.B., Uluman, M., & Avcu, A. (2018). Revisiting the effect of item purification on differential item functioning; real data findings. International Online Journal of Educational Sciences, 10(5), 139-147. https://doi.org/10.15345/iojes.2018.05.010
  • Wang, W.C., & Su, Y.H. (2004). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113-144. https://doi.org/10.1207/s15324818ame1702_2
  • Wiberg, M. (2007). Measuring and detecting differential item functioning in criterion-referenced licensing test: A theoretic comparison of methods [Dissertation, Umea University]. Umea University Libraries EM No 60.
  • Yıldırım, H., & Büyüköztürk, Ş. (2018). Using the delphi technique and focus-group interviews to determine item bias on the mathematics section of the Level Determination Exam for 2012. Educational Sciences: Theory & Practice, 18(2), 447-470.
  • Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Erlbaum.
  • Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Ottawa: National Defense Headquarters, 160. https://faculty.educ.ubc.ca/zumbo/DIF/handbook.pdf


Details

Primary Language English
Subjects Other Fields of Education
Journal Section Articles
Authors

Serap Büyükkıdık (ORCID: 0000-0003-4335-2949)

Publication Date December 23, 2023
Submission Date February 12, 2023
Published in Issue Year 2023 Volume: 10 Issue: 4

Cite

APA Büyükkıdık, S. (2023). Purification procedures used for the detection of gender DIF: Item bias in a foreign language test. International Journal of Assessment Tools in Education, 10(4), 765-780. https://doi.org/10.21449/ijate.1250358
