Research Article

An Investigation of Item Bias of English Test: The Case of 2016 Year Undergraduate Placement Exam in Turkey

Year 2019, Volume 6, Issue 1, pp. 48-62, 21.03.2019
https://doi.org/10.21449/ijate.508581

Abstract

The purpose of this study is to determine whether the items of the English test of the 2016 Undergraduate Placement Exam (UPE) exhibit differential item functioning (DIF) and differential bundle functioning (DBF) with respect to gender and school type, and to examine the possible sources of bias in the DIF items. The Mantel-Haenszel (MH), Simultaneous Item Bias Test (SIBTEST), and Multiple Indicators, Multiple Causes (MIMIC) methods were used for the DIF analyses, and the MIMIC and SIBTEST methods were used for the DBF analyses. Expert opinions were consulted to determine the sources of bias. The data set consisted of the responses of 59,818 students to the 2016 UPE English test. The analyses of the 60 items showed that one item in the translation subtest exhibited DIF favoring male students. The school-type analyses identified nine DIF items in the vocabulary and grammar knowledge subtest, six in the reading comprehension subtest, and four in the translation subtest. The experts judged the single item with gender DIF to be unbiased, and they found evidence of bias in thirteen of the nineteen items with DIF by school type. The DBF analyses showed that some item bundles exhibited DBF with respect to gender and school type. Finally, the three methods differed in the number of DIF items they identified and in the level of DIF they assigned to those items; however, they were consistent in detecting uniform DIF.
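The abstract names the Mantel-Haenszel procedure without showing how it is computed. The Python sketch below illustrates the standard MH DIF statistics described by Holland and Thayer (1988): examinees are stratified by a matching score, a 2×2 (group × correct/incorrect) table is formed at each score level, and the tables are pooled into the common odds ratio α_MH, its ETS delta-scale transform Δ_MH = -2.35 ln α_MH, and the continuity-corrected MH chi-square. It is not the authors' code (their reference list points to R packages such as difR). Assumptions: the matching variable is the observed total score, groups are coded as "focal" versus anything else, and the A/B/C labels use only the magnitude thresholds 1.0 and 1.5, omitting the significance tests that the full ETS rules (Zieky, 1993) also require. The function names, parameters, and toy data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item, total, group, focal="focal"):
    """Return (alpha_MH, delta_MH, chi2_MH) for one dichotomous item.

    item  : 0/1 scores on the studied item
    total : matching scores (assumed here: observed total test score)
    group : group labels; `focal` marks the focal group, all others are reference
    """
    # One 2x2 table per score level: [A, B, C, D] =
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, score, g in zip(item, total, group):
        col = 2 if g == focal else 0
        tables[score][col + (0 if x == 1 else 1)] += 1

    num = den = 0.0              # pieces of the MH common odds ratio
    obs = expected = var = 0.0   # pieces of the MH chi-square
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a single examinee adds nothing and would make n - 1 = 0
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        obs += a
        expected += n_ref * m_right / n
        var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))

    if num == 0 or den == 0:
        raise ValueError("alpha_MH is 0 or undefined for these data")
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS delta scale; negative favors the reference group
    chi2 = (abs(obs - expected) - 0.5) ** 2 / var  # continuity-corrected, 1 df
    return alpha, delta, chi2


def ets_category(delta):
    """Magnitude-only simplification of the ETS A/B/C labels (significance tests omitted)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large


if __name__ == "__main__":
    # Invented toy data: two score levels, 10 reference and 10 focal examinees
    # at each level, with the reference group answering correctly more often.
    def expand(score, label, n_right, n_wrong):
        return [(1, score, label)] * n_right + [(0, score, label)] * n_wrong

    rows = (expand(1, "reference", 8, 2) + expand(1, "focal", 5, 5)
            + expand(2, "reference", 9, 1) + expand(2, "focal", 6, 4))
    item, total, group = map(list, zip(*rows))
    alpha, delta, chi2 = mantel_haenszel_dif(item, total, group)
    print(f"alpha={alpha:.2f} delta={delta:.2f} chi2={chi2:.2f} ETS={ets_category(delta)}")
```

With these invented counts, α_MH = 3.8 / 0.8 = 4.75 and Δ_MH ≈ -3.66, which the magnitude-only rule labels C (favoring the reference group). The chi-square, however, is only about 2.86, below the 3.84 critical value for one degree of freedom at the .05 level, so under the full ETS rules the item would not be flagged. This is why the significance test should not be dropped in real analyses with small strata.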

References

  • Abbott, M. L. (2007). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7-36. DOI: 10.1177/0265532207071510
  • Akın Arıkan, Ç., Uğurlu, S., & Atar, B. (2016). A DIF and bias study by using MIMIC, SIBTEST, Logistic Regression and Mantel-Haenszel methods. Hacettepe University Journal of Education, 31(1), 34-52. DOI: 10.16986/HUJE.2015014226
  • Arga, B. (2017). Gender and student achievement in Turkey: School types and regional differences based on PISA 2012 data (Master's Thesis). İhsan Doğramacı Bilkent University, Ankara.
  • Atalay Kabasakal, K., Arsan, N., Gök, B., & Kelecioğlu, H. (2014). Comparing performances (Type I error and Power) of IRT Likelihood Ratio, SIBTEST, and Mantel-Haenszel methods in the determination of differential item functioning. Educational Sciences: Theory & Practice, 14(6), 2186-2193. DOI: 10.12738/estp.2014.6.2165
  • Bakan Kalaycıoğlu, D. (2008). Öğrenci Seçme Sınavı'nın madde yanlılığı açısından incelenmesi [Item bias analysis of the University Entrance Examination]. (Doctoral Dissertation). Hacettepe University, Ankara.
  • Berberoğlu, G., & Kalender, İ. (2005). Öğrenci başarısının yıllara, okul türlerine, bölgelere göre incelenmesi: ÖSS ve PISA analizi [Investigation of student achievement across years, school types and regions: The SSE and PISA analyses]. Eğitim Bilimleri ve Uygulama, 4(7), 21-35.
  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. London: Sage.
  • Chalmers, R. P. (2017). Improving the crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika. DOI: 10.1007/s11336-017-9583-8
  • Chalmers, R. P. (2018). mirt, version 1.27.1: Multidimensional item response theory. Retrieved from https://cran.r-project.org/web/packages/mirt/index.html
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44.
  • Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33(4), 465-484.
  • Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. DOI: 10.1177/0146621605275728
  • Finch, H. (2012). The MIMIC model as a tool for differential bundle functioning detection. Applied Psychological Measurement, 36(1), 40-59. DOI: 10.1177/0146621611432863
  • Finch, H. W., & French, B. F. (2007). Detection of crossing differential item functioning: A comparison of four methods. Educational and Psychological Measurement, 67(4), 565-582. DOI: 10.1177/0013164406296975
  • Fox, J. (2016). polycor, version 0.7-9: Polychoric and polyserial correlations. Retrieved from https://cran.r-project.org/web/packages/polycor/index.html
  • Gierl, M. J., Bisanz, J., Bisanz, G. L., Boughton, K. A., & Khaliq, S. N. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20(2), 26-36.
  • Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38(2), 164-187.
  • Gök, B., Kelecioğlu, H., & Doğan, N. (2010). Değişen madde fonksiyonunu belirlemede Mantel–Haenszel ve Lojistik Regresyon tekniklerinin karşılaştırılması [The comparison of Mantel-Haenszel and Logistic Regression techniques in determining differential item functioning]. Eğitim ve Bilim, 35(156).
  • Hallquist, M., & Wiley, J. (2018). MplusAutomation, version 0.7-2: An R package for facilitating large-scale latent variable analyses in Mplus. Retrieved from https://cran.r-project.org/web/packages/MplusAutomation/index.html
  • Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
  • Kan, A. (2007). Test yansızlığı: H.Ü. Yabancı dil muafiyet sınavının cinsiyete ve bölümlere göre DMF analizi [Test fairness: DIF analysis across gender and department of the H.U. foreign language proficiency examination]. Eurasian Journal of Educational Research, 29, 45-58.
  • Karakaya, İ., & Kutlu, Ö. (2012). Seviye belirleme sınavındaki Türkçe alt testlerinin madde yanlılığının incelenmesi [An investigation of item bias in Turkish subtests in the Level Determination Exam]. Eğitim ve Bilim, 37(165).
  • Li, H.-H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.
  • Lin, J., & Wu, F. (2003). Differential performance by gender in foreign language testing. Paper presented at the Annual Meeting of the National Council on Measurement in Education.
  • Magis, D., Beland, S., & Raiche, G. (2016). difR, version 4.7: Collection of methods to detect dichotomous differential item functioning (DIF). Retrieved from https://cran.r-project.org/web/packages/difR/index.html
  • McNamara, T., & Roever, C. (2006). Psychometric approaches to fairness: Bias and DIF. Language Learning, 56(S2), 81-128.
  • Muthen, L. K., & Muthen, B. O. (1998). Mplus user’s guide. Los Angeles.
  • Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 18(4), 315-328.
  • Osterlind, S. J. (1983). Test item bias. Sage Publications, Inc.
  • Raiche, G., & Magis, D. (2015). nFactors, version 2.3.3: Parallel analysis and non graphical solutions to the Cattell. Retrieved from https://cran.r-project.org/web/packages/nFactors/index.html
  • Rosseel, Y. (2017). lavaan, version 0.5-23.1097: Latent variable analysis. Retrieved from https://cran.r-project.org/web/packages/lavaan/index.html
  • Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33(2), 215-230.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194.
  • Willse, J. T. (2018). CTT, version 2.3.2: Classical test theory functions. Retrieved from https://cran.r-project.org/web/packages/CTT/index.html
  • Woods, C. M., & Grimm, K. J. (2011). Testing for nonuniform differential item functioning with Multiple Indicator Multiple Cause models. Applied Psychological Measurement, 35(5), 339-361. DOI: 10.1177/0146621611405984
  • Yalçın, S. (2011). Türk öğrencilerin PISA başarı düzeylerinin veri zarflama analizi ile yıllara göre karşılaştırılması [The comparison of Turkish students’ PISA achievement levels in relation to years via data envelopment analysis]. (Master's Thesis). Ankara University, Ankara.
  • Yiğit, S. (2010). PISA matematik alt test sorularına verilen cevapların bazı faktörlere göre incelenmesi (Kocaeli-Kartepe örneği) [The analysis of the answers to PISA maths subtest questions according to certain factors (Kocaeli-Kartepe case)]. (Master's Thesis). Sakarya University, Sakarya.
  • Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Erlbaum.
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.

There are 39 citations in total.

Details

Primary Language: English
Subjects: Studies on Education
Journal Section: Articles
Authors

Rabia Akcan 0000-0003-3025-774X

Kübra Atalay Kabasakal 0000-0002-3580-5568

Publication Date: March 21, 2019
Submission Date: September 25, 2018
Published in Issue: Year 2019, Volume 6, Issue 1

Cite

APA Akcan, R., & Atalay Kabasakal, K. (2019). An Investigation of Item Bias of English Test: The Case of 2016 Year Undergraduate Placement Exam in Turkey. International Journal of Assessment Tools in Education, 6(1), 48-62. https://doi.org/10.21449/ijate.508581
