An Investigation of Item Position Effects by Means of IRT-Based Differential Item Functioning Methods

Sümeyra Soysal; Esin Yılmaz Koğar

doi:10.21449/ijate.779963

Research Article

Year 2021, Volume: 8 Issue: 2, 239 - 256, 10.06.2021

Cited By: 1

Abstract

This study investigated whether item position effects lead to differential item functioning (DIF) when different test booklets are used. To this end, Lord’s chi-square and Raju’s unsigned area methods were applied under the three-parameter logistic (3PL) model, both with and without item purification. Comparing the methods showed that Lord’s chi-square generally identified more items with DIF than Raju’s unsigned area did. Differences between the booklets in item position resulted in a higher number of items displaying DIF under the item purification conditions. Based on these findings, to avoid DIF arising from item position effects, it is recommended that the same items be placed in similar positions across booklets when different booklets are formed.
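The two DIF statistics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the reference list cites the difR package, and this sketch does not reproduce its implementation. It assumes item parameters have already been estimated separately in the reference and focal booklet groups and placed on a common (linked) scale, that the guessing parameter c is held equal across groups (with unequal c the area between the curves is infinite), and that the scaling constant is D = 1.7. The function names (`p_3pl`, `softplus`, `raju_unsigned_area`, `lord_chi_square`) and all example values are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (assumed; some software uses 1.0)


def p_3pl(theta, a, b, c, d=D):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def raju_unsigned_area(a_ref, b_ref, a_foc, b_foc, c, d=D):
    """Raju's (1988) closed-form unsigned area between two 3PL ICCs.

    Assumes both groups share the same guessing parameter c.
    Returns the area only; Raju's (1990) significance test is not included.
    """
    db = b_foc - b_ref
    if math.isclose(a_ref, a_foc):
        # Equal slopes: the curves are horizontal shifts of each other.
        return (1.0 - c) * abs(db)
    da = a_foc - a_ref
    k = d * a_ref * a_foc * db / da
    inner = 2.0 * da / (d * a_ref * a_foc) * softplus(k) - db
    return (1.0 - c) * abs(inner)


def lord_chi_square(params_ref, cov_ref, params_foc, cov_foc):
    """Lord's (1980) chi-square comparing (a, b) across two groups.

    params_*: (a, b) estimates on a common (linked) scale.
    cov_*:    2x2 asymptotic covariance matrices of those estimates.
    With c held common, only a and b are compared, so df = 2.
    """
    diff = [params_ref[0] - params_foc[0], params_ref[1] - params_foc[1]]
    # Covariance of the difference = sum of the two groups' covariances.
    s = [[cov_ref[i][j] + cov_foc[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    stat = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for df = 2
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical estimates for one item in two booklets (not study data).
    ua = raju_unsigned_area(a_ref=1.2, b_ref=0.0, a_foc=0.9, b_foc=0.4, c=0.2)
    stat, p = lord_chi_square((1.2, 0.0), [[0.010, 0.002], [0.002, 0.020]],
                              (0.9, 0.4), [[0.012, 0.001], [0.001, 0.025]])
    print(f"Raju unsigned area: {ua:.3f}")
    print(f"Lord chi-square: {stat:.2f}, p = {p:.4f}")
```

In a full analysis each statistic would be computed for every item. Item purification typically repeats the linking step after temporarily excluding the items flagged as DIF, iterating until the flagged set stops changing.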

Keywords

Item position effects, Item response theory, Differential item functioning, Raju’s unsigned area, Lord’s chi-square

References

Akayleh, A. S. A. (2018). Precision of the estimations for some methods of the CTT and IRT as a base to display the differential item functions on the different item ordered test formats. https://bit.ly/3aJeFKx
Avcu, A., Tunç, E. B., & Uluman, M. (2018). How the order of the items in a booklet affects item functioning: Empirical findings from course level data? European Journal of Education Studies, 4(3), 227-239. http://doi.org/10.5281/zenodo.1199695
Balta, E., & Omur Sunbul, S. (2017). An investigation of ordering test items differently depending on their difficulty level by differential item functioning. Eurasian Journal of Educational Research, 72, 23-42. https://doi.org/10.14689/ejer.2017.72.2
Brown, T. A. (2006). Confirmatory factor analysis for applied research (2nd ed.). The Guilford Press.
Bulut, O. (2015). An empirical analysis of gender-based DIF due to test booklet effect. European Journal of Research on Education, 3(1), 7-16. https://bit.ly/3cKkhqf
Bulut, O., Guo, Q., & Gierl, M. J. (2017). A structural equation modeling approach for examining position effects in large-scale assessments. Large-scale Assessments in Education, 5(1), 8. http://doi.org/10.1186/s40536-017-0042-x
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Sage Publications.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12(3), 253-260. https://conservancy.umn.edu/bitstream/handle/11299/107645/v12n3p253.pdf?sequence=1
Choi, Y., Alexeev, N., & Cohen, A. (2014). DIF analysis using a mixture 3PL model with a covariate on the TIMSS 2007 mathematics test. In KAERA Research Forum, 1(1), 4-14. http://www.columbia.edu/~ld208/KAERA_2014.pdf#page=5
Clauser, B., Mazor, K., & Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6(4), 269-279. https://doi.org/10.1207/s15324818ame0604_2
Çokluk, Ö., Gül, E., & Dogan-Gül, Ç. (2016). Examining differential item functions of different item ordered test forms according to item difficulty levels. Educational Sciences: Theory and Practice, 16(1), 319-330. http://dx.doi.org/10.12738/estp.2016.1.0329
Davis, J., & Ferdous, A. (2005). Using item difficulty and item position to measure test fatigue. American Institutes for Research. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.847&rep=rep1&type=pdf
Debeer, D., & Janssen, R. (2013). Modeling item‐position effects within an IRT framework. Journal of Educational Measurement, 50(2), 164-185. https://ppw.kuleuven.be/okp/_pdf/DeBeer2013MIPEW.pdf
Doğan Gül, Ç., & Çokluk Bökeoğlu, Ö. (2018). The comparison of academic success of students with low and high anxiety levels in tests varying in item difficulty. Inonu University Journal of the Faculty of Education, 19(3), 252-265. https://doi.org/10.17679/inuefd.341477
Erdem, B. (2015). Ortaöğretime geçişte kullanılan ortak sınavların değişen madde fonksiyonu açısından kitapçık türlerine göre farklı yöntemlerle incelenmesi [Investigation of Common Exams Used in Transition to High Schools in Terms of Differential Item Functioning Regarding Booklet Types with Different Methods] [Unpublished master's thesis]. Hacettepe University, Ankara.
Freedle, R., & Kostin, I. (1991). The prediction of SAT reading comprehension item difficulty for expository prose passages (ETS Research Report, RR-91-29). Princeton, NJ: Educational Testing Service. https://doi.org/10.1002/j.2333-8504.1991.tb01396.x
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large‐scale assessments of student achievement: Theory and practice. Educational Measurement: Issues and Practice, 28(3), 39-53. https://doi.org/10.1111/j.1745-3992.2009.00154.x
Hahne, J. (2008). Analyzing position effects within reasoning items using the LLTM for structurally incomplete data. Psychology Science Quarterly, 50(3), 379–390. https://bit.ly/3aHHyGD
Hambleton, R. K. (1968). The effects of item order and anxiety on test performance and stress. Paper presented at the meeting of American Educational Research Association. https://files.eric.ed.gov/fulltext/ED017960.pdf
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
Hartig, J., & Buchholz, J. (2012). A multilevel item response model for item position effects and individual persistence. Psychological Test and Assessment Modeling, 54(4), 418-431. https://core.ac.uk/download/pdf/25705605.pdf
Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Effects of design properties on parameter estimation in large-scale assessments. Educational and Psychological Measurement, 75(6), 1021-1044. https://doi.org/10.1177/0013164415573311
Hohensinn, C., Kubinger, K. D., Reif, M., Holocher-Ertl, S., Khorramdel, L., & Frebort, M. (2008). Examining item-position effects in large-scale assessment using the linear logistic test model. Psychology Science Quarterly, 50, 391-402. https://bit.ly/39Sb9xY
Hohensinn, C., Kubinger, K. D., Reif, M., Schleicher, E., & Khorramdel, L. (2011). Analysing item position effects due to test booklet design within large-scale assessment. Educational Research and Evaluation, 17(6), 497-509. https://doi.org/10.1080/13803611.2011.632668
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. Braun (Eds.), Test validity (pp.129-143). Erlbaum.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Lawrence Erlbaum Associates.
Hooper, D., Coughlan, J., & Mullen, M. (2008). Structural equation modeling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53-60. http://arrow.dit.ie/cgi/viewcontent.cgi?article=1001&context=buschmanart
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55. https://doi.org/10.1080/10705519909540118
Huck, S. W. (2012). Reading statistics and research (6th ed.). Pearson.
Kamata, A., & Vaughn, B. K. (2004). An introduction to differential item functioning analysis. Learning Disabilities: A Contemporary Journal, 2(2), 49-69. (EJ797693). ERIC. https://eric.ed.gov/?id=EJ797693
Karami, H. (2012). An introduction to differential item functioning. The International Journal of Educational and Psychological Assessment, 11(2), 59-76. https://psycnet.apa.org/record/2012-28410-004
Kingston, N. M., & Dorans, N. J. (1984). Item location effects and their implications for IRT equating and adaptive testing. Applied Psychological Measurement, 8(2), 147-154. https://conservancy.umn.edu/bitstream/handle/11299/101880/1/v08n2p147.pdf
Kleinke, D. J. (1980). Item order, response location, and examinee sex and handedness on performance on multiple-choice tests. Journal of Educational Research, 73(4), 225–229. https://doi.org/10.1080/00220671.1980.10885240
Kline, R. B. (2005). Principles and practice of structural equation modeling. The Guilford Press.
Klosner, N. C., & Gellman, E. K. (1973). The effect of item arrangement on classroom test performance: Implications for content validity. Educational and Psychological Measurement, 33, 413-418. https://doi.org/10.1177/001316447303300224
Le, L. T. (2007, July). Effects of item positions on their difficulty and discrimination: A study in PISA Science data across test language and countries. Paper presented at the 72nd Annual Meeting of the Psychometric Society, Tokyo.
Leary, L. F., & Dorans, N. J. (1985). Implications for altering the context in which test items appear: A historical perspective on an immediate concern. Review of Educational Research, 55(3), 387-413. https://doi.org/10.3102/00346543055003387
Li, F., Cohen, A., & Shen, L. (2012). Investigating the effect of item position in computer‐based tests. Journal of Educational Measurement, 49(4), 362-379. https://doi.org/10.1111/j.1745-3984.2012.00181.x
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
Magis, D., Beland, S., & Raiche, G. (2015). Package ‘difR’ (Version 5.0) [Computer software manual]. Retrieved May 14, 2018, from https://cran.r-project.org/web/packages/difR/difR.pdf
Magis, D., & Facon, B. (2012). Item purification does not always improve DIF detection: A counterexample with Angoff’s delta plot. Educational and Psychological Measurement, 73(2), 293-311. https://doi.org/10.1177/0013164412451903
Martin, M. O., Mullis, I. V. S., & Chrostowski, S. J. (2004). Item analysis and review. In M. O. Martin, I. V. S. Mullis, & S. J. Chrostowski (Eds.), TIMSS 2003 technical report (pp. 224–251). TIMSS & PIRLS International Study Center, Boston College.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Blackwell.
Meyers, J. L., Miller, G. E., & Way, W. D. (2009). Item position and item difficulty change in an IRT based common item equating design. Applied Measurement in Education, 22(1), 38-60. https://doi.org/10.1080/08957340802558342
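Ministry of National Education [MoNE]. (2013). 2013-2014 Eğitim-öğretim yılı ortaöğretimi geçiş ortak sınavları e-klavuzu [E-guide to the common examinations for transition to secondary education, 2013-2014 academic year]. Ankara.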
Monahan, P. O., & Ankenmann, R. D. (2010). Alternative matching scores to control type I error of the Mantel–Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models. Applied Psychological Measurement, 34(3), 193-210. https://doi.org/10.1177/0146621609359283
Muthén, L. K., & Muthén, B. O. (2010). Mplus: Statistical analysis with latent variables user’s guide 6.0. Muthén & Muthén.
Newman, D. L., Kundert, D. K., Lane Jr, D. S., & Bull, K. S. (1988). Effect of varying item order on multiple-choice test scores: Importance of statistical and cognitive difficulty. Applied Measurement in Education, 1(1), 89-97. https://doi.org/10.1207/s15324818ame0101_8
Ollennu, S. N. N., & Etsey, Y. K. A. (2015). The impact of item position in multiple-choice test on student performance at the basic education certificate examination (BECE) level. Universal Journal of Educational Research, 3(10), 718-723. https://doi.org/10.13189/ujer.2015.031009
Özdemir, B. (2015). A comparison of IRT-based methods for examining differential item functioning in TIMSS 2011 mathematics subtest. Procedia-Social and Behavioral Sciences, 174, 2075-2083. https://doi.org/10.1016/j.sbspro.2015.02.004
Penfield, R. D., & Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19(3), 5–15. https://doi.org/10.1111/j.1745-3992.2000.tb00033.x
Perlini, A. H., Lind, D. L., & Zumbo, B. D. (1998). Context effects on examinations: The effects of time, item order and item difficulty. Canadian Psychology/Psychologie Canadienne, 39(4), 299-307. https://doi.org/10.1037/h0086821
Plake, B. S., Patience, W. M., & Whitney, D. R. (1988). Differential item performance in mathematics achievement test items: Effect of item arrangement. Educational and Psychological Measurement, 48(4), 885-894. https://doi.org/10.1177/0013164488484003
Qian, J. (2014). An investigation of position effects in large-scale writing assessments. Applied Psychological Measurement, 38(7), 518-534. https://doi.org/10.1177/0146621614534312
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4), 495-502. https://link.springer.com/article/10.1007/BF02294403
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197-207. https://conservancy.umn.edu/bitstream/handle/11299/113559/v14n2p197.pdf?sequence=1
Rose, N., Nagy, G., Nagengast, B., Frey, A., & Becker, M. (2019). Modeling multiple item context effects with generalized linear mixed models. Frontiers in Psychology, 10, 248. https://doi.org/10.3389/fpsyg.2019.00248
Rosseel, Y., Jorgensen, T., Oberski, D., Byrnes, J., Vanbrabant, L., Savalei, V., Merkle, E., Hallquist, M., Rhemtulla, M., Katsikatsou, M., Barendse, M., & Scharf, F. (2019). Package ‘lavaan’ (Version 0.6-5) [Computer software manual]. https://cran.r-project.org/web/packages/lavaan/lavaan.pdf
Rudner, L. M., Getson, P. R., & Knight, D. L. (1980). Biased item detection techniques. Journal of Educational Statistics, 5, 213-233. https://doi.org/10.2307/1164965
Ryan, K. E., & Chiu, S. (2001). An examination of item context effects, DIF, and gender DIF. Applied Measurement in Education, 14(1), 73-90. https://doi.org/10.1207/S15324818AME1401_06
Salvucci, S., Walter, E., Conley, V., Fink, S., & Saba, M. (1997). Measurement error studies at the National Center for Education Statistics (NCES). U.S. Department of Education.
Schmitt, A. P., & Crone, C. R. (1991). Alternative mathematical aptitude item types: DIF issues. ETS Research Report Series, 1991(2), i-22. https://doi.org/10.1002/j.2333-8504.1991.tb01409.x
Sümer, N. (2000). Yapısal eşitlik modelleri: Temel kavramlar ve örnek uygulamalar [Structural Equation Modeling: Basic Concepts and Applications]. Türk Psikoloji Yazıları, 3(6), 49-73. https://psycnet.apa.org/record/2006-04302-005
Tal, I. R., Akers, K. G., & Hodge, K. G. (2008). Effect of paper color and question order on exam performance. Teaching of Psychology, 35(1), 26-28. https://doi.org/10.1080/00986280701818482
The West African Examinations Council [WAEC]. (1993). The effects of item position on performance in multiple choice tests. Research Report, Research Division, WAEC, Lagos.
Tippets, E., & Benson, J. (1989). The effect of item arrangement on test anxiety. Applied Measurement in Education, 2(4), 289-296. https://doi.org/10.1207/s15324818ame0204_2
Trendtel, M., & Robitzsch, A. (2018). Modeling item position effects with a Bayesian item response model applied to PISA 2009–2015 data. Psychological Test and Assessment Modeling, 60(2), 241-263. https://bit.ly/3cQWkh5
Uysal, I., Ertuna, L., Ertaş, F. G., & Kelecioğlu, H. (2019). Performances based on ability estimation of the methods of detecting differential item functioning: A simulation study. Journal of Measurement and Evaluation in Education and Psychology, 10(2), 133-148. https://doi.org/10.21031/epod.534312
Verguts, T., & De Boeck, P. (2000). A Rasch model for detecting learning while solving an intelligence test. Applied Psychological Measurement, 24, 151-162. https://doi.org/10.1177/01466210022031589
Weirich, S., Hecht, M., & Böhme, K. (2014). Modeling item position effects using generalized linear mixed models. Applied Psychological Measurement, 38, 535-548. https://doi.org/10.1177/0146621614534955
Weirich, S., Hecht, M., Penk, C., Roppelt, A., & Böhme, K. (2017). Item position effects are moderated by changes in test-taking effort. Applied Psychological Measurement, 41(2), 115-129. https://doi.org/10.1177/0146621616676791
Wu, Q., Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2019). Predictors of individual performance changes related to item positions in PISA assessments. Large-scale Assessments in Education, 7(5), 1-21. https://doi.org/10.1186/s40536-019-0073-6
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zwick, R., Thayer, D. T., & Wingersky, M. (1995). Effect of Rasch calibration on ability and DIF estimation in computer‐adaptive tests. Journal of Educational Measurement, 32(4), 341-363. https://www.jstor.org/stable/1435217

There are 74 citations in total.

Details

Primary Language	English
Subjects	Studies on Education
Journal Section	Articles
Authors	Sümeyra Soysal (ORCID 0000-0002-7304-1722), Esin Yılmaz Koğar (ORCID 0000-0001-6755-9018)
Publication Date	June 10, 2021
Submission Date	August 13, 2020
Published in Issue	Year 2021 Volume: 8 Issue: 2

Cite

APA	Soysal, S., & Yılmaz Koğar, E. (2021). An Investigation of Item Position Effects by Means of IRT-Based Differential Item Functioning Methods. International Journal of Assessment Tools in Education, 8(2), 239-256. https://doi.org/10.21449/ijate.779963

Cited By

Purification procedures used for the detection of gender DIF: Item bias in a foreign language test

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.1250358
