Year 2021, Volume 8, Issue 2, Pages 239-256, 2021-06-10

An Investigation of Item Position Effects by Means of IRT-Based Differential Item Functioning Methods

Sümeyra SOYSAL, Esin YILMAZ KOĞAR


This study investigated whether item position effects lead to differential item functioning (DIF) when different test booklets are used. To this end, Lord’s chi-square and Raju’s unsigned area methods were applied under the 3PL model, both with and without item purification. A comparison of the methods showed that Lord’s chi-square generally identified more items with DIF than Raju’s unsigned area did. Differences between the booklets in item position resulted in a higher number of items displaying DIF under the item purification conditions. Based on these findings, to avoid DIF arising from item position effects, it is recommended that the same items be placed in similar positions across booklets when different booklets are constructed.
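As a rough illustration of the two statistics named in the abstract, the following Python sketch computes the area between two 3PL item characteristic curves by numerical integration and Lord’s chi-square for the discrimination and difficulty parameters. It is not the authors’ analysis: the function names, the scaling constant D = 1.7, the integration range, and all parameter values are assumptions made for the example. The sketch also assumes that the two groups’ estimates are already on a common scale and that the guessing parameter is equal across groups, since the area between the curves is finite only in that case.

```python
import math

D = 1.7  # scaling constant; an assumption, some software uses D = 1


def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: c + (1 - c) / (1 + exp(-D a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def unsigned_area(ref, foc, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal approximation of the unsigned area between two ICCs.

    ref and foc are (a, b, c) tuples for the reference and focal groups.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(icc_3pl(theta, *ref) - icc_3pl(theta, *foc))
        total += gap * (0.5 if i in (0, steps) else 1.0)
    return total * h


def lord_chi_square(est_ref, est_foc, cov_ref, cov_foc):
    """Lord's statistic d' (S_ref + S_foc)^-1 d for the (a, b) estimates.

    The covariance matrices are 2x2 nested lists; the summed matrix is
    inverted in closed form.
    """
    d0 = est_ref[0] - est_foc[0]
    d1 = est_ref[1] - est_foc[1]
    s00 = cov_ref[0][0] + cov_foc[0][0]
    s01 = cov_ref[0][1] + cov_foc[0][1]
    s11 = cov_ref[1][1] + cov_foc[1][1]
    det = s00 * s11 - s01 * s01
    return (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det


# Hypothetical item: same a and c in both booklets, difficulty shifted by 0.5.
area = unsigned_area((1.0, 0.0, 0.2), (1.0, 0.5, 0.2))
cov = [[0.01, 0.0], [0.0, 0.01]]  # made-up sampling covariance of (a, b)
chi2 = lord_chi_square((1.0, 0.0), (1.0, 0.5), cov, cov)
print(round(area, 3), round(chi2, 3))
```

With equal discrimination and guessing parameters the area reduces to (1 - c)|b2 - b1|, which gives a quick check on the integration. In a DIF analysis Lord’s statistic would be compared with a chi-square distribution whose degrees of freedom equal the number of parameters tested, here 2.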

Item position effects, Item response theory, Differential item functioning, Raju’s unsigned area, Lord’s chi-square
Primary Language en
Subjects Education, Scientific Disciplines
Published Date June
Journal Section Articles
Authors

Orcid: 0000-0002-7304-1722
Author: Sümeyra SOYSAL (Primary Author)
Institution: Necmettin Erbakan University
Country: Turkey


Orcid: 0000-0001-6755-9018
Author: Esin YILMAZ KOĞAR
Institution: NIGDE OMER HALISDEMIR UNIVERSITY
Country: Turkey


Dates

Publication Date: June 10, 2021

APA: Soysal, S., & Yılmaz Koğar, E. (2021). An Investigation of Item Position Effects by Means of IRT-Based Differential Item Functioning Methods. International Journal of Assessment Tools in Education, 8(2), 239-256. DOI: 10.21449/ijate.779963