The Impact of Item Position on Item Parameters: A Multi-Method Approach
Year: 2025, Volume: 16, Issue: 2, pp. 71–87, 30.06.2025
Kübra Atalay Kabasakal, Nuri Doğan
Abstract
Multiple-choice tests are widely favored in large-scale assessments and classroom evaluations because of their practicality and efficiency. Their ease of administration and scoring makes them highly practical, but they also come with disadvantages, such as the potential for cheating and the possibility of guessing the correct answer. To reduce the risk of cheating, it is common practice to administer multiple versions of a test with the items arranged in different orders. However, such variations can affect student performance across test forms; for instance, students who begin with the most difficult items may feel discouraged at the start of the test. In addition, the psychometric properties of the items may vary across test forms in which the same items appear in different positions. This study examines the impact of this phenomenon on the estimation of the psychometric properties of test items. The sample consists of 8th-grade students in Türkiye who took a national exam for high school admission. Data from three subtests in four different booklets of this exam were analyzed. Item parameters were estimated with both classical test theory and item response theory and were compared across test forms containing the same items. Furthermore, the varying item positions were incorporated as an explanatory variable into an explanatory item response theory model and a structural equation model. The findings indicate that the psychometric properties of test items can differ significantly depending on their position within the test, highlighting the importance of considering item-position effects in test design and score interpretation.
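To make concrete what treating item position as an explanatory variable can look like, the following is a minimal sketch of a Rasch-type explanatory item response model with a linear position effect; the symbols below are illustrative and are not taken from the article itself:

\operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i + \gamma \cdot \mathrm{pos}_{pi}

Here \theta_p denotes the ability of person p, \beta_i the difficulty of item i, \mathrm{pos}_{pi} the position at which person p received item i in their booklet, and \gamma a common position effect. Under this formulation, a negative \gamma would indicate that the same item becomes effectively harder the later it appears in the test.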
Ethical Statement
The data come from a pre-existing dataset, and permission for its use was obtained. The necessary approval was granted by the Ministry of National Education in 2016 as part of a master's thesis supervised by one of the authors.