The Impact of Item Position on Item Parameters: A Multi-Method Approach
Year: 2025, Volume: 16, Issue: 2, pp. 71–87, 30.06.2025
Kübra Atalay Kabasakal, Nuri Doğan
Abstract
Multiple-choice tests are widely favored in large-scale assessments and classroom evaluations because of their practicality and efficiency. Their ease of administration and scoring makes them highly practical, but they also come with disadvantages, such as the potential for cheating and the possibility of guessing the correct answer. To reduce the risk of cheating, it is common practice to administer multiple versions of a test with the items arranged in different orders. However, such variations can affect student performance across test forms; for instance, students who begin with the most difficult items may feel discouraged at the start of the test. In addition, the psychometric properties of the items may vary across test forms in which the same items appear in different positions. This study examines the impact of this phenomenon on the estimation of the psychometric properties of test items. The sample consists of 8th-grade students in Türkiye who took a national exam for high school admission. Data from three subtests in four different booklets of this exam were analyzed. Item parameters were estimated with both classical test theory and item response theory and were compared across test forms containing the same items. Furthermore, the varying item positions were incorporated as an explanatory variable into an explanatory item response theory model and a structural equation model. The findings indicate that the psychometric properties of test items can differ significantly depending on their position within the test, highlighting the importance of considering item-position effects in test design and score interpretation.
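To make concrete what treating item position as an explanatory variable can look like, the following is a minimal sketch of a Rasch-type explanatory item response model with a linear position effect; the symbols below are illustrative and are not taken from the article itself:

\operatorname{logit} P(Y_{pi} = 1) = \theta_p - \beta_i + \gamma \cdot \mathrm{pos}_{pi}

Here \theta_p denotes the ability of person p, \beta_i the difficulty of item i, \mathrm{pos}_{pi} the position at which person p received item i in their booklet, and \gamma a common position effect. Under this formulation, a negative \gamma would indicate that the same item becomes effectively harder the later it appears in the test.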
Ethical Statement
The data come from a pre-existing dataset, and permission for its use was obtained. The necessary approval was granted by the Ministry of National Education in 2016 as part of a master's thesis supervised by one of the authors.